huggingface / optimum-habana

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
Apache License 2.0

[SW-204998] Memory optimization for gpt_bigcode (#4) #1513

Open astachowiczhabana opened 13 hours ago

astachowiczhabana commented 13 hours ago

Use torch.matmul instead of torch.baddbmm in GPTBigCodeAttention._attn for devices other than "cpu". This allows significantly larger batch sizes to be used in text generation with bigcode-related models.
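For illustration, here is a minimal sketch of the idea behind the change, not the exact optimum-habana patch: a device-conditional branch inside a GPTBigCodeAttention-style `_attn` that keeps `torch.baddbmm` on CPU but switches to a plain `torch.matmul` elsewhere. The function name `_attn_scores` and its tensor shapes are assumptions for the example.

```python
import torch

def _attn_scores(query, key, scale_factor, device_type):
    # query: (batch * num_heads, q_len, head_dim)
    # key:   (batch * num_heads, head_dim, k_len)  (already transposed)
    if device_type == "cpu":
        # baddbmm requires an explicit "input" tensor; with beta=0 its values
        # are ignored, so an uninitialized tensor of the right shape suffices.
        placeholder = torch.empty(
            (query.size(0), query.size(1), key.size(2)),
            dtype=query.dtype,
            device=query.device,
        )
        return torch.baddbmm(placeholder, query, key, beta=0, alpha=scale_factor)
    # On HPU (and other non-CPU devices), a plain matmul avoids allocating the
    # extra baddbmm input tensor, which is what frees memory for larger batches.
    return torch.matmul(query, key) * scale_factor
```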

astachowiczhabana commented 13 hours ago

Hi @libinta, this commit is also required for the next OH release.