huggingface / tgi-gaudi

Large Language Model Text Generation Inference on Habana Gaudi
http://hf.co/docs/text-generation-inference
Apache License 2.0

Removed functions iterating over tensors from torch compilation process #224

Open jczaja opened 2 months ago

jczaja commented 2 months ago

Problem: Recently, some torch compile graph breaks were removed in dependencies of the tgi-gaudi project. As a result, some torch-compiled graphs became much bigger and more memory-consuming, which in some models could lead to device out-of-memory (OOM) errors.

Solution: The torch-compiled graphs that were causing device OOM contained loops that processed many tensors. The functions containing those loops are now excluded from the torch compilation process.