philschmid / llm-sagemaker-sample

VRAM Requirements #9

collinhundley opened this issue 10 months ago

collinhundley commented 10 months ago

Hi, thanks for publishing this example.

With Mixtral + TGI, is it actually required to fit the full model in VRAM? Or is it possible to opt for 100GB+ of system memory and lower GPU capacity?

ml.g5.48xlarge instances are quite expensive, so I’m looking for options to reduce deployment costs.
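
For context, a back-of-the-envelope estimate shows why fp16 serving points at the ml.g5.48xlarge: Mixtral-8x7B has roughly 46.7B total parameters (all eight experts stay resident in memory, even though only two are active per token), and TGI serves the weights from GPU memory. A minimal sketch, using approximate parameter counts and the A10G memory figures for the g5 family:

```python
# Back-of-the-envelope VRAM estimate for Mixtral-8x7B (~46.7B total parameters).
# Figures are approximate; real usage also needs room for the KV cache and CUDA overhead.

PARAMS = 46.7e9  # total parameters; all experts must be resident in memory

def weight_memory_gb(bytes_per_param: float) -> float:
    """Memory needed just for the model weights, in GiB."""
    return PARAMS * bytes_per_param / 1024**3

print(f"fp16 weights: ~{weight_memory_gb(2):.0f} GB")    # ~87 GB
print(f"int4 weights: ~{weight_memory_gb(0.5):.0f} GB")  # ~22 GB

# GPU memory per instance (NVIDIA A10G, 24 GB each):
#   ml.g5.48xlarge: 8 x 24 GB = 192 GB -> fits fp16 weights plus KV cache
#   ml.g5.12xlarge: 4 x 24 GB =  96 GB -> too tight for fp16 once the KV cache is added
```

This is why quantization is the lever for cost reduction: at int4 the weights drop to roughly a quarter of their fp16 size, which is what makes the 4-GPU g5.12xlarge plausible.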

philschmid commented 10 months ago

Not yet, but we are working on quantization support so that the model can run on a g5.12xlarge.
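
For illustration, here is a hypothetical sketch of what a quantized deployment could look like, following the same pattern as the existing sample. The `HF_MODEL_QUANTIZE` setting, the `gptq` method, the container version, and the token-limit values are assumptions for the sketch, not a confirmed configuration, since quantized Mixtral support was not yet available at the time of this reply:

```python
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Assumed configuration: HF_MODEL_QUANTIZE is the TGI container's quantization
# switch; the method ("gptq") and token limits here are illustrative values.
config = {
    "HF_MODEL_ID": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "SM_NUM_GPUS": json.dumps(4),       # ml.g5.12xlarge has 4x A10G
    "HF_MODEL_QUANTIZE": "gptq",        # assumption: quantize to fit in 96 GB VRAM
    "MAX_INPUT_LENGTH": json.dumps(24576),
    "MAX_TOTAL_TOKENS": json.dumps(32768),
}

llm_model = HuggingFaceModel(
    role=role,
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.3.3"),
    env=config,
)

# Deploy to the smaller (and cheaper) 4-GPU instance instead of ml.g5.48xlarge.
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",
    container_startup_health_check_timeout=600,
)
```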