philschmid / llm-sagemaker-sample

Apache License 2.0

VRAM Requirements #9

Status: Open · collinhundley opened this issue 6 months ago

collinhundley commented 6 months ago

Hi, thanks for publishing this example.

With Mixtral + TGI, is it actually required to fit the full model in VRAM? Or is it possible to use 100GB+ of system memory with less GPU capacity instead?

ml.g5.48xlarge instances are quite expensive, so I’m looking for options to reduce deployment costs.
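For a rough sense of the numbers, here is a back-of-the-envelope sketch of the weight memory alone, assuming Mixtral-8x7B's roughly 46.7B total parameters (KV cache and activations come on top of this):

```python
# Rough weight-memory estimate for Mixtral-8x7B (assumed ~46.7B params).
# KV cache and activations are extra, so treat these as lower bounds.

PARAMS = 46.7e9  # total parameters across all experts

for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gb = PARAMS * bytes_per_param / 1024**3
    print(f"{name}: ~{gb:.0f} GB of weights")

# fp16: ~87 GB -> needs ml.g5.48xlarge (8x A10G, 192 GB total VRAM)
# int4: ~22 GB -> would fit on ml.g5.12xlarge (4x A10G, 96 GB total VRAM)
```

This is why the fp16 deployment needs the full g5.48xlarge, and why quantization is what would unlock smaller instances.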

philschmid commented 6 months ago

Not yet, but we are working on quantization support to be able to use g5.12xlarge.
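Once that lands, a quantized deployment following this repo's deployment pattern could look roughly like the sketch below. The container version and the `HF_MODEL_QUANTIZE` value are assumptions, not a confirmed configuration; Mixtral quantization in TGI was not yet supported at the time of this reply.

```python
import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# HuggingFace TGI container image; the version here is an assumption.
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.3.3")

config = {
    "HF_MODEL_ID": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "SM_NUM_GPUS": json.dumps(4),    # ml.g5.12xlarge has 4x A10G (24 GB each)
    "HF_MODEL_QUANTIZE": "gptq",     # assumed quantization setting once supported
    "MAX_INPUT_LENGTH": json.dumps(3072),
    "MAX_TOTAL_TOKENS": json.dumps(4096),
}

llm_model = HuggingFaceModel(role=role, image_uri=llm_image, env=config)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.12xlarge",  # instead of ml.g5.48xlarge
    container_startup_health_check_timeout=900,
)
```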