philschmid / llm-sagemaker-sample


Deployment on T4 instance #11

Closed: piyushgit011 closed this issue 5 months ago

piyushgit011 commented 6 months ago

Hey @philschmid, how can we deploy a quantized model on ml.g4dn.2xlarge?

(screenshots of the deployment error attached)

Can we solve this flash attention error?

philschmid commented 5 months ago

You need g5 or newer
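For anyone landing here later: the T4 in g4dn is a Turing GPU, while the flash attention kernels TGI relies on need Ampere or newer, which is what the g5 family (A10G) provides. Below is a minimal sketch of what a quantized deployment can look like on a supported instance, following the HuggingFaceModel/TGI pattern used in this repo's notebooks. The model ID, container version, and quantization backend are assumptions, not values from this issue.

```python
import json

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# SageMaker execution role for the endpoint
role = sagemaker.get_execution_role()

# Hugging Face TGI (LLM) container for SageMaker; version is an assumption
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.4.2")

config = {
    "HF_MODEL_ID": "HuggingFaceH4/zephyr-7b-beta",  # placeholder model id
    "SM_NUM_GPUS": json.dumps(1),                   # ml.g5.2xlarge has a single A10G
    "MAX_INPUT_LENGTH": json.dumps(2048),
    "MAX_TOTAL_TOKENS": json.dumps(4096),
    "HF_MODEL_QUANTIZE": "bitsandbytes",            # or "gptq"/"awq" for pre-quantized checkpoints
}

llm_model = HuggingFaceModel(role=role, image_uri=llm_image, env=config)

# Deploy on g5 (A10G, Ampere) rather than g4dn (T4, Turing)
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600,
)
```

Once the endpoint is up, `llm.predict({"inputs": "..."})` works the same way as in the repo's non-quantized examples.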