philschmid / llm-sagemaker-sample


Deployment on T4 instance #11

Closed: piyushgit011 closed this issue 5 months ago

piyushgit011 commented 6 months ago

Hey @philschmid, how can we deploy a quantized model on ml.g4dn.2xlarge?

(screenshots of the deployment error attached)

Can we solve this flash attention error?

philschmid commented 5 months ago

You need g5 or newer
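For anyone landing here later: the T4 in g4dn is a Turing GPU, while the flash attention kernels TGI relies on need Ampere or newer, which is what the g5 family (A10G) provides. Below is a minimal sketch of what a quantized deployment can look like on a supported instance, following the HuggingFaceModel/TGI pattern used in this repo's notebooks. The model ID, container version, and quantization backend are assumptions, not values from this issue.

```python
import json

import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

# SageMaker execution role for the endpoint
role = sagemaker.get_execution_role()

# Hugging Face TGI (LLM) container for SageMaker; version is an assumption
llm_image = get_huggingface_llm_image_uri("huggingface", version="1.4.2")

config = {
    "HF_MODEL_ID": "HuggingFaceH4/zephyr-7b-beta",  # placeholder model id
    "SM_NUM_GPUS": json.dumps(1),                   # ml.g5.2xlarge has a single A10G
    "MAX_INPUT_LENGTH": json.dumps(2048),
    "MAX_TOTAL_TOKENS": json.dumps(4096),
    "HF_MODEL_QUANTIZE": "bitsandbytes",            # or "gptq"/"awq" for pre-quantized checkpoints
}

llm_model = HuggingFaceModel(role=role, image_uri=llm_image, env=config)

# Deploy on g5 (A10G, Ampere) rather than g4dn (T4, Turing)
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
    container_startup_health_check_timeout=600,
)
```

Once the endpoint is up, `llm.predict({"inputs": "..."})` works the same way as in the repo's non-quantized examples.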