Closed: piyushgit011 closed this issue 5 months ago
Hey @philschmid, how can we deploy a quantized model on ml.g4dn.2xlarge?
Can we solve this flash attention error?
You need g5 or newer. The ml.g4dn instances use NVIDIA T4 (Turing) GPUs, which the Flash Attention 2 kernels do not support; ml.g5 instances come with A10G (Ampere) GPUs, so the flash attention error does not occur there.
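For reference, here is a minimal sketch of deploying a quantized model on an ml.g5.2xlarge endpoint with the Hugging Face TGI container via the SageMaker Python SDK. The container version, model ID, and quantization method are placeholders, not values from this thread; adjust them to your checkpoint.

```python
# Sketch only: deploy a quantized model with the Hugging Face LLM (TGI) container
# on SageMaker. Assumes this runs inside a SageMaker environment with an execution role.
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

role = sagemaker.get_execution_role()

# Hugging Face TGI inference image; the version is an assumption, pick a current one.
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.4.2")

model = HuggingFaceModel(
    role=role,
    image_uri=image_uri,
    env={
        "HF_MODEL_ID": "meta-llama/Llama-2-7b-chat-hf",  # placeholder model ID
        "QUANTIZE": "bitsandbytes",  # or "gptq"/"awq", depending on the checkpoint
        "SM_NUM_GPUS": "1",          # ml.g5.2xlarge has a single A10G GPU
        "MAX_INPUT_LENGTH": "2048",
        "MAX_TOTAL_TOKENS": "4096",
    },
)

# ml.g5.2xlarge uses an A10G (Ampere) GPU, which supports Flash Attention 2.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",
)

print(predictor.predict({"inputs": "Hello, how are you?"}))
```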