roboflow / inference

A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
https://inference.roboflow.com

Fix weird paligemma generation #459

Closed · probicheaux closed this 3 weeks ago

probicheaux commented 3 weeks ago

Description

There was a weird bug (at least on my machine) where multiple subsequent calls to paligemma would degrade performance. I tracked it down to a mismatch between the cuDNN that PyTorch installs and the system one. PyTorch says "oh, don't even bother having your own cuDNN", but we need it for the ONNX stuff, so we uninstall the PyTorch-installed cuDNN.

I think the error came from the flash attention implementation or an improperly initialized tensor or something.
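
For context, a minimal sketch of the workaround (assuming torch is installed via pip and pulls in the nvidia-cudnn-cu12 wheel; the exact wheel name depends on the CUDA build of torch):

    # Hypothetical Dockerfile excerpt: drop the cuDNN that pip installs
    # alongside torch so onnxruntime-gpu resolves the system cuDNN instead.
    RUN pip install torch \
        && pip uninstall -y nvidia-cudnn-cu12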

Type of change

Bug fix

How has this change been tested? Please provide a test case or an example of how you tested the change.

Locally

Any specific deployment considerations

May depend on the system CUDA installation.
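
One hypothetical way to sanity-check this on a target machine is to compare the cuDNN version torch reports against the libraries visible on the system linker path:

    # cuDNN version as seen by PyTorch
    python -c "import torch; print(torch.backends.cudnn.version())"
    # cuDNN libraries on the system linker path
    ldconfig -p | grep libcudnn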

Docs

grzegorz-roboflow commented 3 weeks ago

I tried to build the image from this branch locally with docker build -f docker/dockerfiles/Dockerfile.paligemma -t "roboflow-paligemma" . and the build failed with the error below:

9.563 ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu<=1.15.1 (from versions: none)
9.563 ERROR: No matching distribution found for onnxruntime-gpu<=1.15.1
probicheaux commented 3 weeks ago

@grzegorz-roboflow by "locally", do you mean on an M1/M2 Mac?

[Screenshot: 2024-06-10 8:20 AM]

ARM Macs aren't supported by onnxruntime-gpu, so you can't build docker/dockerfiles/Dockerfile.onnx.gpu either.

You need to add --platform linux/amd64 to your docker build command.
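
Putting it together, the full command (same Dockerfile and tag as above) would be:

    docker build --platform linux/amd64 -f docker/dockerfiles/Dockerfile.paligemma -t "roboflow-paligemma" .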