roboflow / inference

A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
https://inference.roboflow.com

Fix weird paligemma generation #459

Closed · probicheaux closed this 3 weeks ago

probicheaux commented 3 weeks ago

Description

There was a weird bug (at least on my machine) where multiple subsequent calls to paligemma would degrade performance. I tracked it down to a mismatch between the cuDNN that PyTorch installs and the system one. PyTorch says "oh, don't even bother having your own cuDNN", but we need it for the ONNX stuff, so we uninstall the PyTorch-installed cuDNN.

I think the error came from the flash attention implementation or an improperly initialized tensor or something.
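
For context, a minimal sketch of the workaround (assuming torch is installed via pip and pulls in the nvidia-cudnn-cu12 wheel; the exact wheel name depends on the CUDA build of torch):

    # Hypothetical Dockerfile excerpt: drop the cuDNN that pip installs
    # alongside torch so onnxruntime-gpu resolves the system cuDNN instead.
    RUN pip install torch \
        && pip uninstall -y nvidia-cudnn-cu12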

Type of change

Bug fix

How has this change been tested? Please provide a test case or an example of how you tested the change.

Locally

Any specific deployment considerations

May depend on the system CUDA installation.
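
One hypothetical way to sanity-check this on a target machine is to compare the cuDNN version torch reports against the libraries visible on the system linker path:

    # cuDNN version as seen by PyTorch
    python -c "import torch; print(torch.backends.cudnn.version())"
    # cuDNN libraries on the system linker path
    ldconfig -p | grep libcudnn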

Docs

grzegorz-roboflow commented 3 weeks ago

I tried to build the image from this branch locally with docker build -f docker/dockerfiles/Dockerfile.paligemma -t "roboflow-paligemma" . and the build failed with the error below:

9.563 ERROR: Could not find a version that satisfies the requirement onnxruntime-gpu<=1.15.1 (from versions: none)
9.563 ERROR: No matching distribution found for onnxruntime-gpu<=1.15.1
probicheaux commented 3 weeks ago

@grzegorz-roboflow by "locally", do you mean on an M1/M2 Mac?

[Screenshot: 2024-06-10 8:20 AM]

ARM Macs aren't supported by onnxruntime-gpu, so you can't build docker/dockerfiles/Dockerfile.onnx.gpu either.

You need to add --platform linux/amd64 to your docker build command.
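
Putting it together, the full command (same Dockerfile and tag as above) would be:

    docker build --platform linux/amd64 -f docker/dockerfiles/Dockerfile.paligemma -t "roboflow-paligemma" .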