vespa-engine / vespa

AI + Data, online. https://vespa.ai
Apache License 2.0

How to use AMD ROCM to improve embedding speed? #29754

Open 3CE8D2BAC65BDD6AA9 opened 9 months ago

3CE8D2BAC65BDD6AA9 commented 9 months ago

I have been trying the code at:

https://github.com/vespa-engine/sample-apps/tree/master/multi-vector-indexing
https://blog.vespa.ai/build-news-recommendation-app-from-python-with-vespa/

Everything works fine. Thank you! When I run the code, I can see that the CPU is almost 100% utilized but the GPU is idle. However, I have an AMD 7900 XTX with ROCm installed. How can I use the GPU instead of the CPU to calculate the embeddings?

Thanks in advance.

jobergum commented 9 months ago

Vespa currently only supports CPU and CUDA GPUs, so I'm afraid it can't use your AMD GPU via ROCm.
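For comparison, on a machine with an NVIDIA GPU the embedder can be directed to the GPU in services.xml. A minimal sketch, assuming Vespa's hugging-face-embedder and its `onnx-gpu-device` option (the component id and model paths here are placeholders):

```xml
<!-- Sketch of a services.xml embedder component; assumes the
     hugging-face-embedder type and the onnx-gpu-device element. -->
<container id="default" version="1.0">
  <component id="my-embedder" type="hugging-face-embedder">
    <transformer-model path="models/model.onnx"/>
    <tokenizer-model path="models/tokenizer.json"/>
    <!-- Run ONNX inference on CUDA GPU device 0 (NVIDIA only today) -->
    <onnx-gpu-device>0</onnx-gpu-device>
  </component>
</container>
```

With an AMD GPU there is currently no equivalent setting, since the underlying runtime build does not include a ROCm execution provider.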

Vespa uses onnxruntime for inference, so it's possible that we could add more execution providers, such as ROCm.
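To illustrate what that would mean at the onnxruntime level: `ROCMExecutionProvider` is onnxruntime's provider name for AMD GPUs, and a session falls back through its provider list until one loads. A hedged sketch (whether a Vespa build ships with the ROCm provider is exactly the open question in this issue; the `pick_providers` helper is hypothetical):

```python
# Sketch of selecting an onnxruntime execution provider, preferring a GPU.
# Assumption: an onnxruntime build compiled with ROCm support would expose
# "ROCMExecutionProvider" in get_available_providers().
try:
    import onnxruntime as ort
    available = ort.get_available_providers()
except ImportError:  # onnxruntime not installed in this environment
    available = ["CPUExecutionProvider"]

def pick_providers(available):
    """Prefer a GPU provider when present, always falling back to CPU."""
    preferred = ["ROCMExecutionProvider", "CUDAExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    chosen.append("CPUExecutionProvider")
    return chosen

providers = pick_providers(available)
# A session created with this list uses the first provider that loads:
# session = ort.InferenceSession("model.onnx", providers=providers)
```

The fallback-list design is why adding a ROCm provider would be backwards compatible: machines without an AMD GPU would simply continue on CPU.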