michaelfeil / infinity

Infinity is a high-throughput, low-latency REST API for serving vector embeddings, supporting a wide range of text-embedding models and frameworks.
https://michaelfeil.eu/infinity/
MIT License

AMD ROCm docker images support (+ optimization) #94

Open michaelfeil opened 4 months ago

michaelfeil commented 4 months ago

I am planning to evaluate hardware-agnostic options.

hiepxanh commented 4 months ago

Yes, I love this. I have an AMD device and I'm happy to try it out.

michaelfeil commented 4 months ago

Awesome, love the proactivity! Let me create a draft PR. Do you have ROCm installed, and can you use PyTorch with ROCm? @hiepxanh

michaelfeil commented 4 months ago

Here are some instructions on how to get it installed: https://rocm.docs.amd.com/projects/install-on-linux/en/develop/how-to/3rd-party/pytorch-install.html

In my opinion, it should run out of the box with ROCm; the open questions are building and running a Docker image, and the resulting performance.
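A minimal install-and-check sketch, assuming the index URL matches your ROCm release (rocm6.0 below is just an example; pick the one from the AMD guide above):

```bash
# Install a ROCm build of PyTorch (use the wheel index matching your
# installed ROCm version, per the AMD install guide linked above).
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0

# ROCm builds expose AMD GPUs through torch's CUDA API, so on a working
# setup this should print "True" plus a HIP version string.
python -c "import torch; print(torch.cuda.is_available(), torch.version.hip)"
```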

hiepxanh commented 3 months ago

I have ROCm and an AMD RX 6600 card, but I ran into a lot of issues while testing: PyTorch doesn't support ROCm on Windows, and the Ubuntu image eats 20 GB just for ROCm alone. I don't think we're ready yet. I've seen Vulkan work great with llama.cpp; maybe that's a good option for running the model. Let's keep this issue open while I keep an eye on the AMD team.

michaelfeil commented 3 months ago

Sadly, Ubuntu/Linux (no WSL) and an error-free ROCm installation are strict requirements for running with ROCm.

hiepxanh commented 3 months ago

Yes, that's correct. WSL and Docker seemed like a great place to install it, but I failed last time. I will do further testing once I have free time.

peebles commented 3 weeks ago

I am running Ubuntu 22.10 with a Navi 23 card [Radeon RX 6650 XT] and the ROCm drivers installed (verified with rocminfo and clinfo). I'll give infinity a shot on this system. When I run --help, I do not see rocm listed as a device, and when I run infinity with no special options, it picks "cpu" as the device. How should I run it?

hvico commented 2 weeks ago

I can report that Infinity works perfectly well using ROCm-accelerated PyTorch on a 7900 XTX.

Just one tip: if, like me, you're not using the MI250X/MI300X series, set this variable before starting Infinity to avoid PyTorch errors complaining about missing hipBLASLt support:

TORCH_BLAS_PREFER_HIPBLASLT=0

Ref: https://github.com/comfyanonymous/ComfyUI/issues/3698
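Putting it together, a launch sketch: the model id is a placeholder, and the flag names are assumptions based on the pip CLI, so double-check them against your infinity_emb --help output.

```bash
# Skip hipBLASLt on consumer (non-MI-series) GPUs:
export TORCH_BLAS_PREFER_HIPBLASLT=0

# ROCm builds of PyTorch expose AMD GPUs through torch's cuda API,
# so select "cuda"; there is no separate "rocm" device.
infinity_emb --model-name-or-path BAAI/bge-small-en-v1.5 --device cuda
```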

peebles commented 2 weeks ago

Did you build infinity-emb from scratch using a different PyTorch than the one in pyproject.toml? Personally, I am using the pre-built Docker container image michaelf34/infinity:latest. @hvico, if you have a custom build, I'd like to know your recipe!

hvico commented 2 weeks ago

> Did you build infinity-emb from scratch using a different PyTorch than the one in pyproject.toml? Personally, I am using the pre-built Docker container image michaelf34/infinity:latest. @hvico, if you have a custom build, I'd like to know your recipe!

Hi. I didn't; I just installed the latest pip wheel and then installed the official nightly PyTorch ROCm packages (replacing the bundled torch distribution with the ROCm one).

To dockerize this, I froze that virtualenv, started from the official ROCm PyTorch Docker image, and added a pip install -r requirements.txt step using the exported file.

I am not sharing that file because it has many other dependencies unrelated to infinity, so it makes no sense to use it as a template. But this is the main procedure I followed.

Hope it helps.
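For anyone who wants a starting point, here is a rough sketch of that procedure; the base image tag, the [all] extra (taken from the infinity README), and the CLI flags are assumptions to adapt, not my exact setup:

```bash
# Write a minimal Dockerfile on top of the official ROCm PyTorch image.
cat > Dockerfile <<'EOF'
FROM rocm/pytorch:latest
# Install infinity; the [all] extra follows the infinity README.
RUN pip install --no-cache-dir "infinity-emb[all]"
# pip may have swapped in the CUDA torch wheel as a dependency;
# put the ROCm build back (match the index to your ROCm version).
RUN pip install --no-cache-dir --force-reinstall torch \
    --index-url https://download.pytorch.org/whl/rocm6.0
# Avoid hipBLASLt errors on consumer GPUs (see the tip above).
ENV TORCH_BLAS_PREFER_HIPBLASLT=0
ENTRYPOINT ["infinity_emb"]
EOF

docker build -t infinity-rocm .

# Pass the ROCm device nodes through and select the cuda device
# (ROCm torch exposes AMD GPUs via torch's cuda API); 7997 is
# infinity's default port.
docker run --device /dev/kfd --device /dev/dri -p 7997:7997 \
  infinity-rocm --model-name-or-path BAAI/bge-small-en-v1.5 --device cuda
```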