Open · marcomarinodev opened this issue 1 month ago
@marcomarinodev You'll need to mount your accelerator with --gpus all, but first make sure the NVIDIA Container Toolkit is installed and configured.
Correction: the NVIDIA Container Toolkit applies if you are using NVIDIA GPUs, but with AWS Neuron, look into this instead: https://awsdocs-neuron.readthedocs-hosted.com/en/latest/containers/index.html
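For Inferentia there is no --gpus equivalent; as a rough sketch (the image name is a placeholder, and both the host Neuron driver and the in-container Neuron runtime libraries are assumed to be present), the devices are passed explicitly:
# pass each Neuron device through to the container; repeat --device per device
docker run --device=/dev/neuron0 -it some-neuron-enabled-image neuron-ls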
I tried to add --gpus all, but I get the following error:
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
It looks like the infinity Docker image must be compliant with AWS Deep Learning Containers, though: even adding --device=/dev/neuron0 didn't work, because I see this in the infinity logs:
sentence_transformers.SentenceTransformer
INFO: Use pytorch device_name: cpu
Also, if I try to use neuron-ls inside the container, it is not found. So I was wondering if you have the code for executing the benchmarks on AWS Inferentia.
I don't know much about AWS machines, but on my NVIDIA T4 Azure machine I had to make sure the NVIDIA driver for Ubuntu, the CUDA toolkit, cuDNN, and the NVIDIA Container Toolkit were all installed, if that helps.
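For that NVIDIA path, the rough sequence on Ubuntu looks like this (the driver version is only an example, and the container toolkit needs NVIDIA's apt repository added first):
sudo apt-get install -y nvidia-driver-535          # host GPU driver (example version)
sudo apt-get install -y nvidia-container-toolkit   # container integration
sudo nvidia-ctk runtime configure --runtime=docker # register the runtime with Docker
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi  # smoke test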
I think @michaelfeil can help here
@marcomarinodev You should use the Hugging Face AMI from the marketplace because it has all the drivers and libraries installed. The 10/8/24 version includes Neuron SDK 2.20. There is no charge for the image, just the instance.
In order to run a model on Inferentia, it needs to be compiled. Hugging Face does this inline for some models, but not these. I pre-compiled https://huggingface.co/aws-neuron/all-MiniLM-L6-v2-neuron for SDK 2.20, so you should be able to deploy it directly from HF.
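Assuming an Infinity build where the neuron engine exists (it is added to the CLI later in this thread), serving that pre-compiled checkpoint would look roughly like:
# sketch: load the pre-compiled Neuron checkpoint straight from the Hub
infinity_emb v2 --engine neuron --model-id aws-neuron/all-MiniLM-L6-v2-neuron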
If that works, other models can be compiled using the instructions in the model card. If the compilation process fails, support may need to be added to some of the Neuron libraries.
If you really want to make a Dockerfile, you would need to install the Neuron libraries AND make sure the host image has the drivers installed. See https://github.com/huggingface/optimum-neuron/blob/018296c824ebae87cb00cc23f75b4493a5d9114e/text-generation-inference/Dockerfile#L92 for an example.
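A minimal, untested sketch of such a Dockerfile, using the public Neuron apt and pip repositories (the base image and package set are assumptions, not an official recipe; the host still needs the aws-neuronx-dkms driver, and versions should be pinned in practice):
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y wget gnupg2 python3-pip
# add the AWS Neuron apt repository for the runtime libraries
RUN echo "deb https://apt.repos.neuron.amazonaws.com jammy main" > /etc/apt/sources.list.d/neuron.list \
 && wget -qO - https://apt.repos.neuron.amazonaws.com/GPG-PUB-KEY-AMAZON-AWS-NEURON.PUB | apt-key add - \
 && apt-get update \
 && apt-get install -y aws-neuronx-runtime-lib aws-neuronx-collectives aws-neuronx-tools
# Neuron-enabled Python stack from the Neuron pip index
RUN pip3 install --extra-index-url https://pip.repos.neuron.amazonaws.com \
    torch-neuronx neuronx-cc optimum-neuron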
So, in order to have that model available in Infinity, should I first compile the model so that it becomes compatible with the Neuron architecture?
For the most part, yes. There are some edge cases if you are using the Hugging Face Optimum Neuron library. But, if you can't compile it with the "optimum-cli export neuron" command, it won't run on Neuron in Infinity.
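For reference, a hedged example of that compile step (the batch size, sequence length, and output directory are placeholders; check the model card for the values your model needs):
# ahead-of-time compile an embedding model for Inferentia
optimum-cli export neuron \
  --model BAAI/bge-small-en-v1.5 \
  --task feature-extraction \
  --batch_size 1 \
  --sequence_length 384 \
  bge-small-en-v1.5-neuron/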
@marcomarinodev Many warnings, I have not used Neuron in the last 4 months.
Playbook:
- Do not use Docker.
- I am not sure if Alibaba-NLP/gte-Qwen2-1.5B-instruct works. Try https://huggingface.co/BAAI/bge-small-en-v1.5 first.
- Use the Hugging Face AMI images from Amazon (google for "huggingface ami": https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2). The one as of 10/8/24 is torch 2.1.1, Python 3.10.
- Make sure PyTorch is installed with Neuron.
- The Infinity CLI is then available with --engine neuron.
@jimburtoft from AWS provided some initial guidance for me to better integrate Inferentia.
Open question: is there a way to build a Dockerfile?
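A quick way to sanity-check the PyTorch-with-Neuron point from the playbook above, assuming the Neuron SDK tooling that ships with the AMI:
neuron-ls                                                     # should list the Inferentia devices
python -c "import torch_neuronx; print('torch-neuronx OK')"   # PyTorch + Neuron present
python -c "import optimum.neuron; print('optimum-neuron OK')"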
I tried your suggestion, but the --engine neuron option is missing. When I try to run infinity_emb v2 --model-id sentence-transformers/all-MiniLM-L6-v2 --engine neuron I get:
Invalid value for '--engine': 'neuron' is not one of 'torch', 'ctranslate2', 'optimum', 'debugengine'.
Any suggestions?
Hi @michaelfeil, any thoughts regarding --engine neuron not being available?
@marcomarinodev Just added the engine to the CLI, main branch only.
# using the AMI with torch installed
git clone https://github.com/michaelfeil/infinity
cd infinity/libs/infinity_emb
# install pip deps without overwriting the existing neuron installation
pip install . --no-deps
pip install uvicorn fastapi orjson typer hf_transfer rich posthog huggingface_hub prometheus-fastapi-instrumentator
Run command:
infinity_emb v2 --engine neuron --model-id BAAI/bge-small-en-v1.5
infinity_emb v2 --engine neuron
INFO: Started server process [2287105]
INFO: Waiting for application startup.
INFO 2024-10-18 10:49:20,247 infinity_emb INFO: model=`michaelfeil/bge-small-en-v1.5` selected, using engine=`neuron` and device=`None` select_model.py:68
ERROR: Traceback (most recent call
@michaelfeil I executed your commands and probably got the same error as you (inf2.8xlarge with Amazon Linux 2):
[ec2-user@ip-XX-XXX-XXX-XXX infinity_emb]$ infinity_emb v2 --engine neuron --model-id sentence-transformers/all-MiniLM-L6-v2
INFO: Started server process [3214]
INFO: Waiting for application startup.
INFO 2024-10-21 10:04:54,812 infinity_emb INFO: Creating 1 engines: engines=['sentence-transformers/all-MiniLM-L6-v2'] infinity_server.py:88
INFO 2024-10-21 10:04:54,815 infinity_emb INFO: Anonymized telemetry can be disabled via environment variable `DO_NOT_TRACK=1`. telemetry.py:30
INFO 2024-10-21 10:04:54,820 infinity_emb INFO: model=`sentence-transformers/all-MiniLM-L6-v2` selected, using engine=`neuron` and device=`None` select_model.py:64
ERROR: Traceback (most recent call last):
File "/home/ec2-user/.local/lib/python3.12/site-packages/starlette/routing.py", line 693, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/infinity_server.py", line 92, in lifespan
app.engine_array = AsyncEngineArray.from_args(engine_args_list) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/engine.py", line 289, in from_args
return cls(engines=tuple(engines))
^^^^^^^^^^^^^^
File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/engine.py", line 68, in from_args
engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/engine.py", line 55, in __init__
self._model, self._min_inference_t, self._max_inference_t = select_model(self._engine_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/inference/select_model.py", line 72, in select_model
loaded_engine = unloaded_engine.value(engine_args=engine_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/transformer/embedder/neuron.py", line 81, in __init__
CHECK_OPTIMUM_NEURON.mark_required()
File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/_optional_imports.py", line 46, in mark_required
self._raise_error()
File "/home/ec2-user/.local/lib/python3.12/site-packages/infinity_emb/_optional_imports.py", line 57, in _raise_error
raise ImportError(msg)
ImportError: optimum.neuron is not available. install via `pip install infinity-emb[neuronx]`
ERROR: Application startup failed. Exiting.
Then I checked whether infinity-emb was there:
[ec2-user@ip-XX-XXX-XXX-XXX infinity_emb]$ pip3.12 install infinity-emb[neuronx]
Defaulting to user installation because normal site-packages is not writeable
Looking in indexes: https://pypi.org/simple, https://pip.repos.neuron.amazonaws.com
Requirement already satisfied: infinity-emb[neuronx] in /home/ec2-user/.local/lib/python3.12/site-packages (0.0.66)
WARNING: infinity-emb 0.0.66 does not provide the extra 'neuronx'
Requirement already satisfied: hf_transfer>=0.1.5 in /home/ec2-user/.local/lib/python3.12/site-packages (from infinity-emb[neuronx]) (0.1.8)
Requirement already satisfied: huggingface_hub in /home/ec2-user/.local/lib/python3.12/site-packages (from infinity-emb[neuronx]) (0.26.0)
Requirement already satisfied: numpy<2,>=1.20.0 in /home/ec2-user/.local/lib/python3.12/site-packages (from infinity-emb[neuronx]) (1.26.4)
Requirement already satisfied: filelock in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (3.16.1)
Requirement already satisfied: fsspec>=2023.5.0 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (2024.10.0)
Requirement already satisfied: packaging>=20.9 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (24.1)
Requirement already satisfied: pyyaml>=5.1 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (6.0.2)
Requirement already satisfied: requests in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (2.32.3)
Requirement already satisfied: tqdm>=4.42.1 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (4.66.5)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/ec2-user/.local/lib/python3.12/site-packages (from huggingface_hub->infinity-emb[neuronx]) (4.12.2)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/ec2-user/.local/lib/python3.12/site-packages (from requests->huggingface_hub->infinity-emb[neuronx]) (3.4.0)
Requirement already satisfied: idna<4,>=2.5 in /home/ec2-user/.local/lib/python3.12/site-packages (from requests->huggingface_hub->infinity-emb[neuronx]) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/ec2-user/.local/lib/python3.12/site-packages (from requests->huggingface_hub->infinity-emb[neuronx]) (2.2.3)
Requirement already satisfied: certifi>=2017.4.17 in /home/ec2-user/.local/lib/python3.12/site-packages (from requests->huggingface_hub->infinity-emb[neuronx]) (2024.8.30)
@marcomarinodev
pip install infinity-emb[neuronx]
was auto-generated; it's currently not an option, and installing it via pip would also be a complicated setup.
It seems like you did not use the above commands to install, since transformers-neuronx is missing on your AMI. It's there by default.
Maybe you created a venv, or overwrote the existing transformers-neuronx installation?
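One way to check which environment is active and whether the Neuron stack is visible to it (a sketch; the package names follow the Neuron SDK naming):
which python3 && python3 -c "import sys; print(sys.prefix)"   # which environment is active?
pip3 list 2>/dev/null | grep -i -E 'neuron|optimum'           # are the Neuron packages installed?
python3 -c "import transformers_neuronx, optimum.neuron; print('Neuron stack visible')"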
Feature request
Hello, I would like to know if there is any kind of configuration I have to make to run Infinity as a Docker container inside an inf2 instance on AWS. I tried with the following command, but the models are running on the CPU and not using the accelerators.
Motivation
The embedding models do not take advantage of the existing Neuron accelerators; they use the CPU instead.
Your contribution
I can test it on my own EC2 inf2 instances and contribute any improvements.