huggingface / optimum-neuron

Easy, fast and very cheap training and inference on AWS Trainium and Inferentia chips.
Apache License 2.0

optimum-cli export does not take 'neuron' as a valid choice #142

Open adekunleoajayi opened 11 months ago

adekunleoajayi commented 11 months ago

I am following this tutorial on how to deploy LLMs to AWS Inferentia2.

When I try to convert the model to AWS Neuron using optimum-cli:

optimum-cli export neuron --model yiyanghkust/finbert-tone --sequence_length 128 --batch_size 1 tmp/

I get the following error

usage: optimum-cli export [-h] {onnx,tflite} ...
optimum-cli export: error: invalid choice: 'neuron' (choose from 'onnx', 'tflite')
philschmid commented 11 months ago

@adekunleoajayi can you please share your installed version and environment information?

adekunleoajayi commented 11 months ago

@philschmid, here are the environment details and the versions I tried:

OS: Linux

Python: 3.8

optimum-neuron:


1. pip install "git+https://github.com/huggingface/optimum-neuron.git@b94d534cc0160f1e199fae6ae3a1c7b804b49e30"  --upgrade (version: 0.0.7.dev0)

2. pip install git+https://github.com/huggingface/optimum-neuron.git (version: 0.0.8.dev0)

3. pip install optimum-neuron (version: 0.0.7)

4. pip install optimum[neuron] (version: 0.0.1)

The first three gave the same error:

usage: optimum-cli export [-h] {onnx,tflite} ...
optimum-cli export: error: invalid choice: 'neuron' (choose from 'onnx', 'tflite')

while the last one printed WARNING: optimum-neuron 0.0.1 does not provide the extra 'neuron' during installation, and it does not even contain optimum-cli.

evellasques commented 11 months ago

Hi @adekunleoajayi, I was running into the same issue. I think this is something with the development version of Optimum Neuron. I tried using the HF AMI as described at https://www.philschmid.de/setup-aws-trainium, and there the optimum-cli works as expected. However, when I install the development version inside an instance created from that AMI, optimum-cli simply stops working.

Here is a dump of pip freeze on that AMI (hope it helps):

absl-py==1.4.0
accelerate==0.20.3
aiohttp==3.8.4
aiosignal==1.3.1
anyio==3.7.0
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.2.1
async-timeout==4.0.2
attrs==21.2.0
Automat==20.2.0
aws-neuronx-runtime-discovery==2.9
Babel==2.8.0
backcall==0.2.0
bcrypt==3.2.0
beautifulsoup4==4.12.2
bleach==6.0.0
blinker==1.4
boto3==1.27.0
botocore==1.30.0
cachetools==5.3.1
certifi==2020.6.20
cffi==1.15.1
chardet==4.0.0
charset-normalizer==3.1.0
click==8.0.3
cloud-init==23.1.2
cloud-tpu-client==0.10
colorama==0.4.4
coloredlogs==15.0.1
comm==0.1.3
command-not-found==0.3
configobj==5.0.6
constantly==15.1.0
cryptography==3.4.8
datasets==2.13.0
dbus-python==1.2.18
debugpy==1.6.7
decorator==5.1.1
defusedxml==0.7.1
dill==0.3.6
distlib==0.3.4
distro==1.7.0
distro-info===1.1build1
docutils==0.20.1
ec2-hibinit-agent==1.0.0
ec2-metadata==2.10.0
evaluate==0.4.0
exceptiongroup==1.1.2
executing==1.2.0
fastjsonschema==2.17.1
filelock==3.6.0
frozenlist==1.3.3
fsspec==2023.6.0
google-api-core==1.34.0
google-api-python-client==1.8.0
google-auth==2.21.0
google-auth-httplib2==0.1.0
google-auth-oauthlib==1.0.0
googleapis-common-protos==1.59.1
grpcio==1.56.0
hibagent==1.0.1
httplib2==0.20.2
huggingface-hub==0.16.2
humanfriendly==10.0
hyperlink==21.0.0
idna==3.3
importlib-metadata==4.6.4
incremental==21.3.0
ipykernel==6.24.0
ipython==8.14.0
ipython-genutils==0.2.0
islpy==2023.1
jedi==0.18.2
jeepney==0.7.1
Jinja2==3.0.3
jmespath==1.0.1
joblib==1.3.1
jsonpatch==1.32
jsonpointer==2.0
jsonschema==3.2.0
jupyter-events==0.6.3
jupyter_client==8.3.0
jupyter_core==5.3.1
jupyter_server==2.7.0
jupyter_server_terminals==0.4.4
jupyterlab-pygments==0.2.2
keyring==23.5.0
launchpadlib==1.10.16
lazr.restfulclient==0.14.4
lazr.uri==1.0.6
libneuronxla==0.5.326
lockfile==0.12.2
Markdown==3.4.3
MarkupSafe==2.1.1
matplotlib-inline==0.1.6
mistune==3.0.1
more-itertools==8.10.0
mpmath==1.3.0
multidict==6.0.4
multiprocess==0.70.14
nbclassic==1.0.0
nbclient==0.8.0
nbconvert==7.6.0
nbformat==5.9.0
nest-asyncio==1.5.6
netifaces==0.11.0
networkx==2.6.3
neuronx-cc==2.7.0.40+f7c6cf2a3
neuronx-hwm==2.7.0.3+0092b9d34
notebook==6.5.4
notebook_shim==0.2.3
numpy==1.21.6
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauth2client==4.1.3
oauthlib==3.2.0
optimum==1.9.0
optimum-neuron==0.0.7
overrides==7.3.1
packaging==23.1
pandas==2.0.3
pandocfilters==1.5.0
parso==0.8.3
pexpect==4.8.0
pgzip==0.3.4
pickleshare==0.7.5
Pillow==10.0.0
platformdirs==2.5.1
prometheus-client==0.17.0
prompt-toolkit==3.0.39
protobuf==3.20.2
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==12.0.1
pyasn1==0.4.8
pyasn1-modules==0.2.1
pycparser==2.21
Pygments==2.15.1
PyGObject==3.42.1
PyHamcrest==2.0.2
PyJWT==2.3.0
pyOpenSSL==21.0.0
pyparsing==2.4.7
pyrsistent==0.18.1
pyserial==3.5
python-apt==2.4.0+ubuntu1
python-daemon==3.0.1
python-dateutil==2.8.2
python-debian===0.1.43ubuntu1
python-json-logger==2.0.7
python-magic==0.4.24
pytz==2022.1
PyYAML==5.4.1
pyzmq==25.1.0
regex==2023.6.3
requests==2.28.2
requests-oauthlib==1.3.1
requests-unixsocket==0.3.0
responses==0.18.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rsa==4.9
s3transfer==0.6.1
safetensors==0.3.1
scikit-learn==1.3.0
scipy==1.7.3
SecretStorage==3.3.1
Send2Trash==1.8.2
sentencepiece==0.1.99
service-identity==18.1.0
six==1.16.0
sniffio==1.3.0
sos==4.4
soupsieve==2.4.1
ssh-import-id==5.11
stack-data==0.6.2
sympy==1.12
systemd-python==234
tensorboard==2.13.0
tensorboard-data-server==0.7.1
tensorboard-plugin-neuronx==2.5.37.0
terminado==0.17.1
threadpoolctl==3.1.0
tinycss2==1.2.1
tokenizers==0.13.3
torch==1.13.1
torch-neuronx==1.13.1.1.8.0
torch-xla==1.13.1+torchneuron7
torchvision==0.14.1
tornado==6.3.2
tqdm==4.65.0
traitlets==5.9.0
transformers==4.30.2
Twisted==22.1.0
typing_extensions==4.7.1
tzdata==2023.3
ubuntu-advantage-tools==8001
ufw==0.36.1
unattended-upgrades==0.1
uritemplate==3.0.1
urllib3==1.26.5
virtualenv==20.13.0+ds
wadllib==1.3.6
wcwidth==0.2.6
webencodings==0.5.1
websocket-client==1.6.1
Werkzeug==2.3.6
xxhash==3.2.0
yarl==1.9.2
zipp==1.0.0
zope.interface==5.4.0
vishvananda commented 10 months ago

I'm running into the same error message if I install optimum[neuronx] on a regular Linux machine. It seems to work if I install it on an Inferentia2 AWS instance, so it seems the hardware is required to do the conversion. Is it possible to convert the model without the chip somehow?

JingyaHuang commented 10 months ago

Hi @adekunleoajayi, I am not able to reproduce the issue that you met with optimum-neuron==0.0.7:

(aws_neuron_venv_2.8) ubuntu@ip-xxx-xx-xx-xx:~$ optimum-cli export neuron --model yiyanghkust/finbert-tone --sequence_length 128 --batch_size 1 test_issue/
Validating Neuron model...
        - Validating Neuron Model output "logits":
                -[✓] (1, 3) matches (1, 3)
                -[✓] all values close (atol: 0.0001)
The Neuronx export succeeded and the exported model was saved at: .

From the error log, it seems that the neuron backend was not registered. Do you have all the neuronx dependencies installed? Could you send me the output of the following commands so that I can see your setup?

pip3 list | grep -e neuron -e xla -e torch
apt list --installed | grep aws-neuron
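
As a quick extra check (a sketch, assuming a standard pip environment; these commands are not part of the diagnostics above), you can also confirm that the package is importable and which versions of optimum and optimum-neuron are actually active:

# A missing or mismatched optimum-neuron install is the usual cause of the
# neuron backend not being registered with optimum-cli
python3 -c "import optimum.neuron"
pip3 show optimum optimum-neuron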

Below are the versions with which I tested exporting your checkpoint:

aws-neuronx-runtime-discovery 2.9
libneuronxla                  0.5.391
neuronx-cc                    2.8.0.25+a3ad0f342
neuronx-distributed           0.2.0
neuronx-hwm                   2.8.0.3+2b7c6da39
optimum-neuron                0.0.7
torch                         1.13.1
torch-neuronx                 1.13.1.1.9.0
torch-xla                     1.13.1+torchneuron8
torchvision                   0.14.1
transformers-neuronx          0.5.58

Driver:

aws-neuronx-collectives/unknown,now 2.15.13.0-db4e2d9a9 amd64 [installed,upgradable to: 2.15.16.0-db4e2d9a9]
aws-neuronx-dkms/unknown,now 2.11.9.0 amd64 [installed]
aws-neuronx-runtime-lib/unknown,now 2.15.11.0-f168cb23b amd64 [installed,upgradable to: 2.15.14.0-279f319f2]
aws-neuronx-tools/unknown,now 2.12.2.0 amd64 [installed]
JingyaHuang commented 10 months ago

Hi @evellasques, the dev branch of optimum-neuron was not stable enough. Last week I improved our inference CI (#168), so it should be better in the future. If you want to test the dev branch for the latest features, the best practice is to check that the current CIs are green for the features you plan to use.

JingyaHuang commented 10 months ago

@vishvananda, sure! You can compile your model on a CPU-only instance and then run it on INF2 / INF1. You just need to ensure that you:

1. Pass --disable-validation, since validation requires Neuron devices (remove the flag if your instance has them).
2. Configure the compilation args for better latency; I would recommend starting with auto_cast matmul and auto_cast_type bf16 (see the sketch below).

@adekunleoajayi and @evellasques, I took the yiyanghkust/finbert-tone checkpoint as an example:

optimum-cli export neuron --model yiyanghkust/finbert-tone --disable-validation --sequence_length 128 --batch_size 1 test_issue/
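
A minimal sketch of recommendation 2, using the exporter's --auto_cast / --auto_cast_type flags (the output directory test_issue_bf16/ is just an illustrative name, not from this thread):

# Auto-cast matmul operations to bf16 at compile time for lower latency;
# --disable-validation keeps this runnable on a CPU-only instance.
optimum-cli export neuron --model yiyanghkust/finbert-tone --disable-validation --auto_cast matmul --auto_cast_type bf16 --sequence_length 128 --batch_size 1 test_issue_bf16/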

The compiled artifacts can be found here: Jingya/finbert-tone

from transformers import AutoTokenizer
from optimum.neuron import NeuronModelForSequenceClassification

# Load the pre-compiled Neuron model from the Hub, plus the original tokenizer
model = NeuronModelForSequenceClassification.from_pretrained("Jingya/finbert-tone")
tokenizer = AutoTokenizer.from_pretrained("yiyanghkust/finbert-tone")

inputs = tokenizer("there is a shortage of capital, and we need extra financing", return_tensors="pt")

# Run inference and map the highest logit back to its label
logits = model(**inputs).logits
print(model.config.id2label[logits.argmax().item()])
# 'Negative'
JingyaHuang commented 10 months ago

Hey folks, any other questions on this particular issue? I will close it, as it should be resolved by the recent 0.0.10 release. Feel free to reopen the issue if there are any further questions.

AhmedAl93 commented 3 months ago

@JingyaHuang Thank you for your explanations! I tried many optimum-neuron versions, and none of them worked for me; every time I try to convert the model to AWS Neuron, I get this error:

usage: optimum-cli export [-h] {onnx,tflite} ...
optimum-cli export: error: invalid choice: 'neuron' (choose from 'onnx', 'tflite')

Here is some info about my environment:

Instance: inf2.8xlarge
OS: Linux
Distribution: Ubuntu 20.04
AMI: Deep Learning AMI GPU PyTorch 1.13.1 (Ubuntu 20.04) 20231103

[Screenshots of the failing export attached: neuron_LLM_export, neuron_LLM_export_1, neuron_LLM_export_2, neuron_LLM_export_inf2.]

Is the HF AMI necessary for model conversion?

JingyaHuang commented 3 months ago

Hi @AhmedAl93, the Hugging Face Neuron Deep Learning AMI (Ubuntu 22.04) is the recommended way to use optimum-neuron on EC2 instances, since every dependency is configured and tested, and we recommend using the latest release as well. I see that you are using a GPU AMI; could you try the Neuron AMI?

If you have any subscription issues and need to continue with your current instance, I would suggest installing the latest versions of the AWS Neuron SDK (2.17) and optimum-neuron (from the screenshot, you are using v0.0.7, which is quite old and might not support the Mistral model you are testing with).
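
A minimal sketch of such an upgrade, assuming the standard AWS Neuron pip repository (pin the package versions to the Neuron SDK release notes for your instance type):

# Pull the Neuron compiler and PyTorch integration from the Neuron pip repository
python -m pip install --upgrade --extra-index-url https://pip.repos.neuron.amazonaws.com neuronx-cc torch-neuronx
# Then upgrade optimum-neuron itself
python -m pip install --upgrade optimum-neuron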

By the way, @dacorvo submitted a new AMI with optimum-neuron v0.0.20, which was released last week with better support for LLMs like Mistral; it will be available very soon.

AhmedAl93 commented 3 months ago

@JingyaHuang Thank you for your response :) I tried a Neuron AMI (Deep Learning AMI Neuron PyTorch 1.13 (Ubuntu 20.04)) on an inf2.8xlarge instance, but it is still not working! As mentioned in my previous comment, I used different optimum-neuron versions (0.0.20, 0.0.19, ...). [Screenshots of the failing export with v0.0.20 and v0.0.19 attached.] As for the new AMI, that will be really helpful; I will gladly test it when it's available.

brianloyal commented 3 months ago

I'm seeing the same issue with the newest AMI (huggingface-neuron-2024-03-18T07-48-01Z-692efe1a-8d5c-4033-bcbc-5d99f2d4ae6a):

> optimum-cli neuron
usage: optimum-cli
Optimum CLI tool: error: invalid choice: 'neuron' (choose from 'export', 'env', 'onnxruntime')
JingyaHuang commented 3 months ago

Hi @brianloyal and @AhmedAl93, thanks for trying and reporting. I will try to reproduce and let you know.

dacorvo commented 3 months ago

@brianloyal I reproduced your issue with our latest AMI. cc @philschmid @shub-kris

dacorvo commented 3 months ago

Updating optimum then reinstalling optimum-neuron solved the issue:

$ python -m pip install -U optimum
$ python -m pip install optimum-neuron==0.0.20
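
After that, a quick way to verify the fix (a sketch, assuming the reinstall succeeded) is to list the subcommands again; 'neuron' should now be accepted:

$ optimum-cli export --help    # 'neuron' should appear among the export choices
$ optimum-cli neuron --help    # the top-level 'neuron' command group should also resolve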
JingyaHuang commented 2 months ago

Hi @AhmedAl93 and @brianloyal, sorry for the late reply. We just released a new DLAMI (20240409) which should solve the issue. Our AMI creation pipeline did not correctly install the package, which caused the problem; the pipeline is now fixed, and we will verify the functionality of our DLAMI in future releases. Thanks for your patience; please feel free to try it out and ping me if there is any further issue. Thanks!

HuggingFaceDocBuilderDev commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Thank you!
