huggingface / optimum-nvidia

Can't Run README Code #127

Open hammoudhasan opened 2 months ago

hammoudhasan commented 2 months ago

Hello! I followed the instructions in the README file. I created a new environment and ran the following:

apt-get update && apt-get -y install python3.10 python3-pip openmpi-bin libopenmpi-dev
python -m pip install --pre --extra-index-url https://pypi.nvidia.com optimum-nvidia

I then ran the Llama 2 sample code from the README and got the following error:

RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: Failed to deserialize cuda engine (/home/jenkins/agent/workspace/LLM/main/L0_MergeRequest/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:68)

Any clue why that is? For reference, nvcc --version reports 12.1.
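Serialized TensorRT engines are generally tied to the TensorRT version and GPU they were built with, so the installed versions may be relevant here. Below is a minimal sketch for collecting those details to attach to a report like this one. The package names it queries (`optimum-nvidia`, `tensorrt-llm`, `tensorrt`, `torch`) are assumptions about what the pip command above installed, and the helper names are made up for illustration.

```python
import importlib.metadata
import shutil
import subprocess


def package_version(name):
    """Return the installed version of a pip package, or None if it is absent."""
    try:
        return importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        return None


def command_output(args):
    """Run a command and return its stdout, or None if it is missing or fails."""
    if shutil.which(args[0]) is None:
        return None
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout.strip() if result.returncode == 0 else None


def collect_environment():
    """Gather version details that may help when reporting this error."""
    # Assumption: these are the relevant package names; adjust as needed.
    packages = ["optimum-nvidia", "tensorrt-llm", "tensorrt", "torch"]
    info = {name: package_version(name) for name in packages}
    # CUDA toolkit version as seen by the compiler driver.
    info["nvcc"] = command_output(["nvcc", "--version"])
    # GPU model and driver version as reported by the NVIDIA driver tools.
    info["gpu"] = command_output(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
    )
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value if value is not None else 'not found'}")
```

Entries printed as "not found" mean the package or command was not visible from the active environment, which is itself worth mentioning in the report.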

Note: