Hello! I followed the instructions provided in the README file: I created a new environment and ran this:

Then I ran the Llama2 sample code provided in the README and got the following error:

RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: Failed to deserialize cuda engine (/home/jenkins/agent/workspace/LLM/main/L0_MergeRequest/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:68)

Any idea why that happens? By the way, nvcc --version reports 12.1.

Note: the pip install ended up installing a CPU-only build of torch, so I reinstalled it with

pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121

I'm running on A100 GPUs.
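Since the CPU-only torch install seems related, here is a minimal sketch I used to double-check that the reinstalled wheel is actually CUDA-enabled and matches the local toolkit; it only uses standard torch attributes, so it should work regardless of the TensorRT-LLM setup:

```python
import torch

# Diagnostic sketch: confirm the reinstalled torch build is CUDA-enabled.
# A CPU-only wheel reports torch.version.cuda as None and
# torch.cuda.is_available() as False.
print(torch.__version__)          # cu121 wheels include a "+cu121" suffix
print(torch.version.cuda)         # CUDA version torch was compiled against
print(torch.cuda.is_available())  # expect True on an A100 machine

if torch.cuda.is_available():
    # Should name the A100 if the driver and build line up
    print(torch.cuda.get_device_name(0))
```

If `is_available()` is False here, the engine-deserialization failure is likely a symptom of the broken CUDA setup rather than a TensorRT-LLM bug.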