Hello! I followed the instructions provided in the README file: I created a new environment and ran this:

Then I ran the Llama2 sample code provided in the README and got the following error:

RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: Failed to deserialize cuda engine (/home/jenkins/agent/workspace/LLM/main/L0_MergeRequest/tensorrt_llm/cpp/tensorrt_llm/runtime/tllmRuntime.cpp:68)

Any idea why that happens? By the way, nvcc --version reports 12.1.

Note: the pip install ended up installing a CPU-only build of torch, so I reinstalled it with

pip install torch==2.2.2 torchvision==0.17.2 torchaudio==2.2.2 --index-url https://download.pytorch.org/whl/cu121

I'm running on A100 GPUs.
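Since the CPU-only torch install seems related, here is a minimal sketch I used to double-check that the reinstalled wheel is actually CUDA-enabled and matches the local toolkit; it only uses standard torch attributes, so it should work regardless of the TensorRT-LLM setup:

```python
import torch

# Diagnostic sketch: confirm the reinstalled torch build is CUDA-enabled.
# A CPU-only wheel reports torch.version.cuda as None and
# torch.cuda.is_available() as False.
print(torch.__version__)          # cu121 wheels include a "+cu121" suffix
print(torch.version.cuda)         # CUDA version torch was compiled against
print(torch.cuda.is_available())  # expect True on an A100 machine

if torch.cuda.is_available():
    # Should name the A100 if the driver and build line up
    print(torch.cuda.get_device_name(0))
```

If `is_available()` is False here, the engine-deserialization failure is likely a symptom of the broken CUDA setup rather than a TensorRT-LLM bug.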