noahc1510 / trt-llm-rag-linux

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Linux using TensorRT-LLM

engine build failed #4

Closed. Vishwa0703 closed this issue 7 months ago.

Vishwa0703 commented 7 months ago

    (trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
    [TensorRT-LLM] TensorRT-LLM version: 0.8.0
    [03/22/2024-16:36:51] [TRT-LLM] [I] Serially build TensorRT engines.
    [03/22/2024-16:36:54] [TRT] [I] [MemUsageChange] Init CUDA: CPU +4032, GPU +0, now: CPU 5647, GPU 1383 (MiB)
    [03/22/2024-16:36:55] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +316, now: CPU 7581, GPU 1699 (MiB)
    [03/22/2024-16:36:55] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
    [03/22/2024-16:36:55] [TRT-LLM] [W] Invalid timing cache, using freshly created one
    [03/22/2024-16:36:55] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 8.5084 (GiB) Device 1.6595 (GiB)
    Traceback (most recent call last):
      File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 908, in <module>
        build(0, args)
      File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 852, in build
        engine = build_rank_engine(builder, builder_config, engine_name,
      File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 613, in build_rank_engine
        tensorrt_llm_llama = tensorrt_llm.models.LLaMAForCausalLM(
      File "/home/vishwajeet/miniconda3/envs/trtllm/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 284, in __call__
        obj = type.__call__(cls, *args, **kwargs)
    TypeError: LLaMAForCausalLM.__init__() got an unexpected keyword argument 'num_layers'
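The final `TypeError` says that `build.py` passes a keyword (`num_layers`) that the installed `LLaMAForCausalLM.__init__` does not accept. One plausible cause is a mismatch between the TensorRT-LLM version that `build.py` was written for and the installed 0.8.0, though the log alone does not confirm this. As a rough diagnostic sketch, the snippet below uses the standard-library `inspect` module to list which keywords a constructor would reject. `StandInModel` and `unexpected_kwargs` are hypothetical names made up for illustration; the stand-in is not the real `tensorrt_llm` class, and its single `config` parameter is an assumption.

```python
import inspect


def unexpected_kwargs(cls, kwargs):
    """Return the keyword names in `kwargs` that `cls.__init__` would reject."""
    params = inspect.signature(cls.__init__).parameters
    # A **kwargs catch-all accepts any keyword, so nothing can be rejected.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return []
    return [name for name in kwargs if name not in params]


# Hypothetical stand-in for a model class whose constructor takes a single
# config object. This is NOT the real tensorrt_llm class.
class StandInModel:
    def __init__(self, config):
        self.config = config


# Keywords in the style that build.py appears to pass (values are illustrative).
rejected = unexpected_kwargs(StandInModel, {"num_layers": 32, "num_heads": 32})
print(rejected)
```

In the `trtllm` environment, the same helper could be pointed at `tensorrt_llm.models.LLaMAForCausalLM` to see which of the arguments passed at `build.py` line 613 the installed version no longer accepts.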