noahc1510 / trt-llm-rag-linux

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Linux using TensorRT-LLM

Not able to generate engine on RTX 4060 Laptop 8GB, 100% CPU being utilised #5

Open Vishwa0703 opened 7 months ago

Vishwa0703 commented 7 months ago

I have an RTX 4060 8GB in my laptop, with 16 GB RAM and an Intel i7-12700H CPU. When I run build-llama.sh or build-mistral.sh, the process gets killed automatically with the output below, and I found that my CPU hits 100% utilisation while the script is running. I am attaching a screenshot of the same. Kindly help me with this.

```
(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-mistral.sh
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
[03/22/2024-20:50:37] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-20:50:39] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2991, GPU +0, now: CPU 4121, GPU 1039 (MiB)
[03/22/2024-20:50:41] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +314, now: CPU 6055, GPU 1353 (MiB)
[03/22/2024-20:50:41] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-20:50:41] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-20:50:41] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 7.1113 (GiB) Device 1.3216 (GiB)
build-mistral.sh: line 1: 6084 Killed python build.py --model_dir './model/mistral/mistral7b_hf' --quant_ckpt_path './model/mistral/mistral7b_int4_quant_weights/mistral_tp1_rank0.npz' --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --output_dir './model/mistral/mistral7b_int4_engine' --world_size 1 --tp_size 1 --parallel_build --max_input_len 1024 --max_batch_size 1 --max_output_len 1024
```
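A `Killed` line with no Python traceback, right after "Engine build starts" had already allocated ~7 GiB of host memory, usually means the Linux kernel's OOM killer terminated `build.py` because the machine ran out of system RAM, not GPU memory. A minimal diagnostic sketch to confirm this (the grep patterns are assumptions about typical OOM-killer log wording and may vary by kernel; reading `dmesg` may require root on some distros):

```shell
# 1. Look for the OOM killer's signature in the kernel log; an entry
#    naming the python process confirms a host-RAM out-of-memory kill.
dmesg 2>/dev/null | grep -iE 'out of memory|oom-kill|killed process' | tail -n 5

# 2. Show current RAM and swap. 16 GB of RAM with little or no swap is
#    tight for building a 7B-parameter TensorRT-LLM engine on the host.
free -h
```

If the kernel log confirms an OOM kill, a common workaround (a general Linux remedy, not something from this repo's docs) is to add a temporary swap file, e.g. `sudo fallocate -l 16G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile`, so the build can spill to disk instead of being killed.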

Screenshot from 2024-03-22 20-50-43

Vishwa0703 commented 7 months ago

Please explain how to generate the engine.