noahc1510 / trt-llm-rag-linux

A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Linux using TensorRT-LLM

not able to build directory using build.py #3

Open nihalkumar2k21 opened 7 months ago

nihalkumar2k21 commented 7 months ago

```
(mlr_chat) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ ./build-mistral.sh
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
Traceback (most recent call last):
  File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 895, in <module>
    args = parse_arguments()
  File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 549, in parse_arguments
    lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'

(mlr_chat) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ ./build-llama.sh
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
Traceback (most recent call last):
  File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 895, in <module>
    args = parse_arguments()
  File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 549, in parse_arguments
    lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'
```

c6du commented 7 months ago

I think you can try setting it to an empty dictionary, like: `lora_config = LoraConfig.from_hf(args.hf_lora_dir, hf_modules_to_trtllm_modules, dict())`

If you check the `LoraConfig` class, you can see that `from_hf` actually calls the `__init__` function, and this argument's default value is an empty dictionary.
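
For reference, a minimal sketch of what that patched line in build.py could look like under this suggestion (`args` and `hf_modules_to_trtllm_modules` are the names already used in the repo's build.py; the empty dict fills the `trtllm_modules_to_hf_modules` slot named in the error):

```python
# Hedged sketch of the workaround inside build.py's parse_arguments().
# tensorrt_llm 0.8.0 treats trtllm_modules_to_hf_modules as a required
# positional argument of LoraConfig.from_hf(); passing an empty dict
# mirrors the default that LoraConfig.__init__() already uses.
lora_config = LoraConfig.from_hf(
    args.hf_lora_dir,
    hf_modules_to_trtllm_modules,  # mapping already defined in build.py
    dict(),                        # trtllm_modules_to_hf_modules
)
```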

sugar5727 commented 7 months ago

You need to use tensorrt-llm==0.7.1.

Vishwa0703 commented 7 months ago

> I think you can try setting it to an empty dictionary, like: `lora_config = LoraConfig.from_hf(args.hf_lora_dir, hf_modules_to_trtllm_modules, dict())`
>
> If you check the `LoraConfig` class, you can see that `from_hf` actually calls the `__init__` function, and this argument's default value is an empty dictionary.

After setting an empty dict and running build.sh, I get:

```
(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
[03/22/2024-19:03:13] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-19:03:15] [TRT] [I] [MemUsageChange] Init CUDA: CPU +4032, GPU +0, now: CPU 5647, GPU 1383 (MiB)
[03/22/2024-19:03:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +316, now: CPU 7581, GPU 1699 (MiB)
[03/22/2024-19:03:16] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-19:03:16] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-19:03:17] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 8.5100 (GiB) Device 1.6595 (GiB)
Traceback (most recent call last):
  File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 908, in <module>
    build(0, args)
  File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 852, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 613, in build_rank_engine
    tensorrt_llm_llama = tensorrt_llm.models.LLaMAForCausalLM(
  File "/home/vishwajeet/miniconda3/envs/trtllm/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 284, in __call__
    obj = type.__call__(cls, *args, **kwargs)
TypeError: LLaMAForCausalLM.__init__() got an unexpected keyword argument 'num_layers'
```

sugar5727 commented 7 months ago

> After setting an empty dict and running build.sh, I get: `TypeError: LLaMAForCausalLM.__init__() got an unexpected keyword argument 'num_layers'` […]

I met the same thing, so you can try installing tensorrt-llm==0.7.1.

Vishwa0703 commented 7 months ago

@sugar5727 Downgraded to tensorrt-llm==0.7.1 and I am no longer facing those issues. I have an RTX 4060 Laptop GPU (8 GB); when I run build-llama.sh it starts but gets killed:

```
(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
[03/22/2024-19:41:34] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-19:41:36] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2991, GPU +0, now: CPU 4121, GPU 1039 (MiB)
[03/22/2024-19:41:37] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +314, now: CPU 6055, GPU 1353 (MiB)
[03/22/2024-19:41:37] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-19:41:37] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-19:41:38] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 7.1123 (GiB) Device 1.3216 (GiB)
build-llama.sh: line 1: 41317 Killed    python build.py --model_dir './model/llama/llama13_hf' --quant_ckpt_path './model/llama/llama13_int4_awq_weights/llama_tp1_rank0.npz' --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --output_dir './model/llama/llama13_int4_engine' --world_size 1 --tp_size 1 --parallel_build --max_input_len 3900 --max_batch_size 1 --max_output_len 1024
```

sugar5727 commented 7 months ago

> Downgraded to tensorrt-llm==0.7.1 and I am no longer facing those issues. I have an RTX 4060 Laptop GPU (8 GB); when I run build-llama.sh it starts but gets killed […]

Sorry, I haven't faced that before.

Vishwa0703 commented 7 months ago

@sugar5727 which GPU do you have?

sugar5727 commented 7 months ago

> @sugar5727 which GPU do you have?

RTX 4090

nihalkumar2k21 commented 7 months ago

> ./build-mistral.sh … `TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'` […]

```
pip uninstall tensorrt_llm
```

then re-install:

```
pip3 install tensorrt_llm==0.7.1 -U --pre --extra-index-url https://pypi.nvidia.com --log=debug.txt
```
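
A quick sanity check after the reinstall (a minimal sketch; it assumes the package exposes `__version__`, the same value printed in the `[TensorRT-LLM] TensorRT-LLM version: ...` banner in the logs above):

```python
# Confirm the downgrade took effect before re-running the build scripts.
import tensorrt_llm
print(tensorrt_llm.__version__)  # expect "0.7.1"
```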

nihalkumar2k21 commented 7 months ago

New error:

```
(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ python3 app.py
Invalid MIT-MAGIC-COOKIE-1 key
[anil-gpu2:45735] *** Process received signal ***
[anil-gpu2:45735] Signal: Segmentation fault (11)
[anil-gpu2:45735] Signal code: Address not mapped (1)
[anil-gpu2:45735] Failing at address: 0x440000e9
[anil-gpu2:45735] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f206b1b2420]
[anil-gpu2:45735] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Comm_set_errhandler+0x47)[0x7f1e0f681fc7]
[anil-gpu2:45735] [ 2] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x9abf0)[0x7f1dea220bf0]
[anil-gpu2:45735] [ 3] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x2decf)[0x7f1dea1b3ecf]
[anil-gpu2:45735] [ 4] python3(PyModule_ExecDef+0x70)[0x597d40]
[anil-gpu2:45735] [ 5] python3[0x5990c9]
[anil-gpu2:45735] [ 6] python3[0x4fd37b]
[anil-gpu2:45735] [ 7] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:45735] [ 8] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [ 9] python3(_PyEval_EvalFrameDefault+0x4b26)[0x4f2856]
[anil-gpu2:45735] [10] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [11] python3(_PyEval_EvalFrameDefault+0x731)[0x4ee461]
[anil-gpu2:45735] [12] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [13] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [14] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [15] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [16] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [17] python3[0x4fd514]
[anil-gpu2:45735] [18] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:45735] [19] python3(PyImport_ImportModuleLevelObject+0x525)[0x50b685]
[anil-gpu2:45735] [20] python3[0x517454]
[anil-gpu2:45735] [21] python3[0x4fd907]
[anil-gpu2:45735] [22] python3(PyObject_Call+0x209)[0x50a259]
[anil-gpu2:45735] [23] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:45735] [24] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [25] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [26] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [27] python3[0x4fd514]
[anil-gpu2:45735] [28] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:45735] [29] python3(PyImport_ImportModuleLevelObject+0x9da)[0x50bb3a]
[anil-gpu2:45735] *** End of error message ***
Segmentation fault (core dumped)
```

```
(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ conda list
# packages in environment at /home/anil/miniconda3/envs/trtllm:
#
# Name  Version  Build  Channel
_libgcc_mutex  0.1  main
_openmp_mutex  5.1  1_gnu
absl-py  2.1.0  pypi_0  pypi
accelerate  0.20.3  pypi_0  pypi
aiofiles  23.2.1  pypi_0  pypi
aiohttp  3.9.3  pypi_0  pypi
aiosignal  1.3.1  pypi_0  pypi
alembic  1.13.1  pypi_0  pypi
altair  5.2.0  pypi_0  pypi
annotated-types  0.6.0  pypi_0  pypi
anyio  3.7.1  pypi_0  pypi
async-timeout  4.0.3  pypi_0  pypi
attrs  23.2.0  pypi_0  pypi
beautifulsoup4  4.12.3  pypi_0  pypi
blas  1.0  mkl
build  1.1.1  pypi_0  pypi
bzip2  1.0.8  h5eee18b_5
ca-certificates  2024.2.2  hbcca054_0  conda-forge
certifi  2024.2.2  pyhd8ed1ab_0  conda-forge
charset-normalizer  2.0.4  pyhd3eb1b0_0
click  8.1.7  pypi_0  pypi
colorama  0.4.6  pypi_0  pypi
colored  2.2.4  pypi_0  pypi
coloredlogs  15.0.1  pypi_0  pypi
contourpy  1.2.0  pypi_0  pypi
ctransformers  0.2.26  pypi_0  pypi
cuda-cudart  12.1.105  0  nvidia
cuda-cupti  12.1.105  0  nvidia
cuda-libraries  12.1.0  0  nvidia
cuda-nvrtc  12.1.105  0  nvidia
cuda-nvtx  12.1.105  0  nvidia
cuda-opencl  12.4.99  0  nvidia
cuda-python  12.2.0  pypi_0  pypi
cuda-runtime  12.1.0  0  nvidia
cycler  0.12.1  pypi_0  pypi
cython  3.0.9  pypi_0  pypi
dataclasses-json  0.6.4  pypi_0  pypi
datasets  2.14.6  pypi_0  pypi
deprecated  1.2.14  pypi_0  pypi
diffusers  0.15.0  pypi_0  pypi
dill  0.3.7  pypi_0  pypi
distro  1.9.0  pypi_0  pypi
docx2txt  0.8  pypi_0  pypi
environs  9.5.0  pypi_0  pypi
evaluate  0.4.1  pypi_0  pypi
exceptiongroup  1.2.0  pypi_0  pypi
faiss-cpu  1.7.4  pypi_0  pypi
fastapi  0.110.0  pypi_0  pypi
ffmpeg  4.3  hf484d3e_0  pytorch
ffmpy  0.3.2  pypi_0  pypi
filelock  3.13.1  py310h06a4308_0
flask  2.2.3  pypi_0  pypi
flask-marshmallow  0.15.0  pypi_0  pypi
flask-migrate  4.0.4  pypi_0  pypi
flask-sqlalchemy  3.0.3  pypi_0  pypi
flatbuffers  24.3.7  pypi_0  pypi
fonttools  4.50.0  pypi_0  pypi
freetype  2.12.1  h4a9f257_0
frozenlist  1.4.1  pypi_0  pypi
fsspec  2023.10.0  pypi_0  pypi
gmp  6.2.1  h295c915_3
gmpy2  2.1.2  py310heeb90bb_0
gnutls  3.6.15  he1e5248_0
gradio  4.14.0  pypi_0  pypi
gradio-client  0.8.0  pypi_0  pypi
greenlet  3.0.3  pypi_0  pypi
grpcio  1.56.0  pypi_0  pypi
h11  0.14.0  pypi_0  pypi
httpcore  1.0.4  pypi_0  pypi
httpx  0.27.0  pypi_0  pypi
huggingface-hub  0.21.4  pypi_0  pypi
humanfriendly  10.0  pypi_0  pypi
idna  3.4  py310h06a4308_0
importlib-metadata  7.1.0  pypi_0  pypi
importlib-resources  6.4.0  pypi_0  pypi
intel-openmp  2023.1.0  hdb19cb5_46306
itsdangerous  2.1.2  pypi_0  pypi
janus  1.0.0  pypi_0  pypi
jinja2  3.1.3  py310h06a4308_0
joblib  1.3.2  pypi_0  pypi
jpeg  9e  h5eee18b_1
jsonpatch  1.33  pypi_0  pypi
jsonpointer  2.4  pypi_0  pypi
jsonschema  4.21.1  pypi_0  pypi
jsonschema-specifications  2023.12.1  pypi_0  pypi
kiwisolver  1.4.5  pypi_0  pypi
lame  3.100  h7b6447c_0
langchain  0.0.310  pypi_0  pypi
langsmith  0.0.43  pypi_0  pypi
lark  1.1.9  pypi_0  pypi
lcms2  2.12  h3be6417_0
ld_impl_linux-64  2.38  h1181459_1
lerc  3.0  h295c915_0
libcublas  12.1.0.26  0  nvidia
libcufft  11.0.2.4  0  nvidia
libcufile  1.9.0.20  0  nvidia
libcurand  10.3.5.119  0  nvidia
libcusolver  11.4.4.55  0  nvidia
libcusparse  12.0.2.55  0  nvidia
libdeflate  1.17  h5eee18b_1
libffi  3.4.4  h6a678d5_0
libgcc-ng  11.2.0  h1234567_1
libgfortran-ng  7.5.0  h14aa051_20  conda-forge
libgfortran4  7.5.0  h14aa051_20  conda-forge
libgomp  11.2.0  h1234567_1
libiconv  1.16  h7f8727e_2
libidn2  2.3.4  h5eee18b_0
libjpeg-turbo  2.0.0  h9bf148f_0  pytorch
libnpp  12.0.2.50  0  nvidia
libnvjitlink  12.1.105  0  nvidia
libnvjpeg  12.1.1.14  0  nvidia
libpng  1.6.39  h5eee18b_0
libstdcxx-ng  11.2.0  h1234567_1
libtasn1  4.19.0  h5eee18b_0
libtiff  4.5.1  h6a678d5_0
libunistring  0.9.10  h27cfd23_0
libuuid  1.41.5  h5eee18b_0
libwebp-base  1.3.2  h5eee18b_0
llama-index  0.9.27  pypi_0  pypi
llvm-openmp  14.0.6  h9e868ea_0
lz4-c  1.9.4  h6a678d5_0
mako  1.3.2  pypi_0  pypi
markdown-it-py  3.0.0  pypi_0  pypi
markupsafe  2.1.3  py310h5eee18b_0
marshmallow  3.21.1  pypi_0  pypi
matplotlib  3.8.3  pypi_0  pypi
mdurl  0.1.2  pypi_0  pypi
mkl  2023.1.0  h213fc3f_46344
mkl-service  2.4.0  py310h5eee18b_1
mkl_fft  1.3.8  py310h5eee18b_0
mkl_random  1.2.4  py310hdb19cb5_0
mpc  1.1.0  h10f8cd9_1
mpfr  4.0.2  hb69a4c5_1
mpi  1.0  mpich  conda-forge
mpi4py  3.1.4  py310hfc96bbd_0
mpich  3.3.2  hc856adb_0
mpmath  1.3.0  py310h06a4308_0
multidict  6.0.5  pypi_0  pypi
multiprocess  0.70.15  pypi_0  pypi
mypy-extensions  1.0.0  pypi_0  pypi
ncurses  6.4  h6a678d5_0
nest-asyncio  1.6.0  pypi_0  pypi
nettle  3.7.3  hbbd107a_1
networkx  3.1  py310h06a4308_0
ninja  1.11.1.1  pypi_0  pypi
nltk  3.8.1  pypi_0  pypi
numpy  1.24.0  pypi_0  pypi
nvidia-ammo  0.7.4  pypi_0  pypi
nvidia-cublas-cu12  12.1.3.1  pypi_0  pypi
nvidia-cuda-cupti-cu12  12.1.105  pypi_0  pypi
nvidia-cuda-nvrtc-cu12  12.1.105  pypi_0  pypi
nvidia-cuda-runtime-cu12  12.1.105  pypi_0  pypi
nvidia-cudnn-cu12  8.9.2.26  pypi_0  pypi
nvidia-cufft-cu12  11.0.2.54  pypi_0  pypi
nvidia-curand-cu12  10.3.2.106  pypi_0  pypi
nvidia-cusolver-cu12  11.4.5.107  pypi_0  pypi
nvidia-cusparse-cu12  12.1.0.106  pypi_0  pypi
nvidia-nccl-cu12  2.18.1  pypi_0  pypi
nvidia-nvjitlink-cu12  12.4.99  pypi_0  pypi
nvidia-nvtx-cu12  12.1.105  pypi_0  pypi
onnx  1.14.1  pypi_0  pypi
onnx-graphsurgeon  0.3.27  pypi_0  pypi
onnxruntime  1.16.3  pypi_0  pypi
openai  1.14.2  pypi_0  pypi
openh264  2.1.1  h4ff587b_0
openjpeg  2.4.0  h3ad879b_0
openssl  3.0.13  h7f8727e_0
optimum  1.17.1  pypi_0  pypi
orjson  3.9.15  pypi_0  pypi
packaging  24.0  pypi_0  pypi
pandas  2.0.3  pypi_0  pypi
pillow  10.2.0  py310h5eee18b_0
pip  23.3.1  py310h06a4308_0
polygraphy  0.49.0  pypi_0  pypi
protobuf  5.26.0  pypi_0  pypi
psutil  5.9.7  pypi_0  pypi
py-cpuinfo  9.0.0  pypi_0  pypi
pyarrow  15.0.2  pypi_0  pypi
pyarrow-hotfix  0.6  pypi_0  pypi
pydantic  2.3.0  pypi_0  pypi
pydantic-core  2.6.3  pypi_0  pypi
pydantic-settings  2.0.3  pypi_0  pypi
pydub  0.25.1  pypi_0  pypi
pygments  2.17.2  pypi_0  pypi
pymilvus  2.3.0  pypi_0  pypi
pynvml  11.5.0  pypi_0  pypi
pyparsing  3.1.2  pypi_0  pypi
pypdf  3.15.5  pypi_0  pypi
pypdf2  3.0.1  pypi_0  pypi
pyproject-hooks  1.0.0  pypi_0  pypi
python  3.10.14  h955ad1f_0
python-dateutil  2.9.0.post0  pypi_0  pypi
python-dotenv  1.0.1  pypi_0  pypi
python-multipart  0.0.9  pypi_0  pypi
pytorch-cuda  12.1  ha16c6d3_5  pytorch
pytorch-mutex  1.0  cuda  pytorch
pytube  15.0.0  pypi_0  pypi
pytz  2024.1  pypi_0  pypi
pyyaml  6.0.1  py310h5eee18b_0
readline  8.2  h5eee18b_0
referencing  0.34.0  pypi_0  pypi
regex  2023.12.25  pypi_0  pypi
requests  2.31.0  py310h06a4308_1
responses  0.18.0  pypi_0  pypi
rich  13.7.1  pypi_0  pypi
rouge-score  0.1.2  pypi_0  pypi
rpds-py  0.18.0  pypi_0  pypi
safetensors  0.4.2  pypi_0  pypi
scikit-learn  1.4.1.post1  pypi_0  pypi
scipy  1.12.0  pypi_0  pypi
semantic-version  2.10.0  pypi_0  pypi
sentence-transformers  2.2.2  pypi_0  pypi
sentencepiece  0.1.99  pypi_0  pypi
setuptools  68.2.2  py310h06a4308_0
shellingham  1.5.4  pypi_0  pypi
six  1.16.0  pypi_0  pypi
sniffio  1.3.1  pypi_0  pypi
soupsieve  2.5  pypi_0  pypi
sqlalchemy  2.0.28  pypi_0  pypi
sqlite  3.41.2  h5eee18b_0
starlette  0.36.3  pypi_0  pypi
sympy  1.12  py310h06a4308_0
tbb  2021.8.0  hdb19cb5_0
tenacity  8.2.3  pypi_0  pypi
tensorrt  9.2.0.post12.dev5  pypi_0  pypi
tensorrt-bindings  9.2.0.post12.dev5  pypi_0  pypi
tensorrt-libs  9.2.0.post12.dev5  pypi_0  pypi
tensorrt-llm  0.7.1  pypi_0  pypi
threadpoolctl  3.4.0  pypi_0  pypi
tiktoken  0.3.3  pypi_0  pypi
tk  8.6.12  h1ccaba5_0
tokenizers  0.13.4rc3  pypi_0  pypi
tomli  2.0.1  pypi_0  pypi
tomlkit  0.12.0  pypi_0  pypi
toolz  0.12.1  pypi_0  pypi
torch  2.1.2  pypi_0  pypi
torchaudio  2.2.1  py310_cu121  pytorch
torchvision  0.17.1  py310_cu121  pytorch
tqdm  4.66.2  pypi_0  pypi
transformers  4.33.1  pypi_0  pypi
triton  2.1.0  pypi_0  pypi
typer  0.9.0  pypi_0  pypi
typing-inspect  0.9.0  pypi_0  pypi
typing_extensions  4.9.0  py310h06a4308_1
tzdata  2024.1  pypi_0  pypi
ujson  5.9.0  pypi_0  pypi
urllib3  2.1.0  py310h06a4308_0
uvicorn  0.29.0  pypi_0  pypi
websockets  11.0.3  pypi_0  pypi
werkzeug  3.0.1  pypi_0  pypi
wheel  0.41.2  py310h06a4308_0
wrapt  1.16.0  pypi_0  pypi
xxhash  3.4.1  pypi_0  pypi
xz  5.4.6  h5eee18b_0
yaml  0.2.5  h7b6447c_0
yarl  1.9.4  pypi_0  pypi
youtube-transcript-api  0.6.2  pypi_0  pypi
zipp  3.18.1  pypi_0  pypi
zlib  1.2.13  h5eee18b_0
zstd  1.5.5  hc292b87_0
```

Please suggest a solution.

Vishwa0703 commented 7 months ago

@sugar5727 If you have a single 4090, then when you run build_llama.sh/build_mistral.sh it builds the TensorRT engine serially, right? Can you share the CPU and GPU usage while building the llama/mistral engine? When I run build_mistral.sh, the CPU is being consumed instead of the GPU. Attaching a screenshot (see also the monitoring sketch below).

Screenshot from 2024-04-02 11-48-31
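
One way to capture those usage numbers, and to check whether the earlier `Killed` message was the Linux out-of-memory killer exhausting host RAM during the engine build, is a small monitor run alongside build.py. This is a hedged sketch, not part of the repo; it assumes psutil and pynvml are available (both appear in the conda list above):

```python
# Minimal host-RAM / GPU-memory monitor to run in a second terminal while
# build-llama.sh / build-mistral.sh is executing. A steady climb in host
# memory followed by the build process dying usually points to the kernel
# OOM killer, which produces a "Killed" message like the one in the log above.
import time

import psutil
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # first (only) GPU

while True:
    ram = psutil.virtual_memory()
    vram = pynvml.nvmlDeviceGetMemoryInfo(gpu)
    print(f"CPU {psutil.cpu_percent():5.1f}% | "
          f"host {ram.used / 2**30:6.2f}/{ram.total / 2**30:.2f} GiB | "
          f"GPU {vram.used / 2**30:6.2f}/{vram.total / 2**30:.2f} GiB")
    time.sleep(5)
```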