Open nihalkumar2k21 opened 7 months ago
I think you can try setting it to an empty dictionary, like:
lora_config = LoraConfig.from_hf(args.hf_lora_dir, hf_modules_to_trtllm_modules, dict())
If you check the LoraConfig class, you can see that from_hf actually calls the __init__ function, and this argument's default value is an empty dictionary.
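For concreteness, here is a sketch of how that patched call could sit inside parse_arguments() in build.py. The names come from the tracebacks later in this thread; the surrounding if-guard is my assumption, not the exact upstream code:

# Inside parse_arguments() in build.py -- the line the tracebacks below point at.
# hf_modules_to_trtllm_modules is the mapping build.py already defines; the
# explicit dict() fills the third positional argument,
# trtllm_modules_to_hf_modules, whose default in LoraConfig.__init__ is an
# empty dictionary anyway. The if-guard here is illustrative only.
if args.hf_lora_dir is not None:
    lora_config = LoraConfig.from_hf(args.hf_lora_dir,
                                     hf_modules_to_trtllm_modules,
                                     dict())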
You need to use tensorrt-llm==0.7.1.
After setting an empty dict and running build.sh, I get:

(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
[03/22/2024-19:03:13] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-19:03:15] [TRT] [I] [MemUsageChange] Init CUDA: CPU +4032, GPU +0, now: CPU 5647, GPU 1383 (MiB)
[03/22/2024-19:03:16] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +316, now: CPU 7581, GPU 1699 (MiB)
[03/22/2024-19:03:16] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-19:03:16] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-19:03:17] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 8.5100 (GiB) Device 1.6595 (GiB)
Traceback (most recent call last):
  File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 908, in <module>
    build(0, args)
  File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 852, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "/home/vishwajeet/Desktop/MYGPT/trt-llm-rag-linux/build.py", line 613, in build_rank_engine
    tensorrt_llm_llama = tensorrt_llm.models.LLaMAForCausalLM(
  File "/home/vishwajeet/miniconda3/envs/trtllm/lib/python3.10/site-packages/tensorrt_llm/models/modeling_utils.py", line 284, in __call__
    obj = type.__call__(cls, *args, **kwargs)
TypeError: LLaMAForCausalLM.__init__() got an unexpected keyword argument 'num_layers'
I met the same thing, so you can try installing tensorrt-llm==0.7.1.
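For context, the TypeError above comes from an API change: tensorrt-llm 0.8.0 constructs LLaMAForCausalLM from a PretrainedConfig object instead of keyword arguments like num_layers, so the older build.py call breaks. A minimal guard you could drop near the top of build.py to fail early with a clearer message (a sketch; the message text is mine, and the packaging library is already in the environment per the conda list below):

# Fail fast if the installed tensorrt_llm is too new for this script's legacy
# keyword-argument constructor call (num_layers=..., num_heads=..., etc.).
import tensorrt_llm
from packaging import version

if version.parse(tensorrt_llm.__version__) >= version.parse("0.8.0"):
    raise RuntimeError(
        f"tensorrt_llm {tensorrt_llm.__version__} found; this build.py "
        "expects 0.7.1 (0.8.0 switched LLaMAForCausalLM to a "
        "PretrainedConfig-based constructor, causing the TypeError above)."
    )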
@sugar5727 Downgraded to tensorrt-llm==0.7.1 and I am no longer facing those issues. I have an RTX 4060 Laptop (8 GB); when I run build-llama.sh it starts but gets killed:

(trtllm) vishwajeet@vishwa:~/Desktop/MYGPT/trt-llm-rag-linux$ bash build-llama.sh
[03/22/2024-19:41:34] [TRT-LLM] [I] Serially build TensorRT engines.
[03/22/2024-19:41:36] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2991, GPU +0, now: CPU 4121, GPU 1039 (MiB)
[03/22/2024-19:41:37] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1798, GPU +314, now: CPU 6055, GPU 1353 (MiB)
[03/22/2024-19:41:37] [TRT] [W] CUDA lazy loading is not enabled. Enabling it can significantly reduce device memory usage and speed up TensorRT initialization. See "Lazy Loading" section of CUDA documentation https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#lazy-loading
[03/22/2024-19:41:37] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[03/22/2024-19:41:38] [TRT-LLM] [I] [MemUsage] Rank 0 Engine build starts - Allocated Memory: Host 7.1123 (GiB) Device 1.3216 (GiB)
build-llama.sh: line 1: 41317 Killed python build.py --model_dir './model/llama/llama13_hf' --quant_ckpt_path './model/llama/llama13_int4_awq_weights/llama_tp1_rank0.npz' --dtype float16 --remove_input_padding --use_gpt_attention_plugin float16 --enable_context_fmha --use_gemm_plugin float16 --use_weight_only --weight_only_precision int4_awq --per_group --output_dir './model/llama/llama13_int4_engine' --world_size 1 --tp_size 1 --parallel_build --max_input_len 3900 --max_batch_size 1 --max_output_len 1024
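A bare "Killed" with no Python traceback usually means the Linux OOM killer ended the process, most likely while the FP16 checkpoint was being loaded into host RAM rather than anything on the GPU. A quick host-memory check you could run first (a sketch; psutil is already in the environment per the conda list below, and the 13B size figure is a rough estimate):

# Check available host RAM before launching the engine build; a 13B FP16
# checkpoint is roughly 26 GiB on the host before the quantized weights are
# assembled, which easily exhausts a laptop's RAM and triggers the OOM killer.
import psutil

avail_gib = psutil.virtual_memory().available / 2**30
print(f"Available host RAM: {avail_gib:.1f} GiB")
if avail_gib < 32:
    print("Warning: a 13B build may be OOM-killed; consider adding swap "
          "or building a smaller model.")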
Sorry, I haven't faced that before.
@sugar5727 Which GPU do you have?
RTX 4090
(mlr_chat) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ ./build-mistral.sh
You are using a model of type mistral to instantiate a model of type llama. This is not supported for all configurations of models and can yield errors.
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
Traceback (most recent call last):
  File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 895, in <module>
    args = parse_arguments()
  File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 549, in parse_arguments
    lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'
(mlr_chat) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ ./build-llama.sh
[TensorRT-LLM] TensorRT-LLM version: 0.8.0
Traceback (most recent call last):
  File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 895, in <module>
    args = parse_arguments()
  File "/media/anil/New Volume/nihal/mlr_chat/build.py", line 549, in parse_arguments
    lora_config = LoraConfig.from_hf(args.hf_lora_dir,
TypeError: LoraConfig.from_hf() missing 1 required positional argument: 'trtllm_modules_to_hf_modules'
pip uninstall tensorrt_llm
then reinstall:
pip3 install tensorrt_llm==0.7.1 -U --pre --extra-index-url https://pypi.nvidia.com --log=debug.txt
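After reinstalling, it may be worth confirming the downgrade actually took effect in the active environment before re-running the build (a quick sanity check, not from the original thread):

# A stale 0.8.0 wheel left in the environment is exactly what keeps producing
# the from_hf / num_layers TypeErrors above.
import tensorrt_llm
print(tensorrt_llm.__version__)  # expect 0.7.1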
new error:
(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ python3 app.py
Invalid MIT-MAGIC-COOKIE-1 key
[anil-gpu2:45735] *** Process received signal ***
[anil-gpu2:45735] Signal: Segmentation fault (11)
[anil-gpu2:45735] Signal code: Address not mapped (1)
[anil-gpu2:45735] Failing at address: 0x440000e9
[anil-gpu2:45735] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420)[0x7f206b1b2420]
[anil-gpu2:45735] [ 1] /lib/x86_64-linux-gnu/libmpi.so.40(PMPI_Comm_set_errhandler+0x47)[0x7f1e0f681fc7]
[anil-gpu2:45735] [ 2] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x9abf0)[0x7f1dea220bf0]
[anil-gpu2:45735] [ 3] /home/anil/miniconda3/envs/trtllm/lib/python3.10/site-packages/mpi4py/MPI.cpython-310-x86_64-linux-gnu.so(+0x2decf)[0x7f1dea1b3ecf]
[anil-gpu2:45735] [ 4] python3(PyModule_ExecDef+0x70)[0x597d40]
[anil-gpu2:45735] [ 5] python3[0x5990c9]
[anil-gpu2:45735] [ 6] python3[0x4fd37b]
[anil-gpu2:45735] [ 7] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:45735] [ 8] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [ 9] python3(_PyEval_EvalFrameDefault+0x4b26)[0x4f2856]
[anil-gpu2:45735] [10] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [11] python3(_PyEval_EvalFrameDefault+0x731)[0x4ee461]
[anil-gpu2:45735] [12] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [13] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [14] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [15] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [16] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [17] python3[0x4fd514]
[anil-gpu2:45735] [18] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:45735] [19] python3(PyImport_ImportModuleLevelObject+0x525)[0x50b685]
[anil-gpu2:45735] [20] python3[0x517454]
[anil-gpu2:45735] [21] python3[0x4fd907]
[anil-gpu2:45735] [22] python3(PyObject_Call+0x209)[0x50a259]
[anil-gpu2:45735] [23] python3(_PyEval_EvalFrameDefault+0x5a74)[0x4f37a4]
[anil-gpu2:45735] [24] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [25] python3(_PyEval_EvalFrameDefault+0x31f)[0x4ee04f]
[anil-gpu2:45735] [26] python3(_PyFunction_Vectorcall+0x6f)[0x4fdd4f]
[anil-gpu2:45735] [27] python3[0x4fd514]
[anil-gpu2:45735] [28] python3(_PyObject_CallMethodIdObjArgs+0x137)[0x50c327]
[anil-gpu2:45735] [29] python3(PyImport_ImportModuleLevelObject+0x9da)[0x50bb3a]
[anil-gpu2:45735] *** End of error message ***
Segmentation fault (core dumped)
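One thing worth checking: the backtrace shows the crash inside /lib/x86_64-linux-gnu/libmpi.so.40 (the system Open MPI), while the conda list below shows mpich 3.3.2 and a conda-built mpi4py. Mixing two MPI implementations is a common cause of exactly this segfault at import time. A small diagnostic sketch (these calls only read metadata and do not initialize MPI, so they will not segfault):

# Print which MPI implementation mpi4py was built against, and which libmpi
# the dynamic loader resolves at runtime; the two should agree.
import mpi4py
print(mpi4py.get_config())

import ctypes.util
print(ctypes.util.find_library("mpi"))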
(trtllm) anil@anil-gpu2:/media/anil/New Volume/nihal/mlr_chat$ conda list
#
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
absl-py 2.1.0 pypi_0 pypi
accelerate 0.20.3 pypi_0 pypi
aiofiles 23.2.1 pypi_0 pypi
aiohttp 3.9.3 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
alembic 1.13.1 pypi_0 pypi
altair 5.2.0 pypi_0 pypi
annotated-types 0.6.0 pypi_0 pypi
anyio 3.7.1 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
beautifulsoup4 4.12.3 pypi_0 pypi
blas 1.0 mkl
build 1.1.1 pypi_0 pypi
bzip2 1.0.8 h5eee18b_5
ca-certificates 2024.2.2 hbcca054_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
charset-normalizer 2.0.4 pyhd3eb1b0_0
click 8.1.7 pypi_0 pypi
colorama 0.4.6 pypi_0 pypi
colored 2.2.4 pypi_0 pypi
coloredlogs 15.0.1 pypi_0 pypi
contourpy 1.2.0 pypi_0 pypi
ctransformers 0.2.26 pypi_0 pypi
cuda-cudart 12.1.105 0 nvidia
cuda-cupti 12.1.105 0 nvidia
cuda-libraries 12.1.0 0 nvidia
cuda-nvrtc 12.1.105 0 nvidia
cuda-nvtx 12.1.105 0 nvidia
cuda-opencl 12.4.99 0 nvidia
cuda-python 12.2.0 pypi_0 pypi
cuda-runtime 12.1.0 0 nvidia
cycler 0.12.1 pypi_0 pypi
cython 3.0.9 pypi_0 pypi
dataclasses-json 0.6.4 pypi_0 pypi
datasets 2.14.6 pypi_0 pypi
deprecated 1.2.14 pypi_0 pypi
diffusers 0.15.0 pypi_0 pypi
dill 0.3.7 pypi_0 pypi
distro 1.9.0 pypi_0 pypi
docx2txt 0.8 pypi_0 pypi
environs 9.5.0 pypi_0 pypi
evaluate 0.4.1 pypi_0 pypi
exceptiongroup 1.2.0 pypi_0 pypi
faiss-cpu 1.7.4 pypi_0 pypi
fastapi 0.110.0 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
ffmpy 0.3.2 pypi_0 pypi
filelock 3.13.1 py310h06a4308_0
flask 2.2.3 pypi_0 pypi
flask-marshmallow 0.15.0 pypi_0 pypi
flask-migrate 4.0.4 pypi_0 pypi
flask-sqlalchemy 3.0.3 pypi_0 pypi
flatbuffers 24.3.7 pypi_0 pypi
fonttools 4.50.0 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
frozenlist 1.4.1 pypi_0 pypi
fsspec 2023.10.0 pypi_0 pypi
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py310heeb90bb_0
gnutls 3.6.15 he1e5248_0
gradio 4.14.0 pypi_0 pypi
gradio-client 0.8.0 pypi_0 pypi
greenlet 3.0.3 pypi_0 pypi
grpcio 1.56.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
httpcore 1.0.4 pypi_0 pypi
httpx 0.27.0 pypi_0 pypi
huggingface-hub 0.21.4 pypi_0 pypi
humanfriendly 10.0 pypi_0 pypi
idna 3.4 py310h06a4308_0
importlib-metadata 7.1.0 pypi_0 pypi
importlib-resources 6.4.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46306
itsdangerous 2.1.2 pypi_0 pypi
janus 1.0.0 pypi_0 pypi
jinja2 3.1.3 py310h06a4308_0
joblib 1.3.2 pypi_0 pypi
jpeg 9e h5eee18b_1
jsonpatch 1.33 pypi_0 pypi
jsonpointer 2.4 pypi_0 pypi
jsonschema 4.21.1 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
lame 3.100 h7b6447c_0
langchain 0.0.310 pypi_0 pypi
langsmith 0.0.43 pypi_0 pypi
lark 1.1.9 pypi_0 pypi
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 12.1.0.26 0 nvidia
libcufft 11.0.2.4 0 nvidia
libcufile 1.9.0.20 0 nvidia
libcurand 10.3.5.119 0 nvidia
libcusolver 11.4.4.55 0 nvidia
libcusparse 12.0.2.55 0 nvidia
libdeflate 1.17 h5eee18b_1
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgfortran-ng 7.5.0 h14aa051_20 conda-forge
libgfortran4 7.5.0 h14aa051_20 conda-forge
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libjpeg-turbo 2.0.0 h9bf148f_0 pytorch
libnpp 12.0.2.50 0 nvidia
libnvjitlink 12.1.105 0 nvidia
libnvjpeg 12.1.1.14 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.1 h6a678d5_0
libunistring 0.9.10 h27cfd23_0
libuuid 1.41.5 h5eee18b_0
libwebp-base 1.3.2 h5eee18b_0
llama-index 0.9.27 pypi_0 pypi
llvm-openmp 14.0.6 h9e868ea_0
lz4-c 1.9.4 h6a678d5_0
mako 1.3.2 pypi_0 pypi
markdown-it-py 3.0.0 pypi_0 pypi
markupsafe 2.1.3 py310h5eee18b_0
marshmallow 3.21.1 pypi_0 pypi
matplotlib 3.8.3 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mkl 2023.1.0 h213fc3f_46344
mkl-service 2.4.0 py310h5eee18b_1
mkl_fft 1.3.8 py310h5eee18b_0
mkl_random 1.2.4 py310hdb19cb5_0
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpi 1.0 mpich conda-forge
mpi4py 3.1.4 py310hfc96bbd_0
mpich 3.3.2 hc856adb_0
mpmath 1.3.0 py310h06a4308_0
multidict 6.0.5 pypi_0 pypi
multiprocess 0.70.15 pypi_0 pypi
mypy-extensions 1.0.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nest-asyncio 1.6.0 pypi_0 pypi
nettle 3.7.3 hbbd107a_1
networkx 3.1 py310h06a4308_0
ninja 1.11.1.1 pypi_0 pypi
nltk 3.8.1 pypi_0 pypi
numpy 1.24.0 pypi_0 pypi
nvidia-ammo 0.7.4 pypi_0 pypi
nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
nvidia-nccl-cu12 2.18.1 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.99 pypi_0 pypi
nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
onnx 1.14.1 pypi_0 pypi
onnx-graphsurgeon 0.3.27 pypi_0 pypi
onnxruntime 1.16.3 pypi_0 pypi
openai 1.14.2 pypi_0 pypi
openh264 2.1.1 h4ff587b_0
openjpeg 2.4.0 h3ad879b_0
openssl 3.0.13 h7f8727e_0
optimum 1.17.1 pypi_0 pypi
orjson 3.9.15 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 10.2.0 py310h5eee18b_0
pip 23.3.1 py310h06a4308_0
polygraphy 0.49.0 pypi_0 pypi
protobuf 5.26.0 pypi_0 pypi
psutil 5.9.7 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pyarrow 15.0.2 pypi_0 pypi
pyarrow-hotfix 0.6 pypi_0 pypi
pydantic 2.3.0 pypi_0 pypi
pydantic-core 2.6.3 pypi_0 pypi
pydantic-settings 2.0.3 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.17.2 pypi_0 pypi
pymilvus 2.3.0 pypi_0 pypi
pynvml 11.5.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
pypdf 3.15.5 pypi_0 pypi
pypdf2 3.0.1 pypi_0 pypi
pyproject-hooks 1.0.0 pypi_0 pypi
python 3.10.14 h955ad1f_0
python-dateutil 2.9.0.post0 pypi_0 pypi
python-dotenv 1.0.1 pypi_0 pypi
python-multipart 0.0.9 pypi_0 pypi
pytorch-cuda 12.1 ha16c6d3_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pytube 15.0.0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 py310h5eee18b_0
readline 8.2 h5eee18b_0
referencing 0.34.0 pypi_0 pypi
regex 2023.12.25 pypi_0 pypi
requests 2.31.0 py310h06a4308_1
responses 0.18.0 pypi_0 pypi
rich 13.7.1 pypi_0 pypi
rouge-score 0.1.2 pypi_0 pypi
rpds-py 0.18.0 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
scikit-learn 1.4.1.post1 pypi_0 pypi
scipy 1.12.0 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentence-transformers 2.2.2 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.2.2 py310h06a4308_0
shellingham 1.5.4 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
soupsieve 2.5 pypi_0 pypi
sqlalchemy 2.0.28 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
starlette 0.36.3 pypi_0 pypi
sympy 1.12 py310h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tenacity 8.2.3 pypi_0 pypi
tensorrt 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-bindings 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-libs 9.2.0.post12.dev5 pypi_0 pypi
tensorrt-llm 0.7.1 pypi_0 pypi
threadpoolctl 3.4.0 pypi_0 pypi
tiktoken 0.3.3 pypi_0 pypi
tk 8.6.12 h1ccaba5_0
tokenizers 0.13.4rc3 pypi_0 pypi
tomli 2.0.1 pypi_0 pypi
tomlkit 0.12.0 pypi_0 pypi
toolz 0.12.1 pypi_0 pypi
torch 2.1.2 pypi_0 pypi
torchaudio 2.2.1 py310_cu121 pytorch
torchvision 0.17.1 py310_cu121 pytorch
tqdm 4.66.2 pypi_0 pypi
transformers 4.33.1 pypi_0 pypi
triton 2.1.0 pypi_0 pypi
typer 0.9.0 pypi_0 pypi
typing-inspect 0.9.0 pypi_0 pypi
typing_extensions 4.9.0 py310h06a4308_1
tzdata 2024.1 pypi_0 pypi
ujson 5.9.0 pypi_0 pypi
urllib3 2.1.0 py310h06a4308_0
uvicorn 0.29.0 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
werkzeug 3.0.1 pypi_0 pypi
wheel 0.41.2 py310h06a4308_0
wrapt 1.16.0 pypi_0 pypi
xxhash 3.4.1 pypi_0 pypi
xz 5.4.6 h5eee18b_0
yaml 0.2.5 h7b6447c_0
yarl 1.9.4 pypi_0 pypi
youtube-transcript-api 0.6.2 pypi_0 pypi
zipp 3.18.1 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0
Please suggest a solution.
@sugar5727 If you have a single 4090, then when you run build_llama.sh/build_mistral.sh it builds the TensorRT engine serially, right? Can you share the CPU and GPU usage while building the llama/mistral engine? When I run build_mistral.sh, the CPU is being consumed instead of the GPU. Attaching the screenshot.
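For reference, here is how the usage can be captured as numbers instead of a screenshot, using the pynvml package already in the environment (a sketch):

# Sample GPU utilization and memory once per second for ~30 seconds while
# build.py runs in another terminal. Note: parts of the engine build
# (checkpoint loading, weight conversion) are CPU-bound, so low GPU usage
# early in the build is not by itself a failure.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
for _ in range(30):
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    print(f"GPU {util.gpu}% | mem {mem.used / 2**30:.1f} GiB")
    time.sleep(1)
pynvml.nvmlShutdown()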