Closed: AegeanYan closed this issue 1 year ago.
Thanks for the question! In theory, the local CUDA GPU arch is detected using the same API that nvidia-smi relies on (i.e., the CUDA runtime). If that's the case here, I just wanted to check a few details of your environment setup.
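For reference, here is a minimal sketch of that query path, assuming the TVM runtime bundled with mlc-llm is importable as tvm (the attributes used are from TVM's Device API, not mlc-llm itself):

import tvm

# Query CUDA device 0 through the TVM runtime, which calls into the
# CUDA runtime underneath (cudaGetDeviceProperties and friends).
dev = tvm.cuda(0)
if dev.exist:
    print("name:", dev.device_name)                    # e.g. an RTX-class GPU
    print("compute capability:", dev.compute_version)  # e.g. "8.6"
else:
    print("CUDA runtime sees no device -- likely a driver/container issue")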
I'm sorry, I'm just using plain Docker; I'll try again.
I haven't been able to reach my server administrator to install nvidia-docker. But before that, could you please tell me whether your quantization method makes it possible for me to deploy Llama-1 30B on a 24 GB GPU? And is Llama-2 70B possible on 2x 24 GB GPUs?
We are benchmarking on Llama 2, so no guarantees for Llama-1 30B (I suppose it should work), but Llama-2 7B/13B should work out of the box. Distributed inference is a work in progress, but it's on the horizon.
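For a rough sense of why, here is a hedged back-of-envelope: the weight size is exact arithmetic for 4-bit quantization, while the overhead term for activations and KV cache is an assumption, not a measurement.

# Back-of-envelope VRAM estimate: 4-bit weights are 0.5 bytes per parameter;
# the ~3 GiB overhead for activations/KV cache is an illustrative guess.
def approx_vram_gib(n_params_billion, bits=4, overhead_gib=3.0):
    weights_gib = n_params_billion * 1e9 * bits / 8 / 2**30
    return weights_gib + overhead_gib

print(approx_vram_gib(30))  # ~17 GiB: tight but conceivable on a 24 GB card
print(approx_vram_gib(70))  # ~36 GiB: needs more than one 24 GB card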
Thx!
@junrushao Hi Junru, I've checked that.
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
| 31%   42C    P8    30W / 350W |    664MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
| 31%   42C    P8    23W / 350W |    664MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:45:00.0 Off |                  N/A |
| 75%   70C    P2   344W / 350W |  22470MiB / 24576MiB |     62%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  Off  | 00000000:46:00.0 Off |                  N/A |
| 73%   69C    P2   345W / 350W |  23518MiB / 24576MiB |     65%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  NVIDIA GeForce ...  Off  | 00000000:85:00.0 Off |                  N/A |
| 30%   31C    P8    21W / 350W |      8MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  NVIDIA GeForce ...  Off  | 00000000:86:00.0 Off |                  N/A |
| 30%   28C    P8    23W / 350W |      8MiB / 24576MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  NVIDIA GeForce ...  Off  | 00000000:C5:00.0 Off |                  N/A |
| 99%   87C    P2   249W / 350W |  22454MiB / 24576MiB |     69%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  NVIDIA GeForce ...  Off  | 00000000:C6:00.0 Off |                  N/A |
| 63%   63C    P2   311W / 350W |  22604MiB / 24576MiB |     69%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
root@llm-perf:/mlc_llm# ldd build/mlc_chat_cli
        linux-vdso.so.1 (0x00007ffc212ce000)
        libmlc_llm.so => /mlc_llm/build/libmlc_llm.so (0x00007f1bfb25c000)
        libtvm_runtime.so => /mlc_llm/build/tvm/libtvm_runtime.so (0x00007f1bfb04c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1bfae1c000)
        libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1bfadfc000)
        libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f1bfabd4000)
        libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f1bfaaeb000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1bfbad9000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007f1bfa800000)
        libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f1bf8b17000)
        libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1bfaae6000)
        libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1bfaae1000)
        librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007f1bfaada000)
Could you help me check what's going wrong?
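One quick check that may help localize the problem: ask the libcudart.so.12 that mlc_chat_cli links against (per the ldd output above) whether it can see any device at all. A minimal probe, assuming that library is on the loader path:

import ctypes

# cudaGetDeviceCount is a standard CUDA runtime entry point; a return
# value of 0 (cudaSuccess) with a nonzero count means the runtime sees GPUs.
cudart = ctypes.CDLL("libcudart.so.12")
count = ctypes.c_int(0)
err = cudart.cudaGetDeviceCount(ctypes.byref(count))
print("cudaGetDeviceCount ->", err, "| devices:", count.value)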
Hi, I'm new to this project. I've built the Docker image and tried to run the
python build.py
command, and received an error. I'm not sure where it goes wrong, since I can see my GPUs using nvidia-smi. I'm excited to try your work on accelerating Llama inference speed.
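Since four of the eight GPUs in the nvidia-smi listing above are nearly full, one thing worth trying is restricting the build to the idle ones. A minimal sketch, assuming standard CUDA_VISIBLE_DEVICES behavior (any build.py flags are deliberately omitted, since they vary across mlc-llm versions):

import os
import subprocess

# Expose only the idle GPUs (indices follow the nvidia-smi listing above);
# CUDA_VISIBLE_DEVICES is standard CUDA runtime behavior, not mlc-llm specific.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
subprocess.run(["python", "build.py"], check=True)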