mlc-ai / llm-perf-bench

Apache License 2.0

raise ValueError("Cannot detect local CUDA GPU target!") #6

Closed AegeanYan closed 1 year ago

AegeanYan commented 1 year ago

Hi, I'm new to your work. I've built the docker image and tried to run the python build.py command, but received this:

Traceback (most recent call last):
  File "/mlc_llm/build.py", line 4, in <module>
    main()
  File "/mlc_llm/mlc_llm/build.py", line 9, in main
    parsed_args = core._parse_args(parsed_args)  # pylint: disable=protected-access
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mlc_llm/mlc_llm/core.py", line 192, in _parse_args
    utils.parse_target(parsed)
  File "/mlc_llm/mlc_llm/utils.py", line 370, in parse_target
    raise ValueError("Cannot detect local CUDA GPU target!")

I'm not sure what is going wrong, since I can see my GPUs using nvidia-smi. I'm so happy to try your work on accelerating llama inference speed.

junrushao commented 1 year ago

Thanks for the question! In theory, local CUDA GPU arch is detected using the same API as nvidia-smi (i.e. cuda runtime). If this is the case, just wanted to check some environment setups.
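One quick way to test this from inside the container is sketched below. It is only a sketch and not the detection code that build.py uses: it loads the CUDA driver library (assumed here to be named libcuda.so.1, as on typical Linux installs) through ctypes and asks it for the device count and compute capability. The helper name list_cuda_devices is made up for this example. If it returns an empty list inside the container while nvidia-smi shows GPUs on the host, the container's GPU passthrough is the likely suspect.

```python
import ctypes

# CUdevice_attribute enum values from cuda.h (CUDA driver API).
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR = 75
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR = 76


def list_cuda_devices(lib_name="libcuda.so.1"):
    """Return [(index, name, "sm_XY"), ...], or [] if no usable CUDA driver is found.

    Hypothetical helper for illustration; lib_name is an assumption about
    how the driver library is named on this system.
    """
    try:
        cuda = ctypes.CDLL(lib_name)
    except OSError:
        # Driver library is not visible, e.g. a container started without GPU support.
        return []

    if cuda.cuInit(0) != 0:  # 0 means CUDA_SUCCESS
        return []

    count = ctypes.c_int(0)
    if cuda.cuDeviceGetCount(ctypes.byref(count)) != 0:
        return []

    devices = []
    for index in range(count.value):
        dev = ctypes.c_int(0)
        if cuda.cuDeviceGet(ctypes.byref(dev), index) != 0:
            continue
        name = ctypes.create_string_buffer(256)
        cuda.cuDeviceGetName(name, len(name), dev)
        major, minor = ctypes.c_int(0), ctypes.c_int(0)
        cuda.cuDeviceGetAttribute(
            ctypes.byref(major), CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev
        )
        cuda.cuDeviceGetAttribute(
            ctypes.byref(minor), CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev
        )
        devices.append((index, name.value.decode(), f"sm_{major.value}{minor.value}"))
    return devices


if __name__ == "__main__":
    found = list_cuda_devices()
    if not found:
        print("No CUDA device visible to the driver API in this environment.")
    for index, name, arch in found:
        print(f"GPU {index}: {name} ({arch})")
```

Running this on the host and again inside the container makes it easy to compare what each environment can actually see.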

AegeanYan commented 1 year ago

Sorry, I was just using plain docker. I'll try again.

AegeanYan commented 1 year ago

I haven't been able to reach my server manager to install nvidia-docker yet. But before that, could you please tell me whether your quantization method makes it possible for me to deploy llama-1 30B on a 24G GPU? And would llama-2 70B be possible on 2x24G GPUs?

junrushao commented 1 year ago

We are benchmarking on Llama2, so no guarantees for Llama1-30B (I suppose it should work), but Llama2-7B/13B should work out of the box. Distributed inference is a work in progress, but it is on the horizon.
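For a rough feel of the memory question, here is a back-of-the-envelope sketch. It makes several assumptions that are not stated in this thread: 4-bit weights, nominal parameter counts of 30e9 and 70e9, and weights only (no KV cache, activations, or quantization metadata such as scales). The function name weight_memory_gib is made up for this example.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Nominal parameter counts; real checkpoints differ somewhat (assumption).
models = [("Llama-1 30B", 30e9), ("Llama-2 70B", 70e9)]

for name, params in models:
    for bits in (16, 4):
        gib = weight_memory_gib(params, bits)
        print(f"{name} at {bits}-bit: about {gib:.1f} GiB of weights")
```

Under these assumptions, 4-bit weights come to roughly 14 GiB for 30B and roughly 33 GiB for 70B, so the weights alone would fit within 24 GiB and 2x24 GiB respectively. Whether the full model runs in practice also depends on the runtime overheads left out here.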

AegeanYan commented 1 year ago

Thx!

AegeanYan commented 1 year ago

@junrushao Hi, Junru. I've checked that.

  1. I was actually using nvidia-docker instead of docker yesterday, since nvidia-docker is already installed.
  2. I don't know how to check this one, but I'm just following the README.
  3. 
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.1     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:05:00.0 Off |                  N/A |
    | 31%   42C    P8    30W / 350W |    664MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  NVIDIA GeForce ...  Off  | 00000000:06:00.0 Off |                  N/A |
    | 31%   42C    P8    23W / 350W |    664MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  NVIDIA GeForce ...  Off  | 00000000:45:00.0 Off |                  N/A |
    | 75%   70C    P2   344W / 350W |  22470MiB / 24576MiB |     62%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  NVIDIA GeForce ...  Off  | 00000000:46:00.0 Off |                  N/A |
    | 73%   69C    P2   345W / 350W |  23518MiB / 24576MiB |     65%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   4  NVIDIA GeForce ...  Off  | 00000000:85:00.0 Off |                  N/A |
    | 30%   31C    P8    21W / 350W |      8MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   5  NVIDIA GeForce ...  Off  | 00000000:86:00.0 Off |                  N/A |
    | 30%   28C    P8    23W / 350W |      8MiB / 24576MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   6  NVIDIA GeForce ...  Off  | 00000000:C5:00.0 Off |                  N/A |
    | 99%   87C    P2   249W / 350W |  22454MiB / 24576MiB |     69%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   7  NVIDIA GeForce ...  Off  | 00000000:C6:00.0 Off |                  N/A |
    | 63%   63C    P2   311W / 350W |  22604MiB / 24576MiB |     69%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

  5.
    root@llm-perf:/mlc_llm# ldd build/mlc_chat_cli
        linux-vdso.so.1 (0x00007ffc212ce000)
        libmlc_llm.so => /mlc_llm/build/libmlc_llm.so (0x00007f1bfb25c000)
        libtvm_runtime.so => /mlc_llm/build/tvm/libtvm_runtime.so (0x00007f1bfb04c000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1bfae1c000)
        libgcc_s.so.1 => /usr/lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1bfadfc000)
        libc.so.6 => /usr/lib/x86_64-linux-gnu/libc.so.6 (0x00007f1bfabd4000)
        libm.so.6 => /usr/lib/x86_64-linux-gnu/libm.so.6 (0x00007f1bfaaeb000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f1bfbad9000)
        libcudart.so.12 => /usr/local/cuda/lib64/libcudart.so.12 (0x00007f1bfa800000)
        libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f1bf8b17000)
        libdl.so.2 => /usr/lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1bfaae6000)
        libpthread.so.0 => /usr/lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f1bfaae1000)
        librt.so.1 => /usr/lib/x86_64-linux-gnu/librt.so.1 (0x00007f1bfaada000)



Could you help me check what's going wrong?