vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://docs.vllm.ai
Apache License 2.0
30.37k stars · 4.6k forks

[Feature]: Support for Higher than 64 LoRA Ranks #3934

Open MrzEsma opened 7 months ago

MrzEsma commented 7 months ago

🚀 The feature, motivation and pitch

Hello,

I was delighted to see the implementation of the multi-LoRA feature and would like to express my gratitude and appreciation for your efforts. However, the LoRA adapters we have developed use r=128 and r=256, and they currently do not work for me, resulting in the following error:

ValueError: max_lora_rank (128) must be one of (8, 16, 32, 64).

I am curious whether there are any plans to support higher ranks, and whether this is on the priority list. It's quite crucial for us, and we would greatly appreciate any development in this area.

Thank you.

Alternatives

No response

Additional context

No response
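For context, the error above comes from a validation step that checks the requested rank against the fixed tuple of ranks the punica kernels were compiled for. Below is a minimal sketch of that check; the constant and function names are illustrative assumptions, not vLLM's exact internals.

```python
# Illustrative sketch of the rank check behind the error above.
# POSSIBLE_MAX_LORA_RANKS and verify_max_lora_rank are assumed names,
# not vLLM's exact internals.

POSSIBLE_MAX_LORA_RANKS = (8, 16, 32, 64)

def verify_max_lora_rank(max_lora_rank: int) -> None:
    """Reject ranks that no compiled punica kernel can serve."""
    if max_lora_rank not in POSSIBLE_MAX_LORA_RANKS:
        raise ValueError(
            f"max_lora_rank ({max_lora_rank}) must be one of "
            f"{POSSIBLE_MAX_LORA_RANKS}.")

verify_max_lora_rank(64)    # accepted
try:
    verify_max_lora_rank(128)
except ValueError as exc:
    print(exc)  # max_lora_rank (128) must be one of (8, 16, 32, 64).
```

This is why simply passing `max_lora_rank=128` fails at startup: the check runs before any kernel is ever invoked.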

jeejeelee commented 7 months ago

In my personal understanding, you can quickly support rank=128 in the following way:

  1. clone the source from this repo
  2. add `FOR_BGMV_WIDE(f, in_T, out_T, W_T, 128)` to `bgmv_config.h`
  3. modify `lora_config` to accept the new rank
  4. build from source using `VLLM_INSTALL_PUNICA_KERNELS=1 pip install -e .`

The same applies for rank=256.

If testing with rank=128 and rank=256 shows no issues, you could submit a PR to add official support for this feature.
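Steps 2 and 3 above go together: the new kernel instantiation is only reachable once the config-side whitelist of ranks also accepts it. A hedged Python sketch of the step-3 change follows; the names are illustrative, so look for the equivalent whitelist in vLLM's LoRA config rather than copying these verbatim.

```python
# Sketch of step 3: extend the whitelist of supported LoRA ranks so
# validation accepts the newly compiled rank-128/256 kernels.
# The names below are illustrative, not vLLM's exact internals.

SUPPORTED_RANKS = (8, 16, 32, 64)                        # before the change
SUPPORTED_RANKS_PATCHED = SUPPORTED_RANKS + (128, 256)   # after the change

def rank_is_supported(rank: int, supported=SUPPORTED_RANKS_PATCHED) -> bool:
    """True when a compiled kernel exists for this rank."""
    return rank in supported

print(rank_is_supported(128))                   # True with the patched tuple
print(rank_is_supported(128, SUPPORTED_RANKS))  # False with the old tuple
```

Extending the whitelist without also compiling the matching kernel (step 2) would move the failure from config validation to kernel dispatch, so both changes are needed.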

MrzEsma commented 7 months ago

I did this, but I get an error while building from source using `VLLM_INSTALL_PUNICA_KERNELS=1 pip install -e .`.

My change to the code:

But I get this error:

  Building wheels for collected packages: vllm
    Building editable for vllm (pyproject.toml) ... error
    error: subprocess-exited-with-error

  × Building editable for vllm (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [177 lines of output]
  /tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at ../torch/csrc/utils/tensor_numpy.cpp:84.)
    device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
  running editable_wheel
  creating /tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm.egg-info
  writing /tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm.egg-info/requires.txt
  writing top-level names to /tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm.egg-info/top_level.txt
  writing manifest file '/tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm.egg-info/SOURCES.txt'
  reading manifest file '/tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  adding license file 'LICENSE'
  writing manifest file '/tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm.egg-info/SOURCES.txt'
  creating '/tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm-0.4.0.post1+cu115.dist-info'
  creating /tmp/pip-wheel-j50wwcu7/.tmp-t81pbbme/vllm-0.4.0.post1+cu115.dist-info/WHEEL
  running build_py
  running build_ext
  -- The CXX compiler identification is GNU 11.4.0
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Build type: RelWithDebInfo
  -- Target device: cuda
  -- Found Python: /usr/bin/python3 (found version "3.10.12") found components: Interpreter Development.Module
  -- Found python matching: /usr/bin/python3.
  -- Found CUDA: /usr (found version "11.5")
  -- The CUDA compiler identification is NVIDIA 11.5.119
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- Found CUDAToolkit: /usr/include (found version "11.5.119")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
  -- Found Threads: TRUE
  -- Caffe2: CUDA detected: 11.5
  -- Caffe2: CUDA nvcc is: /usr/bin/nvcc
  -- Caffe2: CUDA toolkit directory: /usr
  -- Caffe2: Header version is: 11.5
  -- /usr/lib/x86_64-linux-gnu/libnvrtc.so shorthash is 65f2c18b
  -- USE_CUDNN is set to 0. Compiling without cuDNN support
  -- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
  -- Autodetected CUDA architecture(s): 8.6
  -- Added CUDA NVCC flags for: -gencode;arch=compute_86,code=sm_86
  CMake Warning at /tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
    static library kineto_LIBRARY-NOTFOUND not found.
  Call Stack (most recent call first):
    /tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/torch/share/cmake/Torch/TorchConfig.cmake:127 (append_torchlib_if_found)
    CMakeLists.txt:67 (find_package)

  -- Found Torch: /tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/torch/lib/libtorch.so
  -- CUDA supported arches: 7.0;7.5;8.0;8.6;8.9;9.0
  -- CUDA target arches: 86-real
  -- Punica target arches: 86-real
  -- Enabling C extension.
  -- Enabling moe extension.
  -- Enabling punica extension.
  -- Configuring done (5.7s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/tmpc8so895b.build-temp
  [1/3] Building CXX object CMakeFiles/_moe_C.dir/csrc/moe/moe_ops.cpp.o
  [2/3] Building CUDA object CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
  FAILED: CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
  /usr/bin/nvcc -forward-unknown-to-host-compiler -DTORCH_EXTENSION_NAME=_moe_C -DUSE_C10D_GLOO -DUSE_C10D_NCCL -DUSE_DISTRIBUTED -DUSE_RPC -DUSE_TENSORPIPE -D_moe_C_EXPORTS -I/home/basalam1676/Desktop/projects/test_vllm/vllm/csrc -isystem /usr/include/python3.10 -isystem /tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/torch/include -isystem /tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -DONNX_NAMESPACE=onnx_c2 -Xcudafe --diag_suppress=cc_clobber_ignored,--diag_suppress=field_without_dll_interface,--diag_suppress=base_class_has_different_dll_interface,--diag_suppress=dll_interface_conflict_none_assumed,--diag_suppress=dll_interface_conflict_dllexport_assumed,--diag_suppress=bad_friend_decl --expt-relaxed-constexpr --expt-extended-lambda -O2 -g -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --threads=1 -D_GLIBCXX_USE_CXX11_ABI=0 -MD -MT CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o -MF CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o.d -x cu -c /home/basalam1676/Desktop/projects/test_vllm/vllm/csrc/moe/topk_softmax_kernels.cu -o CMakeFiles/_moe_C.dir/csrc/moe/topk_softmax_kernels.cu.o
  /usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’:
    435 |         function(_Functor&& __f)
        |                                                                                                                                                 ^
  /usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
  /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
    530 |         operator=(_Functor&& __f)
        |                                                                                                                                                  ^
  /usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 155, in run
      self._create_wheel_file(bdist_wheel)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 357, in _create_wheel_file
      files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 280, in _run_build_commands
      self._run_build_subcommands()
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 307, in _run_build_subcommands
      self.run_command(name)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 91, in run
      _build_ext.run(self)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
      self.build_extensions()
    File "<string>", line 187, in build_extensions
    File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', '_moe_C', '-j', '8']' returned non-zero exit status 1.
  /tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py:988: _DebuggingTips: Problem in editable installation.
  !!

          ********************************************************************************
          An error happened while installing `vllm` in editable mode.

          The following steps are recommended to help debug this problem:

          - Try to install the project normally, without using the editable mode.
            Does the error still persist?
            (If it does, try fixing the problem before attempting the editable mode).
          - If you are using binary extensions, make sure you have all OS-level
            dependencies installed (e.g. compilers, toolchains, binary libraries, ...).
          - Try the latest version of setuptools (maybe the error was already fixed).
          - If you (or your project dependencies) are using any setuptools extension
            or customization, make sure they support the editable mode.

          After following the steps above, if the problem still persists and
          you think this is related to how setuptools handles editable installations,
          please submit a reproducible example
          (see https://stackoverflow.com/help/minimal-reproducible-example) to:

              https://github.com/pypa/setuptools/issues

          See https://setuptools.pypa.io/en/latest/userguide/development_mode.html for details.
          ********************************************************************************

  !!
    cmd_obj.run()
  Traceback (most recent call last):
    File "/home/basalam1676/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/home/basalam1676/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/basalam1676/.local/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 273, in build_editable
      return hook(wheel_directory, config_settings, metadata_directory)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 443, in build_editable
      return self._build_with_temp_dir(
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 395, in _build_with_temp_dir
      self.run_setup()
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
      exec(code, locals())
    File "<string>", line 380, in <module>
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 104, in setup
      return distutils.core.setup(**attrs)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 155, in run
      self._create_wheel_file(bdist_wheel)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 357, in _create_wheel_file
      files, mapping = self._run_build_commands(dist_name, unpacked, lib, tmp)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 280, in _run_build_commands
      self._run_build_subcommands()
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/editable_wheel.py", line 307, in _run_build_subcommands
      self.run_command(name)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 91, in run
      _build_ext.run(self)
    File "/tmp/pip-build-env-5_3_h77o/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
      self.build_extensions()
    File "<string>", line 187, in build_extensions
    File "/usr/lib/python3.10/subprocess.py", line 369, in check_call
      raise CalledProcessError(retcode, cmd)
  subprocess.CalledProcessError: Command '['cmake', '--build', '.', '--target', '_moe_C', '-j', '8']' returned non-zero exit status 1.
  [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building editable for vllm
  Failed to build vllm
  ERROR: Could not build wheels for vllm, which is required to install pyproject.toml-based projects

jeejeelee commented 7 months ago

@MrzEsma I made the same modifications to bgmv_config.h as you did. Then I built the source using:

VLLM_INSTALL_PUNICA_KERNELS=1 python setup.py develop

I also encountered a build error, as follows:

error: function "bgmv_kernel<feat_in,feat_out,in_T,out_T,W_T>(out_T *, const in_T *, const W_T *, const int64_t *, int64_t, int64_t, int64_t, int64_t, int64_t, float) [with feat_in=128, feat_out=128, in_T=nv_half, out_T=nv_bfloat16, W_T=nv_half]" explicitly instantiated more than once

After deleting the two conflicting lines from bgmv_config.h, the build succeeded. I hope that helps.
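The "explicitly instantiated more than once" error suggests the edit caused the same (feat_in, feat_out) kernel pair to be generated by two of the macro lists in `bgmv_config.h`. The clash can be illustrated in Python (the real lists are C++ macros, and the list names here are assumptions for illustration):

```python
# Illustrative check: if a rank appears in more than one macro list in
# bgmv_config.h, the same bgmv_kernel specialization is explicitly
# instantiated twice, which C++ rejects. List names are assumptions.

narrow_ranks = [8, 16, 32, 64, 128]  # suppose 128 was added here as well
wide_ranks = [128, 256]              # ...and also to the wide list

clashing = sorted(set(narrow_ranks) & set(wide_ranks))
print(clashing)  # [128]: remove these entries from one of the two lists
```

This matches the observed fix: deleting the redundant lines so each rank pair is instantiated exactly once.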

qniguogym commented 3 months ago

> In my personal understanding, you can quickly support rank=128 in the following way:
>
>   1. clone the source from this repo
>   2. add `FOR_BGMV_WIDE(f, in_T, out_T, W_T, 128)` to `bgmv_config.h`
>   3. modify `lora_config`
>   4. build from source using `VLLM_INSTALL_PUNICA_KERNELS=1 pip install -e .`
>
> The same applies for rank=256.
>
> If testing with rank=128 and rank=256 shows no issues, you could submit a PR to add official support for this feature.

Have you solved it? I am also facing this problem.

jeejeelee commented 3 months ago

> Have you solved it? I am also facing this problem.

#5036 can address this problem.

github-actions[bot] commented 2 weeks ago

This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!