punica-ai / punica

Serving multiple LoRA finetuned LLM as one
https://arxiv.org/abs/2310.18547
Apache License 2.0

ModuleNotFoundError: No module named 'punica.ops._kernels' #12

Closed bibekyess closed 7 months ago

bibekyess commented 7 months ago

Hello, I am trying to run punica with CUDA toolkit 11.8, but I get the error ModuleNotFoundError: No module named 'punica.ops._kernels' when running: python -m benchmarks.bench_textgen_lora --system punica --batch-size 32.

The build seems to succeed, apart from one warning:

/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 11.8
    warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')

Here is the detailed log from running env TORCH_CUDA_ARCH_LIST="8.0" pip install -v --no-build-isolation (I tried both inside the Docker container and outside it; in both cases I get the ModuleNotFoundError):

Building wheels for collected packages: punica
  Running command Building wheel for punica (pyproject.toml)
  No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
  /root/miniconda3/envs/punica/lib/python3.10/site-packages/setuptools/config/pyprojecttoml.py:66: _BetaConfiguration: Support for `[tool.setuptools]` in `pyproject.toml` is still *beta*.
    config = read_configuration(filepath, True, ignore_option_errors, dist)
  running bdist_wheel
  running build
  running build_py
  running egg_info
  writing punica.egg-info/PKG-INFO
  writing dependency_links to punica.egg-info/dependency_links.txt
  writing requirements to punica.egg-info/requires.txt
  writing top-level names to punica.egg-info/top_level.txt
  reading manifest file 'punica.egg-info/SOURCES.txt'
  reading manifest template 'MANIFEST.in'
  no previously-included directories found matching 'benchmarks'
  no previously-included directories found matching '*/__pycache__'
  warning: no previously-included files matching '*.so' found anywhere in distribution
  adding license file 'LICENSE'
  writing manifest file 'punica.egg-info/SOURCES.txt'
  running build_ext
  /root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/utils/cpp_extension.py:424: UserWarning: There are no g++ version bounds defined for CUDA version 11.8
    warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
  building 'punica.ops._kernels' extension
  Emitting ninja build file /punica/build/temp.linux-x86_64-cpython-310/build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/6] /usr/local/cuda/bin/nvcc  -I/punica/third_party/cutlass/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/TH -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/miniconda3/envs/punica/include/python3.10 -c -c /punica/csrc/rms_norm/rms_norm_cutlass.cu -o /punica/build/temp.linux-x86_64-cpython-310/csrc/rms_norm/rms_norm_cutlass.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++17
  [2/6] /usr/local/cuda/bin/nvcc  -I/punica/third_party/cutlass/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/TH -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/miniconda3/envs/punica/include/python3.10 -c -c /punica/csrc/sgmv_flashinfer/sgmv_all.cu -o /punica/build/temp.linux-x86_64-cpython-310/csrc/sgmv_flashinfer/sgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++17
  [3/6] /usr/local/cuda/bin/nvcc  -I/punica/third_party/cutlass/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/TH -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/miniconda3/envs/punica/include/python3.10 -c -c /punica/csrc/sgmv/sgmv_cutlass.cu -o /punica/build/temp.linux-x86_64-cpython-310/csrc/sgmv/sgmv_cutlass.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++17
  [4/6] /usr/local/cuda/bin/nvcc  -I/punica/third_party/cutlass/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/TH -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/miniconda3/envs/punica/include/python3.10 -c -c /punica/csrc/flashinfer_adapter/flashinfer_all.cu -o /punica/build/temp.linux-x86_64-cpython-310/csrc/flashinfer_adapter/flashinfer_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++17
  [5/6] c++ -MMD -MF /punica/build/temp.linux-x86_64-cpython-310/csrc/punica_ops.o.d -pthread -B /root/miniconda3/envs/punica/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /root/miniconda3/envs/punica/include -fPIC -O2 -isystem /root/miniconda3/envs/punica/include -fPIC -I/punica/third_party/cutlass/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/TH -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/miniconda3/envs/punica/include/python3.10 -c -c /punica/csrc/punica_ops.cc -o /punica/build/temp.linux-x86_64-cpython-310/csrc/punica_ops.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
  [6/6] /usr/local/cuda/bin/nvcc  -I/punica/third_party/cutlass/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/TH -I/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/include/THC -I/usr/local/cuda/include -I/root/miniconda3/envs/punica/include/python3.10 -c -c /punica/csrc/bgmv/bgmv_all.cu -o /punica/build/temp.linux-x86_64-cpython-310/csrc/bgmv/bgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_80,code=sm_80 -std=c++17
  g++ -pthread -B /root/miniconda3/envs/punica/compiler_compat -shared -Wl,-rpath,/root/miniconda3/envs/punica/lib -Wl,-rpath-link,/root/miniconda3/envs/punica/lib -L/root/miniconda3/envs/punica/lib -Wl,-rpath,/root/miniconda3/envs/punica/lib -Wl,-rpath-link,/root/miniconda3/envs/punica/lib -L/root/miniconda3/envs/punica/lib /punica/build/temp.linux-x86_64-cpython-310/csrc/bgmv/bgmv_all.o /punica/build/temp.linux-x86_64-cpython-310/csrc/flashinfer_adapter/flashinfer_all.o /punica/build/temp.linux-x86_64-cpython-310/csrc/punica_ops.o /punica/build/temp.linux-x86_64-cpython-310/csrc/rms_norm/rms_norm_cutlass.o /punica/build/temp.linux-x86_64-cpython-310/csrc/sgmv/sgmv_cutlass.o /punica/build/temp.linux-x86_64-cpython-310/csrc/sgmv_flashinfer/sgmv_all.o -L/root/miniconda3/envs/punica/lib/python3.10/site-packages/torch/lib -L/usr/local/cuda/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda -o build/lib.linux-x86_64-cpython-310/punica/ops/_kernels.cpython-310-x86_64-linux-gnu.so
  installing to build/bdist.linux-x86_64/wheel
  running install
  running install_lib
  creating build/bdist.linux-x86_64
  creating build/bdist.linux-x86_64/wheel
  creating build/bdist.linux-x86_64/wheel/punica
  creating build/bdist.linux-x86_64/wheel/punica/ops
  copying build/lib.linux-x86_64-cpython-310/punica/ops/_kernels.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel/punica/ops
  copying build/lib.linux-x86_64-cpython-310/punica/ops/__init__.py -> build/bdist.linux-x86_64/wheel/punica/ops
  copying build/lib.linux-x86_64-cpython-310/punica/__init__.py -> build/bdist.linux-x86_64/wheel/punica
  creating build/bdist.linux-x86_64/wheel/punica/models
  copying build/lib.linux-x86_64-cpython-310/punica/models/llama.py -> build/bdist.linux-x86_64/wheel/punica/models
  copying build/lib.linux-x86_64-cpython-310/punica/models/llama_lora.py -> build/bdist.linux-x86_64/wheel/punica/models
  copying build/lib.linux-x86_64-cpython-310/punica/models/__init__.py -> build/bdist.linux-x86_64/wheel/punica/models
  creating build/bdist.linux-x86_64/wheel/punica/utils
  copying build/lib.linux-x86_64-cpython-310/punica/utils/cat_tensor.py -> build/bdist.linux-x86_64/wheel/punica/utils
  copying build/lib.linux-x86_64-cpython-310/punica/utils/convert_lora_weight.py -> build/bdist.linux-x86_64/wheel/punica/utils
  copying build/lib.linux-x86_64-cpython-310/punica/utils/__init__.py -> build/bdist.linux-x86_64/wheel/punica/utils
  copying build/lib.linux-x86_64-cpython-310/punica/utils/kvcache.py -> build/bdist.linux-x86_64/wheel/punica/utils
  copying build/lib.linux-x86_64-cpython-310/punica/utils/lora.py -> build/bdist.linux-x86_64/wheel/punica/utils
  running install_egg_info
  Copying punica.egg-info to build/bdist.linux-x86_64/wheel/punica-0.0.1-py3.10.egg-info
  running install_scripts
  creating build/bdist.linux-x86_64/wheel/punica-0.0.1.dist-info/WHEEL
  creating '/tmp/pip-wheel-xt0fqdps/.tmp-jps4usf1/punica-0.0.1-cp310-cp310-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
  adding 'punica/__init__.py'
  adding 'punica/models/__init__.py'
  adding 'punica/models/llama.py'
  adding 'punica/models/llama_lora.py'
  adding 'punica/ops/__init__.py'
  adding 'punica/ops/_kernels.cpython-310-x86_64-linux-gnu.so'
  adding 'punica/utils/__init__.py'
  adding 'punica/utils/cat_tensor.py'
  adding 'punica/utils/convert_lora_weight.py'
  adding 'punica/utils/kvcache.py'
  adding 'punica/utils/lora.py'
  adding 'punica-0.0.1.dist-info/LICENSE'
  adding 'punica-0.0.1.dist-info/METADATA'
  adding 'punica-0.0.1.dist-info/WHEEL'
  adding 'punica-0.0.1.dist-info/top_level.txt'
  adding 'punica-0.0.1.dist-info/RECORD'
  removing build/bdist.linux-x86_64/wheel
  Building wheel for punica (pyproject.toml) ... done
  Created wheel for punica: filename=punica-0.0.1-cp310-cp310-linux_x86_64.whl size=799747 sha256=f423816025988aa50102a5792e5dff1debdd9d6910ea4b74364ce1610a216684
  Stored in directory: /tmp/pip-ephem-wheel-cache-fkt205mc/wheels/0e/58/4b/992f075cedd202c2dc89c9ac8a7146ab9ff7495bc4741422bf
Successfully built punica
Installing collected packages: tqdm, safetensors, regex, packaging, fsspec, huggingface-hub, tokenizers, transformers, punica
  changing mode of /root/miniconda3/envs/punica/bin/tqdm to 755
  changing mode of /root/miniconda3/envs/punica/bin/huggingface-cli to 755
  changing mode of /root/miniconda3/envs/punica/bin/transformers-cli to 755
Successfully installed fsspec-2023.10.0 huggingface-hub-0.19.4 packaging-23.2 punica-0.0.1 regex-2023.10.3 safetensors-0.4.0 tokenizers-0.15.0 tqdm-4.66.1 transformers-4.35.2

Could you tell me the recommended CUDA toolkit version for building? Thank you!

luciferlinx101 commented 7 months ago

I am also facing a similar issue:

Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/muti-tenant-test-1/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/home/ubuntu/miniconda3/envs/muti-tenant-test-1/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/ubuntu/multi-tenant-test/punica/punica/__init__.py", line 1, in <module>
    import punica.models
  File "/home/ubuntu/multi-tenant-test/punica/punica/models/__init__.py", line 1, in <module>
    import punica.models.llama
  File "/home/ubuntu/multi-tenant-test/punica/punica/models/llama.py", line 16, in <module>
    from punica.ops import append_kv, init_kv, batch_decode, rms_norm
  File "/home/ubuntu/multi-tenant-test/punica/punica/ops/__init__.py", line 3, in <module>
    import punica.ops._kernels as _kernels
ModuleNotFoundError: No module named 'punica.ops._kernels'

abcdabcd987 commented 7 months ago

We follow PyTorch's default environment, which is currently CUDA 12.1. Other CUDA versions should also work, but I haven't tested them. I'll add CI tests once I have time.

But @bibekyess, I think you have built the package successfully. I'm not sure why this happens... Can you tell me which container image you are using?

bibekyess commented 7 months ago

Ah, OK. I tried two images: nvidia/cuda:11.8.0-devel-ubuntu20.04 (the log above is from this one) and nvidia/cuda:12.0.0-devel-ubuntu20.04. In both, the build was successful, but I still got the error.

luciferlinx101 commented 7 months ago

@abcdabcd987 I get the same error (import punica.ops._kernels as _kernels, ModuleNotFoundError: No module named 'punica.ops._kernels') when running:

python -m punica.utils.convert_lora_weight model/gsm8k-r16/adapter_model.bin model/gsm8k-r16.punica.pt

bibekyess commented 7 months ago

@abcdabcd987 Additional information: I also tried the nvidia/cuda:12.1.1-devel-ubuntu20.04 image and used switch-cuda to try on my local machine, but I still get the same error. I am using an A100-SXM4-40GB GPU. Thank you :)

abcdabcd987 commented 7 months ago

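Ha, I see what is going on here. You are running from the project root directory, which contains a punica folder. In that case, import punica picks up this source folder rather than the installed package, and the source folder does not contain the compiled _kernels extension that pip built and installed, hence the ModuleNotFoundError. I'll improve the project directory structure to avoid this issue.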
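
To check whether this kind of shadowing is happening, something like the following sketch may help. It reports which directory Python would import punica from and whether a compiled _kernels extension sits in its ops subfolder, without actually importing the package. The helper names package_dir and has_extension are made up for illustration and are not part of punica.

```python
import importlib.machinery
import importlib.util
import sys
from pathlib import Path


def package_dir(name: str):
    """Return the directory Python would import package `name` from, or None."""
    spec = importlib.util.find_spec(name)  # looks the package up without importing it
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]).resolve()


def has_extension(pkg_dir: Path, stem: str) -> bool:
    """True if pkg_dir holds a compiled extension module named `stem`."""
    # EXTENSION_SUFFIXES lists the file endings this interpreter accepts,
    # e.g. ".cpython-310-x86_64-linux-gnu.so" on CPython 3.10 for Linux.
    suffixes = importlib.machinery.EXTENSION_SUFFIXES
    return any((pkg_dir / f"{stem}{suffix}").is_file() for suffix in suffixes)


if __name__ == "__main__":
    # With `python -m ...` the current working directory is the first
    # sys.path entry; when a file is run as a script, it is the script's directory.
    print("first sys.path entry:", repr(sys.path[0]))
    top = package_dir("punica")
    if top is None:
        print("punica is not importable from here")
    else:
        print("punica resolves to:", top)
        print("_kernels extension present:", has_extension(top / "ops", "_kernels"))
```

If this is run from the project root and punica resolves to the checkout with no extension present, that matches the explanation above. Run from another directory, it would be expected to resolve to the installed copy in site-packages instead.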

For now, @bibekyess @luciferlinx101, can you try an editable installation (-e)?

env TORCH_CUDA_ARCH_LIST="8.0" pip install -v -e .

bibekyess commented 7 months ago

@abcdabcd987 Thank you! That solved it. :)