mohbay opened this issue 4 months ago
Hi @mohbay - can you share your `ds_report` output? My guess is you don't have the deepspeed-kernels/cutlass kernels installed for those ops to build.
Hi @loadams, this might indeed be related to cutlass. Thanks a lot. Below is the `ds_report` output:
```
DeepSpeed general environment info:
torch install path ............... ['deepspeed/dsenv/lib/python3.10/site-packages/torch']
torch version .................... 2.3.1+cu121
deepspeed install path ........... ['deepspeed/dsenv/lib/python3.10/site-packages/deepspeed']
deepspeed info ................... 0.14.4, unknown, unknown
torch cuda version ............... 12.1
torch hip version ................ None
nvcc version ..................... 12.3
deepspeed wheel compiled w. ...... torch 2.3, cuda 12.1
shared memory (/dev/shm) size .... 125.80 GB
```
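(Aside: the per-op compatibility that `ds_report` prints can also be enumerated programmatically. A minimal sketch, assuming the `ALL_OPS` registry under `deepspeed.ops.op_builder.all_ops` that the 0.14.x report code iterates; if that module path differs in your build, treat this as illustrative only.)

```python
# Sketch: list every JIT-buildable DeepSpeed op and whether the current
# toolchain can compile it. Assumes DeepSpeed 0.14.x, where the registry
# lives in deepspeed.ops.op_builder.all_ops; entries may be builder
# classes or instances depending on the accelerator backend.
from deepspeed.ops.op_builder.all_ops import ALL_OPS

for name, b in ALL_OPS.items():
    builder = b() if isinstance(b, type) else b
    try:
        ok = builder.is_compatible()
    except Exception as exc:  # a missing toolchain piece can raise here
        ok = f"error: {exc}"
    print(f"{name:30s} {ok}")
```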
Can you share your `pip list` as well? Or have you installed `deepspeed-kernels`?
`deepspeed-kernels` is installed. Here is the `pip list` output:
```
Package                  Version
------------------------ -------------------
aniso8601                9.0.1
annotated-types          0.7.0
asyncio                  3.4.3
blinker                  1.8.2
certifi                  2024.7.4
charset-normalizer       3.3.2
click                    8.1.7
cmake                    3.30.0
deepspeed                0.14.4
deepspeed-kernels        0.0.1.dev1698255861
deepspeed-mii            0.2.3
filelock                 3.15.4
Flask                    3.0.3
Flask-RESTful            0.3.10
fsspec                   2024.6.1
grpcio                   1.64.1
grpcio-tools             1.64.1
hjson                    3.1.0
huggingface-hub          0.23.5
idna                     3.7
itsdangerous             2.2.0
Jinja2                   3.1.4
MarkupSafe               2.1.5
mpmath                   1.3.0
networkx                 3.3
ninja                    1.11.1.1
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        8.9.2.26
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-ml-py             12.555.43
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.5.82
nvidia-nvtx-cu12         12.1.105
packaging                24.1
pillow                   10.4.0
pip                      22.0.2
protobuf                 5.27.2
psutil                   6.0.0
py-cpuinfo               9.0.0
pydantic                 2.8.2
pydantic_core            2.20.1
pynvml                   11.5.2
pytz                     2024.1
PyYAML                   6.0.1
pyzmq                    26.0.3
regex                    2024.5.15
requests                 2.32.3
safetensors              0.4.3
setuptools               59.6.0
six                      1.16.0
sympy                    1.13.0
tokenizers               0.19.1
torch                    2.3.1
tqdm                     4.66.4
transformers             4.41.2
triton                   2.3.1
typing_extensions        4.12.2
ujson                    5.10.0
urllib3                  2.2.2
Werkzeug                 3.0.3
zmq                      0.0.0
```
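(If multiple environments are in play, a quick standard-library check run from the same interpreter that launches MII can verify the same thing; the distribution names below are the PyPI names from the list above.)

```python
# Stdlib-only check that the relevant packages are visible to the
# interpreter actually running MII.
from importlib.metadata import version, PackageNotFoundError

for dist in ("deepspeed", "deepspeed-mii", "deepspeed-kernels"):
    try:
        print(dist, version(dist))
    except PackageNotFoundError:
        print(dist, "NOT installed in this environment")
```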
I cloned the cutlass repository and set `CUTLASS_PATH`, so the `[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH` message is now gone, but I still get `RuntimeError: Error building extension 'ragged_device_ops'`.
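(A quick way to confirm the variable is actually visible to the process doing the build is a sketch like the one below; `include/cutlass/cutlass.h` is the standard layout of a CUTLASS checkout.)

```python
# Sketch: confirm CUTLASS_PATH is set for this process and points at a
# real CUTLASS checkout (include/cutlass/cutlass.h is the repo's standard
# header location).
import os
from pathlib import Path

cutlass = os.environ.get("CUTLASS_PATH")
assert cutlass, "CUTLASS_PATH is not set for this process"
header = Path(cutlass) / "include" / "cutlass" / "cutlass.h"
assert header.is_file(), f"no CUTLASS headers found under {cutlass}"
print("CUTLASS_PATH looks usable:", cutlass)
```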
The error starts with this:

```
Building extension module ragged_device_ops...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF blocked_flash.o.d -DTORCH_EXTENSION_NAME=ragged_device_ops -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/includes -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/atom_builder -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/blocked_flash -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/embed -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/includes -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/linear_blocked_kv_rotary -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/logits_gather -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/moe_gather -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/moe_scatter -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/ragged_helpers -I/deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/top_k_gating -isystem /deepspeed/dsenv/lib/python3.10/site-packages/torch/include -isystem /deepspeed/dsenv/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -isystem /deepspeed/dsenv/lib/python3.10/site-packages/torch/include/TH -isystem /deepspeed/dsenv/lib/python3.10/site-packages/torch/include/THC -isystem /Linux_x86_64/24.1/compilers/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DBF16_AVAILABLE -c /deepspeed/dsenv/lib/python3.10/site-packages/deepspeed/inference/v2/kernels/ragged_ops/blocked_flash/blocked_flash.cpp -o blocked_flash.o
FAILED: blocked_flash.o
```
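(One way to reproduce this outside MII and capture the complete compiler output is to JIT-build the op in isolation. A sketch, assuming DeepSpeed 0.14.x where `RaggedOpsBuilder` is exposed from `deepspeed.ops.op_builder`; if your build exposes it differently, adjust the import.)

```python
# Sketch: build the failing extension directly so the full c++ error is
# printed to the console instead of surfacing mid-pipeline-startup.
from deepspeed.ops.op_builder import RaggedOpsBuilder

builder = RaggedOpsBuilder()
print("compatible:", builder.is_compatible())  # toolchain/arch pre-check
module = builder.load()  # JIT-compiles ragged_device_ops; errors surface here
print("loaded:", module)
```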
Starting from the code `pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")`, it does not work (on an A100 with Python 3.10 and CUDA 12.1):

```
ImportError: torch_extensions/py310_cu121/ragged_device_ops/ragged_device_ops.so: cannot open shared object file: No such file or directory
```
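(For reference, the full script is just the pipeline example from the MII README; the prompt and `max_new_tokens` below are illustrative.)

```python
# Minimal repro: constructing the pipeline is what triggers the JIT build
# of the ragged_device_ops extension that fails above.
import mii

pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")
response = pipe(["DeepSpeed is"], max_new_tokens=64)
print(response)
```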