punica-ai / punica

Serving multiple LoRA finetuned LLM as one
https://arxiv.org/abs/2310.18547
Apache License 2.0
883 stars 40 forks

Error installing package #8

Closed jkl375 closed 7 months ago

jkl375 commented 7 months ago

When I install the package, the following error occurs (screenshot attached).

abcdabcd987 commented 7 months ago

Can you provide the command that you run and the full output?

jkl375 commented 7 months ago

> Can you provide the command that you run and the full output?

At first, there is a ninja error. I use NVIDIA's PyTorch image nvcr.io/nvidia/pytorch:23.09-py3.


root@ba66182d6ed2:/workspace# pip install -v --no-build-isolation .
Using pip 23.2.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /workspace
Running command Preparing metadata (pyproject.toml)
running dist_info
creating /tmp/pip-modern-metadata-mcwxj_ha/punica.egg-info
writing /tmp/pip-modern-metadata-mcwxj_ha/punica.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-modern-metadata-mcwxj_ha/punica.egg-info/dependency_links.txt
writing requirements to /tmp/pip-modern-metadata-mcwxj_ha/punica.egg-info/requires.txt
writing top-level names to /tmp/pip-modern-metadata-mcwxj_ha/punica.egg-info/top_level.txt
writing manifest file '/tmp/pip-modern-metadata-mcwxj_ha/punica.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-modern-metadata-mcwxj_ha/punica.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'benchmarks'
no previously-included directories found matching '*/__pycache__'
warning: no previously-included files matching '*.so' found anywhere in distribution
writing manifest file '/tmp/pip-modern-metadata-mcwxj_ha/punica.egg-info/SOURCES.txt'
creating '/tmp/pip-modern-metadata-mcwxj_ha/punica-0.0.1.dist-info'
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from punica==0.0.1) (2.1.0a0+32f93b1)
Collecting transformers (from punica==0.0.1)
Obtaining dependency information for transformers from https://files.pythonhosted.org/packages/9a/06/e4ec2a321e57c03b7e9345d709d554a52c33760e5015fdff0919d9459af0/transformers-4.35.0-py3-none-any.whl.metadata
Downloading transformers-4.35.0-py3-none-any.whl.metadata (123 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 123.1/123.1 kB 669.4 kB/s eta 0:00:00
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from punica==0.0.1) (1.22.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (3.12.4)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (4.7.1)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (3.1.2)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (2023.6.0)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers->punica==0.0.1)
Obtaining dependency information for huggingface-hub<1.0,>=0.16.4 from https://files.pythonhosted.org/packages/65/cc/2891260847777eb9aaca278aaf3f846c9ff8ea1351643a4f33ff26d5d213/huggingface_hub-0.19.1-py3-none-any.whl.metadata
Downloading huggingface_hub-0.19.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (23.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (2023.8.8)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (2.31.0)
Collecting tokenizers<0.15,>=0.14 (from transformers->punica==0.0.1)
Obtaining dependency information for tokenizers<0.15,>=0.14 from https://files.pythonhosted.org/packages/a7/7b/c1f643eb086b6c5c33eef0c3752e37624bd23e4cbc9f1332748f1c6252d1/tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting safetensors>=0.3.1 (from transformers->punica==0.0.1)
Obtaining dependency information for safetensors>=0.3.1 from https://files.pythonhosted.org/packages/20/4e/878b080dbda92666233ec6f316a53969edcb58eab1aa399a64d0521cf953/safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (4.66.1)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers->punica==0.0.1)
Obtaining dependency information for huggingface-hub<1.0,>=0.16.4 from https://files.pythonhosted.org/packages/aa/f3/3fc97336a0e90516901befd4f500f08d691034d387406fdbde85bea827cc/huggingface_hub-0.17.3-py3-none-any.whl.metadata
Downloading huggingface_hub-0.17.3-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->punica==0.0.1) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->punica==0.0.1) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->punica==0.0.1) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->punica==0.0.1) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->punica==0.0.1) (2023.7.22)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->punica==0.0.1) (1.3.0)
Downloading transformers-4.35.0-py3-none-any.whl (7.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.9/7.9 MB 16.4 MB/s eta 0:00:00
Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 151.6 MB/s eta 0:00:00
Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 107.4 MB/s eta 0:00:00
Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 295.0/295.0 kB 126.4 MB/s eta 0:00:00
Building wheels for collected packages: punica
Running command Building wheel for punica (pyproject.toml)
running bdist_wheel
running build
running build_py
running egg_info
writing punica.egg-info/PKG-INFO
writing dependency_links to punica.egg-info/dependency_links.txt
writing requirements to punica.egg-info/requires.txt
writing top-level names to punica.egg-info/top_level.txt
reading manifest file 'punica.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'benchmarks'
no previously-included directories found matching '*/__pycache__'
warning: no previously-included files matching '*.so' found anywhere in distribution
writing manifest file 'punica.egg-info/SOURCES.txt'
running build_ext
building 'punica.ops._kernels' extension
Emitting ninja build file /workspace/build/temp.linux-x86_64-3.10/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda/bin/nvcc  -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/sgmv_flashinfer/sgmv_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/sgmv_flashinfer/sgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
FAILED: /workspace/build/temp.linux-x86_64-3.10/csrc/sgmv_flashinfer/sgmv_all.o
/usr/local/cuda/bin/nvcc  -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/sgmv_flashinfer/sgmv_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/sgmv_flashinfer/sgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
In file included from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/sgmv_flashinfer/permuted_smem.cuh:23,
from /workspace/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh:6,
from /workspace/csrc/sgmv_flashinfer/sgmv_all.cu:8:
/usr/local/cuda/include/cuda/std/barrier:15:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up."
15 | #  error "CUDA synchronization primitives are only supported for sm_70 and up."
|    ^~~~~
In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/atomic:727,
from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:60,
from /usr/local/cuda/include/cuda/std/barrier:22,
from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/sgmv_flashinfer/permuted_smem.cuh:23,
from /workspace/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh:6,
from /workspace/csrc/sgmv_flashinfer/sgmv_all.cu:8:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:12:4: error: #error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
12 | #  error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
|    ^~~~~
In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:487,
from /usr/local/cuda/include/cuda/std/barrier:22,
from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/sgmv_flashinfer/permuted_smem.cuh:23,
from /workspace/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh:6,
from /workspace/csrc/sgmv_flashinfer/sgmv_all.cu:8:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/barrier.h:19:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up."
19 | #  error "CUDA synchronization primitives are only supported for sm_70 and up."
|    ^~~~~
[2/3] /usr/local/cuda/bin/nvcc  -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/bgmv/bgmv_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/bgmv/bgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
FAILED: /workspace/build/temp.linux-x86_64-3.10/csrc/bgmv/bgmv_all.o
/usr/local/cuda/bin/nvcc  -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/bgmv/bgmv_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/bgmv/bgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
In file included from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/bgmv/bgmv_impl.cuh:6,
from /workspace/csrc/bgmv/bgmv_all.cu:2:
/usr/local/cuda/include/cuda/std/barrier:15:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up."
15 | #  error "CUDA synchronization primitives are only supported for sm_70 and up."
|    ^~~~~
In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/atomic:727,
from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:60,
from /usr/local/cuda/include/cuda/std/barrier:22,
from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/bgmv/bgmv_impl.cuh:6,
from /workspace/csrc/bgmv/bgmv_all.cu:2:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:12:4: error: #error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
12 | #  error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
|    ^~~~~
In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:487,
from /usr/local/cuda/include/cuda/std/barrier:22,
from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/bgmv/bgmv_impl.cuh:6,
from /workspace/csrc/bgmv/bgmv_all.cu:2:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/barrier.h:19:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up."
19 | #  error "CUDA synchronization primitives are only supported for sm_70 and up."
|    ^~~~~
[3/3] /usr/local/cuda/bin/nvcc  -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/flashinfer_adapter/flashinfer_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/flashinfer_adapter/flashinfer_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
FAILED: /workspace/build/temp.linux-x86_64-3.10/csrc/flashinfer_adapter/flashinfer_all.o
/usr/local/cuda/bin/nvcc  -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/flashinfer_adapter/flashinfer_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/flashinfer_adapter/flashinfer_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
In file included from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/flashinfer_adapter/../flashinfer/decode.cuh:24,
from /workspace/csrc/flashinfer_adapter/flashinfer_all.cu:5:
/usr/local/cuda/include/cuda/std/barrier:15:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up."
15 | #  error "CUDA synchronization primitives are only supported for sm_70 and up."
|    ^~~~~
In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/atomic:727,
from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:60,
from /usr/local/cuda/include/cuda/std/barrier:22,
from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/flashinfer_adapter/../flashinfer/decode.cuh:24,
from /workspace/csrc/flashinfer_adapter/flashinfer_all.cu:5:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:12:4: error: #error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
12 | #  error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
|    ^~~~~
In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:487,
from /usr/local/cuda/include/cuda/std/barrier:22,
from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/flashinfer_adapter/../flashinfer/decode.cuh:24,
from /workspace/csrc/flashinfer_adapter/flashinfer_all.cu:5:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/barrier.h:19:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up."
19 | #  error "CUDA synchronization primitives are only supported for sm_70 and up."
|    ^~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1917, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 526, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
main()
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
return _build_backend().build_wheel(wheel_directory, config_settings,
File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 434, in build_wheel
return self._build_with_temp_dir(
File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
self.run_setup()
File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 341, in run_setup
exec(code, locals())
File "<string>", line 51, in <module>
File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 103, in setup
return distutils.core.setup(**attrs)
File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
dist.run_commands()
File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
self.run_command(cmd)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 364, in run
self.run_command("build")
File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
super().run_command(command)
File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
cmd_obj.run()
File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 88, in run
_build_ext.run(self)
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
self.build_extensions()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 865, in build_extensions
build_ext.build_extensions(self)
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
self._build_extensions_serial()
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
self.build_extension(ext)
File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 249, in build_extension
_build_ext.build_extension(self, ext)
File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 127, in build_extension
super(build_ext, self).build_extension(ext)
File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
objects = self.compiler.compile(sources,
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 678, in unix_wrap_ninja_compile
_write_ninja_file_and_compile_objects(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1590, in _write_ninja_file_and_compile_objects
_run_ninja_build(
File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1933, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
error: subprocess-exited-with-error

× Building wheel for punica (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmp1q3xdwn0
cwd: /workspace
Building wheel for punica (pyproject.toml) ... error
ERROR: Failed building wheel for punica
Failed to build punica
ERROR: Could not build wheels for punica, which is required to install pyproject.toml-based projects

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: python -m pip install --upgrade pip

root@ba66182d6ed2:/workspace# pip install -v --no-build-isolation .
Using pip 23.2.1 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /workspace
Running command Preparing metadata (pyproject.toml)
running dist_info
creating /tmp/pip-modern-metadata-vhp_fir1/punica.egg-info
writing /tmp/pip-modern-metadata-vhp_fir1/punica.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-modern-metadata-vhp_fir1/punica.egg-info/dependency_links.txt
writing requirements to /tmp/pip-modern-metadata-vhp_fir1/punica.egg-info/requires.txt
writing top-level names to /tmp/pip-modern-metadata-vhp_fir1/punica.egg-info/top_level.txt
writing manifest file '/tmp/pip-modern-metadata-vhp_fir1/punica.egg-info/SOURCES.txt'
reading manifest file '/tmp/pip-modern-metadata-vhp_fir1/punica.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'benchmarks'
no previously-included directories found matching '*/__pycache__'
warning: no previously-included files matching '*.so' found anywhere in distribution
writing manifest file '/tmp/pip-modern-metadata-vhp_fir1/punica.egg-info/SOURCES.txt'
creating '/tmp/pip-modern-metadata-vhp_fir1/punica-0.0.1.dist-info'
Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from punica==0.0.1) (2.1.0a0+32f93b1)
Collecting transformers (from punica==0.0.1)
Obtaining dependency information for transformers from https://files.pythonhosted.org/packages/9a/06/e4ec2a321e57c03b7e9345d709d554a52c33760e5015fdff0919d9459af0/transformers-4.35.0-py3-none-any.whl.metadata
Downloading transformers-4.35.0-py3-none-any.whl.metadata (123 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 123.1/123.1 kB 314.5 kB/s eta 0:00:00
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from punica==0.0.1) (1.22.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (3.12.4)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (4.7.1)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (2.6.3)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (3.1.2)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch->punica==0.0.1) (2023.6.0)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers->punica==0.0.1)
Obtaining dependency information for huggingface-hub<1.0,>=0.16.4 from https://files.pythonhosted.org/packages/65/cc/2891260847777eb9aaca278aaf3f846c9ff8ea1351643a4f33ff26d5d213/huggingface_hub-0.19.1-py3-none-any.whl.metadata
Downloading huggingface_hub-0.19.1-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (23.1)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (2023.8.8)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (2.31.0)
Collecting tokenizers<0.15,>=0.14 (from transformers->punica==0.0.1)
Obtaining dependency information for tokenizers<0.15,>=0.14 from https://files.pythonhosted.org/packages/a7/7b/c1f643eb086b6c5c33eef0c3752e37624bd23e4cbc9f1332748f1c6252d1/tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting safetensors>=0.3.1 (from transformers->punica==0.0.1)
Obtaining dependency information for safetensors>=0.3.1 from https://files.pythonhosted.org/packages/20/4e/878b080dbda92666233ec6f316a53969edcb58eab1aa399a64d0521cf953/safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata
Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.8 kB)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers->punica==0.0.1) (4.66.1)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers->punica==0.0.1)
Obtaining dependency information for huggingface-hub<1.0,>=0.16.4 from https://files.pythonhosted.org/packages/aa/f3/3fc97336a0e90516901befd4f500f08d691034d387406fdbde85bea827cc/huggingface_hub-0.17.3-py3-none-any.whl.metadata
Downloading huggingface_hub-0.17.3-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->punica==0.0.1) (2.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->punica==0.0.1) (3.2.0)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->punica==0.0.1) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->punica==0.0.1) (1.26.16)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->punica==0.0.1) (2023.7.22)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->punica==0.0.1) (1.3.0)
Downloading transformers-4.35.0-py3-none-any.whl (7.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.9/7.9 MB 7.5 MB/s eta 0:00:00
Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.3/1.3 MB 95.0 MB/s eta 0:00:00
Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.8/3.8 MB 93.6 MB/s eta 0:00:00
Downloading huggingface_hub-0.17.3-py3-none-any.whl (295 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 295.0/295.0 kB 146.0 MB/s eta 0:00:00
Building wheels for collected packages: punica
Running command Building wheel for punica (pyproject.toml)
running bdist_wheel
running build
running build_py
running egg_info
writing punica.egg-info/PKG-INFO
writing dependency_links to punica.egg-info/dependency_links.txt
writing requirements to punica.egg-info/requires.txt
writing top-level names to punica.egg-info/top_level.txt
reading manifest file 'punica.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
no previously-included directories found matching 'benchmarks'
no previously-included directories found matching '*/__pycache__'
warning: no previously-included files matching '*.so' found anywhere in distribution
writing manifest file 'punica.egg-info/SOURCES.txt'
running build_ext
building 'punica.ops._kernels' extension
Emitting ninja build file /workspace/build/temp.linux-x86_64-3.10/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda/bin/nvcc -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/sgmv_flashinfer/sgmv_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/sgmv_flashinfer/sgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
FAILED: /workspace/build/temp.linux-x86_64-3.10/csrc/sgmv_flashinfer/sgmv_all.o
/usr/local/cuda/bin/nvcc -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/sgmv_flashinfer/sgmv_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/sgmv_flashinfer/sgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17
In file included from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/sgmv_flashinfer/permuted_smem.cuh:23,
from /workspace/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh:6,
from /workspace/csrc/sgmv_flashinfer/sgmv_all.cu:8:
/usr/local/cuda/include/cuda/std/barrier:15:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up."
15 | #  error "CUDA synchronization primitives are only supported for sm_70 and up."
|    ^~~~~
In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/atomic:727,
from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:60,
from /usr/local/cuda/include/cuda/std/barrier:22,
from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/sgmv_flashinfer/permuted_smem.cuh:23,
from /workspace/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh:6,
from /workspace/csrc/sgmv_flashinfer/sgmv_all.cu:8:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:12:4: error: #error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
12 | #  error "CUDA atomics are only supported for sm_60 and up on *nix and sm_70 and up on Windows."
| ^~~~~ In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:487, from /usr/local/cuda/include/cuda/std/barrier:22, from /usr/local/cuda/include/cuda/barrier:14, from /usr/local/cuda/include/cuda/pipeline:56, from /workspace/csrc/sgmv_flashinfer/permuted_smem.cuh:23, from /workspace/csrc/sgmv_flashinfer/sgmv_flashinfer.cuh:6, from /workspace/csrc/sgmv_flashinfer/sgmv_all.cu:8: /usr/local/cuda/include/cuda/std/detail/libcxx/include/cuda/barrier.h:19:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up." 19 | # error "CUDA synchronization primitives are only supported for sm_70 and up." | ^~~~~ [2/3] /usr/local/cuda/bin/nvcc -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/bgmv/bgmv_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/bgmv/bgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17 FAILED: /workspace/build/temp.linux-x86_64-3.10/csrc/bgmv/bgmv_all.o /usr/local/cuda/bin/nvcc -I/workspace/third_party/cutlass/include 
-I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/bgmv/bgmv_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/bgmv/bgmv_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17 In file included from /usr/local/cuda/include/cuda/barrier:14, from /usr/local/cuda/include/cuda/pipeline:56, from /workspace/csrc/bgmv/bgmv_impl.cuh:6, from /workspace/csrc/bgmv/bgmv_all.cu:2: /usr/local/cuda/include/cuda/std/barrier:15:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up." 15 | # error "CUDA synchronization primitives are only supported for sm_70 and up." 
| ^~~~~ In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/atomic:727, from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:60, from /usr/local/cuda/include/cuda/std/barrier:22, from /usr/local/cuda/include/cuda/barrier:14, from /usr/local/cuda/include/cuda/pipeline:56, from /workspace/csrc/bgmv/bgmv_impl.cuh:6, from /workspace/csrc/bgmv/bgmv_all.cu:2: /usr/local/cuda/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:12:4: error: #error "CUDA atomics are only supported for sm_60 and up on nix and sm_70 and up on Windows." 12 | # error "CUDA atomics are only supported for sm_60 and up on nix and sm_70 and up on Windows." | ^~~~~ In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:487, from /usr/local/cuda/include/cuda/std/barrier:22, from /usr/local/cuda/include/cuda/barrier:14, from /usr/local/cuda/include/cuda/pipeline:56, from /workspace/csrc/bgmv/bgmv_impl.cuh:6, from /workspace/csrc/bgmv/bgmv_all.cu:2: /usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/barrier.h:19:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up." 19 | # error "CUDA synchronization primitives are only supported for sm_70 and up." 
| ^~~~~ [3/3] /usr/local/cuda/bin/nvcc -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/flashinfer_adapter/flashinfer_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/flashinfer_adapter/flashinfer_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17 FAILED: /workspace/build/temp.linux-x86_64-3.10/csrc/flashinfer_adapter/flashinfer_all.o /usr/local/cuda/bin/nvcc -I/workspace/third_party/cutlass/include -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -I/usr/include/python3.10 -c -c /workspace/csrc/flashinfer_adapter/flashinfer_all.cu -o /workspace/build/temp.linux-x86_64-3.10/csrc/flashinfer_adapter/flashinfer_all.o --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' 
-DTORCH_EXTENSION_NAME=_kernels -D_GLIBCXX_USE_CXX11_ABI=1 -gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 -gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 -gencode=arch=compute_90,code=sm_90 -std=c++17 In file included from /usr/local/cuda/include/cuda/barrier:14, from /usr/local/cuda/include/cuda/pipeline:56, from /workspace/csrc/flashinfer_adapter/../flashinfer/decode.cuh:24, from /workspace/csrc/flashinfer_adapter/flashinfer_all.cu:5: /usr/local/cuda/include/cuda/std/barrier:15:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up." 15 | # error "CUDA synchronization primitives are only supported for sm_70 and up." | ^~~~~ In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/atomic:727, from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:60, from /usr/local/cuda/include/cuda/std/barrier:22, from /usr/local/cuda/include/cuda/barrier:14, from /usr/local/cuda/include/cuda/pipeline:56, from /workspace/csrc/flashinfer_adapter/../flashinfer/decode.cuh:24, from /workspace/csrc/flashinfer_adapter/flashinfer_all.cu:5: /usr/local/cuda/include/cuda/std/detail/libcxx/include/support/atomic/atomic_cuda.h:12:4: error: #error "CUDA atomics are only supported for sm_60 and up on nix and sm_70 and up on Windows." 12 | # error "CUDA atomics are only supported for sm_60 and up on nix and sm_70 and up on Windows." 
| ^~~~~
In file included from /usr/local/cuda/include/cuda/std/detail/libcxx/include/barrier:487,
from /usr/local/cuda/include/cuda/std/barrier:22,
from /usr/local/cuda/include/cuda/barrier:14,
from /usr/local/cuda/include/cuda/pipeline:56,
from /workspace/csrc/flashinfer_adapter/../flashinfer/decode.cuh:24,
from /workspace/csrc/flashinfer_adapter/flashinfer_all.cu:5:
/usr/local/cuda/include/cuda/std/detail/libcxx/include/__cuda/barrier.h:19:4: error: #error "CUDA synchronization primitives are only supported for sm_70 and up."
19 | # error "CUDA synchronization primitives are only supported for sm_70 and up."
| ^~~~~
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1917, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
    json_out['return_val'] = hook(**hook_input['kwargs'])
  File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
    return _build_backend().build_wheel(wheel_directory, config_settings,
  File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 434, in build_wheel
    return self._build_with_temp_dir(
  File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 419, in _build_with_temp_dir
    self.run_setup()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 341, in run_setup
    exec(code, locals())
  File "<string>", line 51, in <module>
  File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.10/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.10/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 364, in run
    self.run_command("build")
  File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.10/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/usr/lib/python3.10/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 989, in run_command
    super().run_command(command)
  File "/usr/lib/python3.10/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 88, in run
    _build_ext.run(self)
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 865, in build_extensions
    build_ext.build_extensions(self)
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/usr/local/lib/python3.10/dist-packages/setuptools/command/build_ext.py", line 249, in build_extension
    _build_ext.build_extension(self, ext)
  File "/usr/local/lib/python3.10/dist-packages/Cython/Distutils/build_ext.py", line 127, in build_extension
    super(build_ext, self).build_extension(ext)
  File "/usr/lib/python3.10/distutils/command/build_ext.py", line 529, in build_extension
    objects = self.compiler.compile(sources,
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 678, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1590, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1933, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
error: subprocess-exited-with-error

× Building wheel for punica (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python /usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py build_wheel /tmp/tmpr95z1doq
cwd: /workspace
Building wheel for punica (pyproject.toml) ... error
ERROR: Failed building wheel for punica
Failed to build punica
ERROR: Could not build wheels for punica, which is required to install pyproject.toml-based projects


Later, I changed command=['ninja', '-v'] to command=['ninja', '--version'] in PyTorch's utils/cpp_extension.py, but it still does not work.

yzh119 commented 7 months ago

Can you provide information about your GPU? Seems the CUDA architecture is too old (< sm_70).
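
A quick way to see which targets in the log fall below that threshold is sketched here. The helper name `archs_below` is hypothetical (it is not part of Punica or PyTorch); it simply pulls the `code=sm_XY` targets out of an nvcc command line and lists those below a minimum, taken as sm_70 from the `#error` messages in the log.

```python
import re


def archs_below(nvcc_command: str, minimum: int = 70) -> list[int]:
    """Return the sorted sm_XY targets in an nvcc command that are below `minimum`."""
    # Only real-architecture targets such as `code=sm_52` are matched;
    # PTX-only targets (`code=compute_90`) are ignored.
    targets = {int(m) for m in re.findall(r"code=sm_(\d+)", nvcc_command)}
    return sorted(t for t in targets if t < minimum)


# The -gencode flags copied from the failing nvcc invocation in the log.
flags = (
    "-gencode=arch=compute_52,code=sm_52 -gencode=arch=compute_60,code=sm_60 "
    "-gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_70,code=sm_70 "
    "-gencode=arch=compute_72,code=sm_72 -gencode=arch=compute_75,code=sm_75 "
    "-gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_86,code=sm_86 "
    "-gencode=arch=compute_87,code=sm_87 -gencode=arch=compute_90,code=compute_90 "
    "-gencode=arch=compute_90,code=sm_90"
)

print(archs_below(flags))  # [52, 60, 61]
```

If the list is non-empty, the build is targeting architectures older than the headers accept, regardless of which GPU is actually installed.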

abcdabcd987 commented 7 months ago

From your log:

[1/3] /usr/local/cuda/bin/nvcc  \
-I/workspace/third_party/cutlass/include \
-I/usr/local/lib/python3.10/dist-packages/torch/include \
-I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include \
-I/usr/local/lib/python3.10/dist-packages/torch/include/TH \
-I/usr/local/lib/python3.10/dist-packages/torch/include/THC \
-I/usr/local/cuda/include \
-I/usr/include/python3.10 \
-c \
-c /workspace/csrc/sgmv_flashinfer/sgmv_all.cu \
-o /workspace/build/temp.linux-x86_64-3.10/csrc/sgmv_flashinfer/sgmv_all.o \
--expt-relaxed-constexpr \
--compiler-options ''"'"'-fPIC'"'"'' \
-DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' \
-DTORCH_EXTENSION_NAME=_kernels \
-D_GLIBCXX_USE_CXX11_ABI=1 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_72,code=sm_72 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_80,code=sm_80 \
-gencode=arch=compute_86,code=sm_86 \
-gencode=arch=compute_87,code=sm_87 \
-gencode=arch=compute_90,code=compute_90 \
-gencode=arch=compute_90,code=sm_90 \
-std=c++17

It looks like your container is configured to compile for every architecture by default, including sm_52, sm_60, and sm_61, which are below the sm_70 minimum that the CUDA headers require. Can you try overriding that with:

env TORCH_CUDA_ARCH_LIST="8.0" pip install -v --no-build-isolation .
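
If you are unsure which value to pass, the following is a minimal sketch that derives it from the GPUs PyTorch can see. The function names `arch_list_from_capabilities` and `detect_capabilities` are made up for illustration; the only PyTorch calls used are `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.cuda.get_device_capability`. The semicolon separator is assumed to be accepted by the extension builder.

```python
def arch_list_from_capabilities(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))  # de-duplicate identical GPUs
    return ";".join(f"{major}.{minor}" for major, minor in unique)


def detect_capabilities():
    """Query visible GPUs through PyTorch; return [] if torch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    caps = detect_capabilities()
    print(arch_list_from_capabilities(caps) if caps else "no CUDA GPU detected")
```

On a machine whose only GPU reports capability (8, 0), such as an A100, this prints `8.0`, which matches the override above.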

Later, I changed command=['ninja', '-v'] to command=['ninja', '--version'] in PyTorch's utils/cpp_extension.py, but it still does not work.

Please revert.

jkl375 commented 7 months ago

It works! Thank you very much. image

jkl375 commented 7 months ago

Can you provide information about your GPU? Seems the CUDA architecture is too old (< sm_70).

My GPU is an A100; it seems the TORCH_CUDA_ARCH_LIST environment variable is not set correctly in the container. The problem is resolved now. Thank you very much!
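
For anyone hitting the same thing, here is a rough sketch for inspecting the variable inside a container before building. The helpers `parse_arch_list` and `too_old` are hypothetical names, and the 7.0 minimum is taken from the `#error` messages in the log above. The parser assumes entries are separated by spaces or semicolons and may carry a `+PTX` suffix; entries in any other form are skipped.

```python
import os
import re

_ARCH = re.compile(r"(\d+)\.(\d+)")


def parse_arch_list(value):
    """Split a TORCH_CUDA_ARCH_LIST string into (major, minor) tuples."""
    archs = []
    for item in value.replace(";", " ").split():
        match = _ARCH.match(item)  # tolerates a trailing "+PTX"
        if match:
            archs.append((int(match.group(1)), int(match.group(2))))
    return archs


def too_old(value, minimum=(7, 0)):
    """Return the entries of an arch list that are below `minimum`."""
    return [arch for arch in parse_arch_list(value) if arch < minimum]


current = os.environ.get("TORCH_CUDA_ARCH_LIST", "")
print("TORCH_CUDA_ARCH_LIST =", repr(current))
print("entries below 7.0:", too_old(current))
```

An empty second line means nothing in the current setting should trip the sm_70 check; any tuples listed there are the ones to drop from the variable.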

abcdabcd987 commented 7 months ago

Glad that it works!