Describe the bug
After calling the code below:

```python
from onediff.infer_compiler.transform import transform_mgr

transformed_diffusers = transform_mgr.transform_package("diffusers")
```
the compilation process shown below kicks in and takes almost an extra minute. When servers are scaled up and down frequently, this leads to noticeably slow startup times.

```
Could not load the custom kernel for multi-scale deformable attention: No module named 'MultiScaleDeformableAttention'
Could not load the custom kernel for multi-scale deformable attention: No module named 'MultiScaleDeformableAttention'
Failed to load CUDA kernels. Mra requires custom CUDA kernels. Please verify that compatible versions of PyTorch and CUDA Toolkit are installed: No module named 'cuda_kernel'
Failed to load CUDA kernels. Mra requires custom CUDA kernels. Please verify that compatible versions of PyTorch and CUDA Toolkit are installed: No module named 'cuda_kernel'
You are using torch==0.9.1+cu121.git.1c6623a, but torch>=1.12.0 is required to use TapasModel. Please upgrade torch.
You are using torch==0.9.1+cu121.git.1c6623a, but torch>=1.12.0 is required to use TapasModel. Please upgrade torch.
Loading custom CUDA kernels...
Loading custom CUDA kernels...
Loading custom CUDA kernels...
Loading custom CUDA kernels...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/cuda_kernel/build.ninja...
Building extension module cuda_kernel...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=cuda_kernel -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /usr/local/lib/python3.10/dist-packages/transformers/kernels/mra/cuda_kernel.cu -o cuda_kernel.cuda.o
[2/4] c++ -MMD -MF torch_extension.o.d -DTORCH_EXTENSION_NAME=cuda_kernel -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /usr/local/lib/python3.10/dist-packages/transformers/kernels/mra/torch_extension.cpp -o torch_extension.o
[3/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=cuda_kernel -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /usr/local/lib/python3.10/dist-packages/transformers/kernels/mra/cuda_launch.cu -o cuda_launch.cuda.o
[4/4] c++ cuda_kernel.cuda.o cuda_launch.cuda.o torch_extension.o -shared -L/usr/local/lib/python3.10/dist-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o cuda_kernel.so
Loading extension module cuda_kernel...
```
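The `torch>=1.12.0` warning in the log comes from transformers comparing the version string of the (mocked) torch module against its minimum requirement. A minimal sketch of that comparison, with the parsing hand-rolled here rather than using transformers' actual `packaging`-based check:

```python
def base_version(v: str) -> tuple:
    """Strip the local-version label (everything after '+') and parse the release numbers."""
    return tuple(int(p) for p in v.split("+")[0].split("."))

# Version string oneflow reports through the mocked torch module (taken from the log above).
mocked = "0.9.1+cu121.git.1c6623a"

# transformers requires torch >= 1.12.0 for TapasModel, so the oneflow version
# string fails the check even though the real torch installation is new enough.
print(base_version(mocked) < base_version("1.12.0"))  # True
```

This illustrates why transformers concludes "torch" is too old: it only ever sees oneflow's `0.9.1` version string.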
After some investigation, the likely cause is that oneflow hijacks (mocks) the torch module, which makes the transformers package believe the installed torch version is too low, and that in turn triggers the extra compilation of cuda_kernel.

Your environment
```
transformers==4.37.2
oneflow==0.9.1+cu121
onediff==1.0.0
```
OneDiff git commit id
OneFlow version info if you have installed oneflow
```
version: 0.9.1+cu121.git.1c6623a
git_commit: 1c6623a
cmake_build_type: Release
rdma: False
mlir: True
enterprise: True
```
How To Reproduce
```python
from onediff.infer_compiler.transform import transform_mgr

transformed_diffusers = transform_mgr.transform_package("diffusers")
a = transformed_diffusers.ModelMixin()
```
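A possible mitigation for the slow startup under frequent scale-up/scale-down (an assumption on my side, not verified in this report): the log shows the build cache lives under `/root/.cache/torch_extensions`, and `torch.utils.cpp_extension` honors the `TORCH_EXTENSIONS_DIR` environment variable, so pointing it at a persistent volume should let the ninja build run only once. The mount path below is hypothetical:

```python
import os

# Hypothetical persistent mount; adjust to your deployment.
# Must be set before transformers builds the extension.
os.environ["TORCH_EXTENSIONS_DIR"] = "/mnt/shared-cache/torch_extensions"
```

With the cache persisted, subsequent container starts should load the already-built `cuda_kernel.so` instead of recompiling it.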