siliconflow / onediff

OneDiff: An out-of-the-box acceleration library for diffusion models.
https://github.com/siliconflow/onediff/wiki
Apache License 2.0

Extra compilation time caused by onediff hijacking torch #985

Closed CuddleSabe closed 3 months ago

CuddleSabe commented 3 months ago

Describe the bug

After calling the code below, an extra compilation step kicks in that takes nearly a minute. On servers that scale up and down frequently, this leads to noticeably slow startup times.

```python
from onediff.infer_compiler.transform import transform_mgr
transformed_diffusers = transform_mgr.transform_package("diffusers")
```

```
Could not load the custom kernel for multi-scale deformable attention: No module named 'MultiScaleDeformableAttention'
Could not load the custom kernel for multi-scale deformable attention: No module named 'MultiScaleDeformableAttention'
Failed to load CUDA kernels. Mra requires custom CUDA kernels. Please verify that compatible versions of PyTorch and CUDA Toolkit are installed: No module named 'cuda_kernel'
Failed to load CUDA kernels. Mra requires custom CUDA kernels. Please verify that compatible versions of PyTorch and CUDA Toolkit are installed: No module named 'cuda_kernel'
You are using torch==0.9.1+cu121.git.1c6623a, but torch>=1.12.0 is required to use TapasModel. Please upgrade torch.
You are using torch==0.9.1+cu121.git.1c6623a, but torch>=1.12.0 is required to use TapasModel. Please upgrade torch.
Loading custom CUDA kernels...
Loading custom CUDA kernels...
Loading custom CUDA kernels...
Loading custom CUDA kernels...
Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/cuda_kernel/build.ninja...
Building extension module cuda_kernel...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=cuda_kernel -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /usr/local/lib/python3.10/dist-packages/transformers/kernels/mra/cuda_kernel.cu -o cuda_kernel.cuda.o
[2/4] c++ -MMD -MF torch_extension.o.d -DTORCH_EXTENSION_NAME=cuda_kernel -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /usr/local/lib/python3.10/dist-packages/transformers/kernels/mra/torch_extension.cpp -o torch_extension.o
[3/4] /usr/local/cuda/bin/nvcc -DTORCH_EXTENSION_NAME=cuda_kernel -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_89,code=compute_89 -gencode=arch=compute_89,code=sm_89 --compiler-options '-fPIC' -std=c++17 -c /usr/local/lib/python3.10/dist-packages/transformers/kernels/mra/cuda_launch.cu -o cuda_launch.cuda.o
[4/4] c++ cuda_kernel.cuda.o cuda_launch.cuda.o torch_extension.o -shared -L/usr/local/lib/python3.10/dist-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o cuda_kernel.so
Loading extension module cuda_kernel...
```

After some investigation, the likely cause is that oneflow hijacks torch, so the transformers package mistakes the torch version for an old one and triggers the extra cuda_kernel compilation.
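The misfire can be shown in miniature: transformers gates optional kernels on the reported torch version, and once oneflow is mock-installed as torch, oneflow's own `0.9.1` version string is what the gate sees. A minimal sketch (the parsing helper and version strings are illustrative, not transformers' actual check):

```python
def parse_major_minor(v: str) -> tuple:
    """Extract (major, minor) from a PEP 440-style version string."""
    core = v.split("+")[0]              # drop local suffix like "+cu121.git.1c6623a"
    major, minor = core.split(".")[:2]
    return (int(major), int(minor))

REQUIRED = (1, 12)                      # transformers requires torch >= 1.12.0

real_torch = "2.1.0+cu121"              # what a real torch build might report
hijacked = "0.9.1+cu121.git.1c6623a"    # oneflow's version, reported as torch's

print(parse_major_minor(real_torch) >= REQUIRED)  # True:  kernels load normally
print(parse_major_minor(hijacked) >= REQUIRED)    # False: "please upgrade torch" path
```

The warning `You are using torch==0.9.1+cu121.git.1c6623a, but torch>=1.12.0 is required` in the log above matches this pattern exactly.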

Your environment

```
transformers==4.37.2
oneflow==0.9.1+cu121
onediff==1.0.0
```

OneDiff git commit id

OneFlow version info if you have installed oneflow

```
version: 0.9.1+cu121.git.1c6623a
git_commit: 1c6623a
cmake_build_type: Release
rdma: False
mlir: True
enterprise: True
```

How To Reproduce

```python
from onediff.infer_compiler.transform import transform_mgr

transformed_diffusers = transform_mgr.transform_package("diffusers")
a = transformed_diffusers.ModelMixin()
```

CuddleSabe commented 3 months ago

This was finally resolved by upgrading transformers to 4.40.1. In that version the kernel compilation was moved from import time into `__init__`, so modules that are never used are no longer compiled.
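The fix described above is a standard lazy-initialization pattern. A hedged sketch of the idea (class and function names are illustrative stand-ins, not transformers' actual code; `load_cuda_kernel` plays the role of `torch.utils.cpp_extension.load`, which invokes ninja/nvcc):

```python
import functools

build_count = 0

@functools.lru_cache(maxsize=None)
def load_cuda_kernel():
    """Stand-in for the expensive nvcc/ninja build; cached so it runs at most once."""
    global build_count
    build_count += 1
    return object()

class MraAttention:
    """Illustrative module. Pre-4.40.1, the build ran at `import transformers`
    time; post-4.40.1, it runs here, only when the module is actually used."""
    def __init__(self):
        self.kernel = load_cuda_kernel()

a, b = MraAttention(), MraAttention()
print(build_count)  # 1: built once, and only because MraAttention was instantiated
```

Merely importing the module that defines `MraAttention` no longer pays the build cost, which is exactly why the one-minute startup penalty disappears for workloads that never touch these kernels.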