laekov / fastmoe

A fast MoE impl for PyTorch
https://fastmoe.ai
Apache License 2.0

setup.py error! #182

Closed R-QinQ closed 10 months ago

R-QinQ commented 10 months ago

Error information: `
running install
running bdist_egg
running egg_info
writing fastmoe.egg-info/PKG-INFO
writing dependency_links to fastmoe.egg-info/dependency_links.txt
writing top-level names to fastmoe.egg-info/top_level.txt
reading manifest file 'fastmoe.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'fastmoe.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_py
running build_ext
building 'fmoe_cuda' extension
Emitting ninja build file /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda/bin/nvcc -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/TH -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/rqq/envs/pytorch/include/python3.7m -c -c /data/rqq/fastmoe-1.1.0/cuda/parallel_linear.cu -o /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/cuda/parallel_linear.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DFMOE_USE_NCCL -DUSE_C10D_NCCL -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fmoe_cuda -DTORCH_EXTENSION_NAME=fmoe_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
FAILED: /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/cuda/parallel_linear.o
/usr/local/cuda/bin/nvcc -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/TH -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/rqq/envs/pytorch/include/python3.7m -c -c /data/rqq/fastmoe-1.1.0/cuda/parallel_linear.cu -o /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/cuda/parallel_linear.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DFMOE_USE_NCCL -DUSE_C10D_NCCL -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fmoe_cuda -DTORCH_EXTENSION_NAME=fmoe_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
/data/rqq/fastmoe-1.1.0/cuda/utils/cublas_wrapper.h(129): error: identifier "CUDA_R_16BF" is undefined

1 error detected in the compilation of "/tmp/tmpxft_0000202d_00000000-6_parallel_linear.cpp1.ii".
[2/3] /usr/local/cuda/bin/nvcc -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/TH -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/rqq/envs/pytorch/include/python3.7m -c -c /data/rqq/fastmoe-1.1.0/cuda/balancing.cu -o /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/cuda/balancing.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DFMOE_USE_NCCL -DUSE_C10D_NCCL -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fmoe_cuda -DTORCH_EXTENSION_NAME=fmoe_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
FAILED: /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/cuda/balancing.o
/usr/local/cuda/bin/nvcc -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/TH -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/rqq/envs/pytorch/include/python3.7m -c -c /data/rqq/fastmoe-1.1.0/cuda/balancing.cu -o /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/cuda/balancing.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DFMOE_USE_NCCL -DUSE_C10D_NCCL -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fmoe_cuda -DTORCH_EXTENSION_NAME=fmoe_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::CrossMapLRN2dImpl]’:
/tmp/tmpxft_0000202c_00000000-5_balancing.cudafe1.stub.c:4:27: required from here
/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
`

The environment is:

laekov commented 10 months ago

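This seems to be a bf16 compatibility issue on older CUDA versions. If you are not using bf16, you can apply an ad-hoc fix by removing this function (presumably the one at `cuda/utils/cublas_wrapper.h` line 129, where the `CUDA_R_16BF` error is reported).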
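To tell in advance whether a toolkit is likely to hit the `CUDA_R_16BF` error, one option is to compare the `nvcc` release against the first CUDA version that ships bf16 types. The sketch below is not part of FastMoE: the helper names `parse_nvcc_release` and `toolkit_has_bf16` are hypothetical, and the 11.0 threshold is an assumption based on bf16 support arriving with CUDA 11.

```python
import re
import shutil
import subprocess

# Assumption: CUDA_R_16BF is first defined in the CUDA 11.0 headers.
BF16_MIN_CUDA = (11, 0)


def parse_nvcc_release(text):
    """Extract (major, minor) from `nvcc --version` output, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_has_bf16(version):
    """True if the toolkit version is at least BF16_MIN_CUDA."""
    return version is not None and version >= BF16_MIN_CUDA


if __name__ == "__main__":
    # Fall back to the nvcc path that appears in the build log above.
    nvcc = shutil.which("nvcc") or "/usr/local/cuda/bin/nvcc"
    try:
        out = subprocess.run(
            [nvcc, "--version"], capture_output=True, text=True
        ).stdout
    except OSError:
        out = ""  # nvcc not found or not executable
    version = parse_nvcc_release(out)
    print("CUDA toolkit:", version, "| bf16 types expected:", toolkit_has_bf16(version))
```

If the check reports that bf16 types are not expected, the options from this thread are to remove the bf16 function as suggested or to build against a newer CUDA toolkit.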

R-QinQ commented 10 months ago

This seems to be a compatibility issue with bf16 on older CUDA. If you are not using bf16, you can apply an ad-hoc fix by removing this function.

But I still have another problem:

FAILED: /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/cuda/balancing.o
/usr/local/cuda/bin/nvcc -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/TH -I/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/data/rqq/envs/pytorch/include/python3.7m -c -c /data/rqq/fastmoe-1.1.0/cuda/balancing.cu -o /data/rqq/fastmoe-1.1.0/build/temp.linux-x86_64-3.7/cuda/balancing.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=fmoe_cuda -DTORCH_EXTENSION_NAME=fmoe_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70
/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::CrossMapLRN2dImpl]’:
/tmp/tmpxft_0000016b_00000000-5_balancing.cudafe1.stub.c:4:27: required from here
/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’
/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h: In instantiation of ‘std::shared_ptr<torch::nn::Module> torch::nn::Cloneable<Derived>::clone(const c10::optional<c10::Device>&) const [with Derived = torch::nn::EmbeddingBagImpl]’:
/tmp/tmpxft_0000016b_00000000-5_balancing.cudafe1.stub.c:4:27: required from here
/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:57:59: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, at::Tensor>’ to type ‘torch::OrderedDict<std::basic_string<char>, at::Tensor>&’
/data/rqq/envs/pytorch/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/nn/cloneable.h:69:61: error: invalid static_cast from type ‘const torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >’ to type ‘torch::OrderedDict<std::basic_string<char>, std::shared_ptr<torch::nn::Module> >&’

laekov commented 10 months ago

This issue seems tricky.

As it is hard to find such an ancient CUDA (and PyTorch), I tried to reproduce this error using the Docker images pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel and pytorch/pytorch:1.9.0-cuda10.2-cudnn7-devel, but unfortunately I could not reproduce the latter static_cast problem in either image.

I guess you need a newer gcc (both images are based on Ubuntu 18.04 and gcc 7.5.0).
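A quick way to act on this guess is to compare the local gcc against the version in the images where the build succeeded. The following is a minimal sketch, not a FastMoE tool: the helper names `parse_gcc_version` and `is_older_than_known_good` are hypothetical, and gcc 7.5.0 is used only because it is the version mentioned in this thread, not because it is a documented minimum.

```python
import re
import subprocess

# Assumption: 7.5.0 is simply the gcc version in the Docker images where the
# build worked in this thread; it is not an official minimum requirement.
KNOWN_GOOD_GCC = (7, 5, 0)


def parse_gcc_version(text):
    """Extract the first (major, minor, patch) triple from `gcc --version` output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def is_older_than_known_good(version):
    """True if the parsed gcc version is below KNOWN_GOOD_GCC."""
    return version is not None and version < KNOWN_GOOD_GCC


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["gcc", "--version"], capture_output=True, text=True
        ).stdout
    except OSError:
        out = ""  # gcc is not installed or not on PATH
    version = parse_gcc_version(out)
    if is_older_than_known_good(version):
        print("gcc", version, "is older than", KNOWN_GOOD_GCC, "and may need upgrading")
    else:
        print("gcc version:", version)
```

A result older than 7.5.0 does not prove gcc is the cause of the static_cast errors, but it makes upgrading the compiler a reasonable first thing to try.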

R-QinQ commented 10 months ago

You are right! Thank you.