ninja: build stopped in convert_weight

xuboming8 commented 3 years ago

Traceback (most recent call last):
  File "/home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1030, in _build_extension_module
    check=True)
  File "/home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "convert_weight.py", line 11, in <module>
    from model import Generator, Discriminator
  File "/home/10301007/stylegan2-pytorch-master/model.py", line 11, in <module>
    from op import FusedLeakyReLU, fused_leaky_relu, upfirdn2d, conv2d_gradfix
  File "/home/10301007/stylegan2-pytorch-master/op/__init__.py", line 1, in <module>
    from .fused_act import FusedLeakyReLU, fused_leaky_relu
  File "/home/10301007/stylegan2-pytorch-master/op/fused_act.py", line 15, in <module>
    os.path.join(module_path, "fused_bias_act_kernel.cu"),
  File "/home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 661, in load
    is_python_module)
  File "/home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 830, in _jit_compile
    with_cuda=with_cuda)
  File "/home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 883, in _write_ninja_file_and_build
    _build_extension_module(name, build_directory, verbose)
  File "/home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1043, in _build_extension_module
    raise RuntimeError(message)
RuntimeError: Error building extension 'fused':
[1/3] /cm/shared/apps/cuda10.2/toolkit/10.2.89/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/TH -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/THC -isystem /cm/shared/apps/cuda10.2/toolkit/10.2.89/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/10301007/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
FAILED: fused_bias_act_kernel.cuda.o
/cm/shared/apps/cuda10.2/toolkit/10.2.89/bin/nvcc -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/TH -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/THC -isystem /cm/shared/apps/cuda10.2/toolkit/10.2.89/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=1 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_75,code=sm_75 --compiler-options '-fPIC' -std=c++11 -c /home/10301007/stylegan2-pytorch-master/op/fused_bias_act_kernel.cu -o fused_bias_act_kernel.cuda.o
In file included from /cm/shared/apps/cuda10.2/toolkit/10.2.89/include/cuda_runtime.h:83,
                 from <command-line>:
/cm/shared/apps/cuda10.2/toolkit/10.2.89/include/crt/host_config.h:138:2: error: #error -- unsupported GNU version! gcc versions later than 8 are not supported!
  138 | #error -- unsupported GNU version! gcc versions later than 8 are not supported!
      |  ^~~~~
[2/3] c++ -MMD -MF fused_bias_act.o.d -DTORCH_EXTENSION_NAME=fused -DTORCH_API_INCLUDE_EXTENSION_H -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/TH -isystem /home/10301003/anaconda3/envs/pytorch1.2/lib/python3.7/site-packages/torch/include/THC -isystem /cm/shared/apps/cuda10.2/toolkit/10.2.89/include -isystem /home/10301003/anaconda3/envs/pytorch1.2/include/python3.7m -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++11 -c /home/10301007/stylegan2-pytorch-master/op/fused_bias_act.cpp -o fused_bias_act.o
ninja: build stopped: subcommand failed.

I am using CUDA 10.2, PyTorch 1.3.1, and ninja 1.10.0. How can I solve this issue?

rosinality commented 3 years ago

You may have to use an older version of gcc; the error says that this CUDA toolkit does not support gcc versions later than 8. Alternatively, it may be simpler to use the official CUDA devel Docker images.
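As a quick way to check the host compiler before retrying the build, here is a minimal sketch. It assumes that the `gcc` found first on `PATH` is the one nvcc picks up, and it takes the limit of 8 from the error message above; the names `gcc_major` and `host_compiler_ok` are made up for illustration.

```python
import subprocess

# Limit taken from the error message in this thread:
# "gcc versions later than 8 are not supported" (CUDA 10.2 toolkit).
MAX_GCC_MAJOR_FOR_CUDA_10_2 = 8


def gcc_major(version_text):
    """Return the major version from `gcc -dumpversion` output, e.g. '9.3.0' -> 9."""
    return int(version_text.strip().split(".")[0])


def host_compiler_ok(version_text, max_major=MAX_GCC_MAJOR_FOR_CUDA_10_2):
    """Return True if the given gcc version is not newer than the stated limit."""
    return gcc_major(version_text) <= max_major


if __name__ == "__main__":
    # Ask the gcc found first on PATH for its version string.
    out = subprocess.run(
        ["gcc", "-dumpversion"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    ).stdout
    verdict = "ok" if host_compiler_ok(out) else "too new for CUDA 10.2"
    print("gcc", out.strip(), "->", verdict)
```

If this reports a version that is too new, the build will probably keep failing in the same place until an older gcc comes first on `PATH`.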

denabazazian commented 3 years ago

I have the same problem. I tried running convert_weight.py several times with PyTorch 1.3, 1.4, and 1.7, and with gcc 4.8, 5.2, and 7.3, but every time I got the same error as in this issue. I used CUDA 10.1 in all my experiments.

rosinality commented 3 years ago

@denabazazian Could you try an official Docker image such as nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04, with build-essential, pytorch, and ninja installed on top? It should work without additional configuration.
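To confirm that an environment has the tools the failing build invokes, a small check like the following may help. It is only a sketch: the tool names come from the commands in the log above (`ninja`, `nvcc`, `c++`), and `missing_tools` is a hypothetical helper, not part of PyTorch.

```python
import shutil

# Executables that appear in the build log above.
REQUIRED_TOOLS = ("ninja", "nvcc", "c++")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing from PATH:", ", ".join(missing))
    else:
        print("all build tools found")
```

Running it inside the container before convert_weight.py gives a quick indication of whether the image is missing a piece of the toolchain.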

Nerdyvedi commented 3 years ago

@rosinality I tried to use the same Dockerfile, but for some reason I am facing issues running tensorflow-gpu. Would it be possible for you to provide the Dockerfile you used? Thanks.

rosinality commented 3 years ago

@Nerdyvedi For TensorFlow problems, it would be easier to use a CUDA 10.0 image:

FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04

ARG APT_INSTALL="apt-get install -y --no-install-recommends"
ARG PIP_INSTALL="python -m pip --no-cache-dir install --upgrade"
ARG GIT_CLONE="git clone --depth 10"

ENV HOME /root

WORKDIR $HOME

RUN rm -rf /var/lib/apt/lists/* \
           /etc/apt/sources.list.d/cuda.list \
           /etc/apt/sources.list.d/nvidia-ml.list

RUN apt-get update

ARG DEBIAN_FRONTEND=noninteractive

RUN $APT_INSTALL build-essential software-properties-common ca-certificates \
                 wget git zlib1g-dev nasm cmake

RUN add-apt-repository ppa:deadsnakes/ppa

RUN apt-get update

RUN $APT_INSTALL python3.7 python3.7-dev

RUN wget -O $HOME/get-pip.py https://bootstrap.pypa.io/get-pip.py

RUN python3.7 $HOME/get-pip.py

RUN ln -s /usr/bin/python3.7 /usr/local/bin/python3
RUN ln -s /usr/bin/python3.7 /usr/local/bin/python

RUN $PIP_INSTALL setuptools
RUN $PIP_INSTALL numpy scipy nltk lmdb cython pydantic pyhocon

RUN $PIP_INSTALL torch==1.7.1+cu92 torchvision==0.8.2+cu92 -f https://download.pytorch.org/whl/torch_stable.html

ENV FORCE_CUDA="1"
ENV TORCH_CUDA_ARCH_LIST="Pascal;Volta;Turing"

RUN $APT_INSTALL libsm6 libxext6 libxrender1
RUN $PIP_INSTALL opencv-python-headless

RUN python -m pip uninstall -y pillow pil jpeg libtiff libjpeg-turbo

RUN $GIT_CLONE https://github.com/libjpeg-turbo/libjpeg-turbo.git
WORKDIR libjpeg-turbo
RUN mkdir build
WORKDIR build
RUN cmake -G"Unix Makefiles" -DCMAKE_INSTALL_PREFIX=libjpeg-turbo -DWITH_JPEG8=1 ..
RUN make
RUN make install
WORKDIR libjpeg-turbo
RUN mv include/jerror.h include/jmorecfg.h include/jpeglib.h include/turbojpeg.h /usr/include
RUN mv include/jconfig.h /usr/include/x86_64-linux-gnu
RUN mv lib/*.* /usr/lib/x86_64-linux-gnu
RUN mv lib/pkgconfig/* /usr/lib/x86_64-linux-gnu/pkgconfig
RUN ldconfig

RUN CFLAGS="${CFLAGS} -mavx2" $PIP_INSTALL --force-reinstall --no-binary :all: --compile pillow-simd

WORKDIR $HOME

RUN ldconfig
RUN apt-get clean
RUN apt-get autoremove -y
RUN rm -rf /var/lib/apt/lists/* /tmp/* ~/*

rosinality / stylegan2-pytorch

ninja: build stopped in convert_weight #215