theFoxofSky / ddfnet

The official implementation of the CVPR2021 paper: Decoupled Dynamic Filter Networks
MIT License

core dumped #13

Closed sujyQ closed 3 years ago

sujyQ commented 3 years ago

Hi. There is an error when I run grad_check.py:

cudaCheckError() failed : invalid device function
Segmentation fault (core dumped)

theFoxofSky commented 3 years ago

Could you please share your environment?
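One way to gather the relevant details is sketched below. It is only an illustration: the helper name `describe_environment` and the choice of fields are assumptions, not part of this repository, and the script deliberately falls back to Python-only information when PyTorch is not importable.

```python
import platform
import sys


def describe_environment():
    """Collect version details that usually matter for a CUDA extension build."""
    info = {
        "python": platform.python_version(),
        "platform": sys.platform,
    }
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds)
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        # Compute capability of GPU 0 as a (major, minor) tuple
        info["gpu_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines together with the output of `conda list` and `gcc -v` covers most of what is needed to compare two setups.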

sujyQ commented 3 years ago

cudatoolkit=10.1.243
ddf=1.0
ipdb=0.13.9
python=3.7.10
pytorch=1.7.1
torchvision=0.8.2
timm=0.4.5

Thx!

theFoxofSky commented 3 years ago

I use the same environment, so it should be OK. Please pull again and rebuild the ddf operation.

sujyQ commented 3 years ago

I did, but the error remains the same.

sujyQ commented 3 years ago

It's solved! I re-installed apex:)

sujyQ commented 3 years ago

Sorry to bother you again (@theFoxofSky).

I was trying to run setup.py install in another conda env and got this error:

`running install
running bdist_egg
running egg_info
creating ddf.egg-info
writing ddf.egg-info/PKG-INFO
writing dependency_links to ddf.egg-info/dependency_links.txt
writing top-level names to ddf.egg-info/top_level.txt
writing manifest file 'ddf.egg-info/SOURCES.txt'
reading manifest file 'ddf.egg-info/SOURCES.txt'
writing manifest file 'ddf.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'ddf_mul_ext' extension
creating build
creating build/temp.linux-x86_64-3.6
creating build/temp.linux-x86_64-3.6/src
creating build/temp.linux-x86_64-3.6/src/cuda
gcc -pthread -B /home/hsj/anaconda3/envs/dasr_ddf/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/hsj/anaconda3/envs/dasr_ddf/lib/python3.6/site-packages/torch/include -I/home/hsj/anaconda3/envs/dasr_ddf/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/hsj/anaconda3/envs/dasr_ddf/lib/python3.6/site-packages/torch/include/TH -I/home/hsj/anaconda3/envs/dasr_ddf/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/hsj/anaconda3/envs/dasr_ddf/include/python3.6m -c src/ddf_mul_ext.cpp -o build/temp.linux-x86_64-3.6/src/ddf_mul_ext.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=ddf_mul_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -B /home/hsj/anaconda3/envs/dasr_ddf/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DWITH_CUDA -I/home/hsj/anaconda3/envs/dasr_ddf/lib/python3.6/site-packages/torch/include -I/home/hsj/anaconda3/envs/dasr_ddf/lib/python3.6/site-packages/torch/include/torch/csrc/api/include -I/home/hsj/anaconda3/envs/dasr_ddf/lib/python3.6/site-packages/torch/include/TH -I/home/hsj/anaconda3/envs/dasr_ddf/lib/python3.6/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/hsj/anaconda3/envs/dasr_ddf/include/python3.6m -c src/cuda/ddf_mul_cuda.cpp -o build/temp.linux-x86_64-3.6/src/cuda/ddf_mul_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=ddf_mul_ext -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
src/cuda/ddf_mul_cuda.cpp: In function ‘int ddf_mul_forward_cuda(at::Tensor, at::Tensor, at::Tensor, int, int, int, at::Tensor)’:
src/cuda/ddf_mul_cuda.cpp:27:86: error: ‘TORCH_CHECK’ was not declared in this scope
#define CHECK_CUDA(x) TORCH_CHECK(x.device().is_cuda(), #x, " must be a CUDA tensor ")
                                                                                  ^
src/cuda/ddf_mul_cuda.cpp:31:5: note: in expansion of macro ‘CHECK_CUDA’
    CHECK_CUDA(x); \
    ^~~~~~
src/cuda/ddf_mul_cuda.cpp:37:5: note: in expansion of macro ‘CHECK_INPUT’
    CHECK_INPUT(features);
    ^~~
src/cuda/ddf_mul_cuda.cpp: In function ‘int ddf_mul_backward_cuda(at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor)’:
src/cuda/ddf_mul_cuda.cpp:27:86: error: ‘TORCH_CHECK’ was not declared in this scope
#define CHECK_CUDA(x) TORCH_CHECK(x.device().is_cuda(), #x, " must be a CUDA tensor ")
                                                                                  ^
src/cuda/ddf_mul_cuda.cpp:31:5: note: in expansion of macro ‘CHECK_CUDA’
    CHECK_CUDA(x); \
    ^~~~~~
src/cuda/ddf_mul_cuda.cpp:66:5: note: in expansion of macro ‘CHECK_INPUT’
    CHECK_INPUT(top_grad);
    ^~~
error: command 'gcc' failed with exit status 1`

My environment :

_libgcc_mutex 0.1 main
apex 0.1 pypi_0 pypi
attrs 20.2.0 py_0 anaconda
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
blas 1.0 openblas
ca-certificates 2021.5.30 ha878542_0 conda-forge
certifi 2021.5.30 py36h5fab9bb_0 conda-forge
cffi 1.14.6 py36h400218f_0
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cudatoolkit 10.0.130 0
cxxfilt 0.2.2 py36h831f99a_1 conda-forge
decorator 5.0.9 pyhd8ed1ab_0 conda-forge
freetype 2.10.4 h5ab3b9f_0
importlib-metadata 2.0.0 py_1 anaconda
importlib_metadata 2.0.0 1 anaconda
iniconfig 1.1.1 py_0 anaconda
intel-openmp 2021.3.0 h06a4308_3350
ipdb 0.13.9 pyhd8ed1ab_0 conda-forge
ipython 5.8.0 py36_1 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
jpeg 9b h024ee3a_2
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.35.1 h7274673_9
libffi 3.3 he6710b0_2
libgcc-ng 9.1.0 hdf63c60_0
libgfortran-ng 7.3.0 hdf63c60_0
libopenblas 0.3.13 h4367d64_0
libpng 1.6.37 hbc83047_0
libstdcxx-ng 9.1.0 hdf63c60_0
libtiff 4.2.0 h85742a9_0
libwebp-base 1.2.0 h27cfd23_0
lz4-c 1.9.3 h2531618_0
mkl 2021.3.0 h06a4308_520
more-itertools 8.5.0 py_0 anaconda
ncurses 6.2 he6710b0_1
ninja 1.10.2 hff7bd54_1
numpy 1.17.0 py36h99e49ec_0
numpy-base 1.17.0 py36h2f8d375_0
olefile 0.46 py36_0
openjpeg 2.3.0 h05c96fa_1
openssl 1.1.1k h27cfd23_0
packaging 20.4 py_0 anaconda
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.3.1 py36h2c7a002_0
pip 21.2.2 py36h06a4308_0
pluggy 0.13.1 py36_0 anaconda
prompt_toolkit 1.0.15 py_1 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
py 1.9.0 py_0 anaconda
pycparser 2.20 py_2
pygments 2.9.0 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 py_0 anaconda
pytest 6.1.1 py36_0 anaconda
python 3.6.13 h12debd9_1
python_abi 3.6 2_cp36m conda-forge
pytorch 1.1.0 py3.6_cuda10.0.130_cudnn7.5.1_0 pytorch
pyyaml 5.3.1 py36h7b6447c_1 anaconda
readline 8.1 h27cfd23_0
setuptools 52.0.0 py36h06a4308_0
simplegeneric 0.8.1 py_1 conda-forge
six 1.16.0 pyhd3eb1b0_0
sqlite 3.36.0 hc218d9a_0
timm 0.1.8 pypi_0 pypi
tk 8.6.10 hbc83047_0
toml 0.10.1 py_0 anaconda
torchvision 0.3.0 py36_cu10.0.130_1 pytorch
tqdm 4.62.0 pyhd8ed1ab_0 conda-forge
traitlets 4.3.3 py36h9f0ad1d_1 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
wheel 0.36.2 pyhd3eb1b0_0
xz 5.2.5 h7b6447c_0
yaml 0.2.5 h7b6447c_0 anaconda
zipp 3.3.1 py_0 anaconda
zlib 1.2.11 h7b6447c_3
zstd 1.4.9 haebb681_0

gcc -v :

Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/6/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 6.5.0-2ubuntu1~16.04' --with-bugurl=file:///usr/share/doc/gcc-6/README.Bugs --enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --with-as=/usr/bin/x86_64-linux-gnu-as --with-ld=/usr/bin/x86_64-linux-gnu-ld --program-suffix=-6 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-6-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-6-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-6-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 6.5.0 20181026 (Ubuntu 6.5.0-2ubuntu1~16.04)

theFoxofSky commented 3 years ago

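Before PyTorch 1.2, the macro AT_CHECK was used. In later versions it has been replaced by TORCH_CHECK, in order to get rid of all ATen references altogether. Your environment has pytorch 1.1.0, so TORCH_CHECK is not declared there, which is exactly the compile error you see.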

Please upgrade your PyTorch.
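A quick way to check an environment before building is to compare its PyTorch version string against the 1.2 boundary mentioned in this comment. The sketch below is a hypothetical helper (the name `has_torch_check` is not part of this repository or of PyTorch), and it assumes the 1.2 cutoff stated above is the right one.

```python
import re


def has_torch_check(version):
    """Return True if a PyTorch version string is 1.2 or newer.

    The 1.2 cutoff is taken from the comment above and is an assumption
    of this sketch, not something the function verifies.
    """
    # Only the leading "major.minor" is needed; suffixes such as "+cu100" are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (1, 2)


if __name__ == "__main__":
    # The two environments reported in this thread.
    for v in ("1.1.0", "1.7.1"):
        print(v, has_torch_check(v))
```

In a real environment the argument would be `torch.__version__`; the two hard-coded strings are simply the versions reported in this thread.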