isl-org / StableViewSynthesis

MIT License
211 stars 34 forks source link

Segmentation fault when retraining the network or running evaluation #11

Closed: asharma-fy closed this issue 3 years ago

asharma-fy commented 3 years ago

Hi Gernot,

Thank you for the great work!

I was trying to get the evaluation script exp.py running for both evaluation and retraining. However, I consistently get a segmentation fault like this:

python exp.py --net resunet3.16_penone.dirs.avg.seq+9+1+unet+5+2+16.single+mlpdir+mean+3+64+16 --cmd retrain
...
[2021-05-31/09:52/INFO/mytorch] Setup training data loader and other stuff                                                                                                            
invalid device function in /home/fyusion/Documents/projects/StableViewSynthesis/ext/mytorch/include/common_cuda.h at 171                                                             
[1]    633554 segmentation fault (core dumped)  python exp.py --net  --cmd retrain   
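The `invalid device function` message is the CUDA runtime's text for `cudaErrorInvalidDeviceFunction`. It is commonly reported when a compiled kernel has no binary for the GPU's compute capability, for example when a custom extension was built for a different architecture or with a different CUDA toolkit than the one in use. This machine mixes a TITAN V (compute capability 7.0) with two RTX 2080 Ti cards (7.5), so an extension has to carry code for both. The sketch below is hypothetical. It maps the GPU names from the environment report to capabilities through a hand-written table and builds a `TORCH_CUDA_ARCH_LIST`-style string, which is the environment variable `torch.utils.cpp_extension` reads when compiling CUDA extensions. Whether `ext/mytorch` is actually built through `torch.utils.cpp_extension` is an assumption. On a live system, `torch.cuda.get_device_capability(i)` returns the `(major, minor)` pair directly instead of the table.

```python
# Hypothetical helper: map the GPU models listed in this report to their CUDA
# compute capabilities and build a TORCH_CUDA_ARCH_LIST-style string.
# The table is hand-written and only covers the two models on this machine.
KNOWN_CAPABILITIES = {
    "TITAN V": (7, 0),              # Volta
    "GeForce RTX 2080 Ti": (7, 5),  # Turing
}


def arch_list(gpu_names):
    """Return a sorted, de-duplicated 'major.minor;...' string for the given GPUs."""
    caps = set()
    for name in gpu_names:
        if name not in KNOWN_CAPABILITIES:
            raise KeyError(f"unknown GPU model: {name!r}")
        caps.add(KNOWN_CAPABILITIES[name])
    # Sorting the (major, minor) tuples gives a stable, ascending order.
    return ";".join(f"{major}.{minor}" for major, minor in sorted(caps))


if __name__ == "__main__":
    gpus = ["TITAN V", "GeForce RTX 2080 Ti", "GeForce RTX 2080 Ti"]
    print(arch_list(gpus))  # prints 7.0;7.5
```

For this machine the result is `7.0;7.5`. Under the assumption above, setting `TORCH_CUDA_ARCH_LIST` to that value before rebuilding the extension would compile kernels for both GPU generations rather than only the one present at build time.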

Some more details about my system installation:

python -c 'from torch.utils.collect_env import main; main()'
Collecting environment information...
PyTorch version: 1.6.0
Is debug build: No
CUDA used to build PyTorch: 10.2

OS: Ubuntu 20.04.2 LTS
GCC version: (Ubuntu 7.5.0-6ubuntu2) 7.5.0
CMake version: version 3.16.3

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 10.0.130
GPU models and configuration:
GPU 0: TITAN V
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti

Nvidia driver version: 460.73.01
cuDNN version: /usr/lib/cuda-10.0/lib64/libcudnn.so.7.4.1

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.6.0
[pip3] torch-geometric==1.7.0
[pip3] torch-scatter==2.0.6
[pip3] torch-sparse==0.6.9
[pip3] torchvision==0.7.0
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.2.89              hfd86e86_1
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py36he8ac12f_0
[conda] mkl_fft                   1.3.0            py36h54f3939_0
[conda] mkl_random                1.1.1            py36h0573a6f_0
[conda] numpy                     1.19.2           py36h54aff64_0
[conda] numpy-base                1.19.2           py36hfa32c7d_0
[conda] pytorch                   1.6.0           py3.6_cuda10.2.89_cudnn7.6.5_0    pytorch
[conda] torch-geometric           1.7.0                    pypi_0    pypi
[conda] torch-scatter             2.0.6                    pypi_0    pypi
[conda] torch-sparse              0.6.9                    pypi_0    pypi
[conda] torchvision               0.7.0                py36_cu102    pytorch
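The report above shows a CUDA version split: PyTorch and the conda `cudatoolkit` are 10.2, while `CUDA runtime version` is 10.0.130. `collect_env` appears to read that last value from `nvcc --version`, i.e. the system toolkit, which is also what `torch.utils.cpp_extension` typically uses to compile custom CUDA code. A mismatch like this is a plausible contributor to the error, so it is worth ruling out. A minimal sketch that parses the two lines from a `collect_env` report and flags a major/minor difference; the function names are hypothetical:

```python
import re

# Two lines copied from the environment report in this issue.
REPORT = """\
CUDA used to build PyTorch: 10.2
CUDA runtime version: 10.0.130
"""


def _major_minor(report, label):
    """Find 'label: X.Y...' at the start of a line and return (X, Y) as ints."""
    match = re.search(rf"^{re.escape(label)}:\s*(\d+)\.(\d+)", report, re.MULTILINE)
    if match is None:
        raise ValueError(f"missing line: {label!r}")
    return int(match.group(1)), int(match.group(2))


def cuda_versions(report):
    """Return ((build_major, build_minor), (runtime_major, runtime_minor))."""
    build = _major_minor(report, "CUDA used to build PyTorch")
    runtime = _major_minor(report, "CUDA runtime version")
    return build, runtime


def cuda_mismatch(report):
    """True when PyTorch's build CUDA and the system CUDA differ in major.minor."""
    build, runtime = cuda_versions(report)
    return build != runtime


if __name__ == "__main__":
    print(cuda_versions(REPORT))  # ((10, 2), (10, 0))
    print(cuda_mismatch(REPORT))  # True
```

Comparing only major.minor ignores patch levels such as `.130`, which do not usually matter for binary compatibility between PyTorch and an extension.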
KaLiMaLi555 commented 3 years ago

Hey @asharma-fy, I am facing the same issue during both training and evaluation. Since this issue was closed, I am hoping you found a fix. Can you help me? Thanks in advance!

akashsharma02 commented 3 years ago

@KaLiMaLi555 I don't remember exactly, but my issue was fixed when I updated cudatoolkit from 10.1.135 to 11+ and then pip-installed the exact package versions listed in the repository, in a conda environment with Python 3.8+.

Hope this was helpful.
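The fix described in this comment can be turned into a quick environment check. A hypothetical helper that compares a Python version and a `cudatoolkit` version against the two thresholds from the comment (Python 3.8+, cudatoolkit 11+). It does not check the pip package versions, since the exact versions pinned by the repository are not listed in this thread. On a live system, `platform.python_version()` supplies the first argument and `conda list` shows the installed `cudatoolkit` version.

```python
import platform


def differences_from_reported_fix(python_version, cudatoolkit_version):
    """List how an environment falls short of the fix reported in this thread:
    Python 3.8+ and cudatoolkit 11+."""
    problems = []
    # Compare (major, minor) as integers so that e.g. 3.10 counts as newer than 3.8.
    major, minor = (int(p) for p in python_version.split(".")[:2])
    if (major, minor) < (3, 8):
        problems.append(f"Python {python_version} < 3.8")
    if int(cudatoolkit_version.split(".")[0]) < 11:
        problems.append(f"cudatoolkit {cudatoolkit_version} < 11")
    return problems


if __name__ == "__main__":
    # Versions from the original report in this issue.
    print(differences_from_reported_fix("3.6", "10.2.89"))
    # Current interpreter with a hypothetical CUDA 11 toolkit version.
    print(differences_from_reported_fix(platform.python_version(), "11.1.1"))
```

For the original report this lists both Python 3.6 and cudatoolkit 10.2.89 as below the thresholds; an empty list means only that the two versions match what worked here, not that the segmentation fault is guaranteed to be gone.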