pytorch / examples

A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
https://pytorch.org/examples
BSD 3-Clause "New" or "Revised" License

Segmentation fault occurs after mnist cpp code in GPU #597

Open Sharath-Ramachandran opened 5 years ago

Sharath-Ramachandran commented 5 years ago

I tried reducing the batch size to 4. It does not throw the error when using CPU. I ran the code with 15 epochs.

asimonov commented 5 years ago

I see a core dump as well, but in a different place (https://github.com/pytorch/examples/blob/master/cpp/mnist/mnist.cpp#L128)

I am on Ubuntu 14.04, libtorch 1.2.0, cuda 10.0, 1050Ti GPU.

root@poise:~/libtorch/examples/cpp/mnist/build# cmake -DCMAKE_PREFIX_PATH=/home/libtorch/pre-cxx11/ ..
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found version "10.0") 
-- Caffe2: CUDA detected: 10.0
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 10.0
-- Found CUDNN: /usr/include  
-- Found cuDNN: v7.6.0  (include: /usr/include, library: /usr/lib/x86_64-linux-gnu/libcudnn.so)
-- Autodetected CUDA architecture(s):  6.1
-- Added CUDA NVCC flags for: -gencode;arch=compute_61,code=sm_61
-- Found torch: /home/libtorch/pre-cxx11/lib/libtorch.so  
-- Downloading MNIST dataset
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz ...
0% |################################################################| 100%
Unzipped /home/libtorch/examples/cpp/mnist/build/data/train-images-idx3-ubyte.gz ...
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz ...
0% |################################################################| 100%
Unzipped /home/libtorch/examples/cpp/mnist/build/data/train-labels-idx1-ubyte.gz ...
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz ...
0% |################################################################| 100%
Unzipped /home/libtorch/examples/cpp/mnist/build/data/t10k-images-idx3-ubyte.gz ...
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz ...
0% |################################################################| 100%
Unzipped /home/libtorch/examples/cpp/mnist/build/data/t10k-labels-idx1-ubyte.gz ...
-- Configuring done
-- Generating done
-- Build files have been written to: /home/libtorch/examples/cpp/mnist/build
root@poise:~/libtorch/examples/cpp/mnist/build# make
Scanning dependencies of target mnist
[ 50%] Building CXX object CMakeFiles/mnist.dir/mnist.cpp.o
[100%] Linking CXX executable mnist
[100%] Built target mnist
root@poise:~/libtorch/examples/cpp/mnist/build# ./mnist 
CUDA available! Training on GPU.
creating device
creating model
*** Error in `./mnist': munmap_chunk(): invalid pointer: 0x0000000001958570 ***
Aborted (core dumped)

asimonov commented 5 years ago

It looks like this is not a GPU issue: if I run with kCPU it still gives an error. It seems more like an Ubuntu 14 issue. What can I try to make it run?

msaroufim commented 2 years ago

Does the error go away if you run on a more recent version of Ubuntu?