naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

curand set error in common.cpp #15

Closed yuta-mizuno closed 8 years ago

yuta-mizuno commented 8 years ago

Hi,

I'm glad that you have released the sources of caffe, caffe_neural_tool, and caffe_neural_models. This issue may belong to caffe_neural_models (https://github.com/naibaf7/caffe_neural_models), but the error seems to occur in /caffe/src/caffe/common.cpp (https://github.com/naibaf7/caffe/blob/master/src/caffe/common.cpp), so I am reporting it here.

I'm trying to run benchmark_u.sh with these tools, but it fails:

$ sh benchmark_u.sh
E1208 14:47:37.905019 15501 common.cpp:220] ! Cannot create Curand generator. Curand won't be available.
Segmentation fault (core dumped)

The corresponding part of common.cpp:

if (curandCreateGenerator(&curandgenerator64, CURAND_RNG_QUASI_SOBOL64)
        != CURAND_STATUS_SUCCESS ||
    curandSetPseudoRandomGeneratorSeed(curandgenerator64, cluster_seedgen())
        != CURAND_STATUS_SUCCESS) {
  LOG(ERROR) << "! Cannot create Curand generator. Curand won't be available.";
}

I confirmed that the value of the first expression is TRUE, and that the return value of curandSetPseudoRandomGeneratorSeed(curandgenerator64, cluster_seedgen()) was 103 (CURAND_STATUS_TYPE_ERROR).

I also tried replacing curandSetPseudoRandomGeneratorSeed() with curandSetQuasiRandomGeneratorDimensions(), but the return value was 104 (CURAND_STATUS_OUT_OF_RANGE).
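
For reference, this matches the documented CURAND behaviour: curandSetPseudoRandomGeneratorSeed() is only valid on pseudo-random generator types and returns CURAND_STATUS_TYPE_ERROR when called on a quasi-random generator such as CURAND_RNG_QUASI_SOBOL64, while curandSetQuasiRandomGeneratorDimensions() only accepts a dimension count between 1 and 20000, so a large argument such as a seed value gives CURAND_STATUS_OUT_OF_RANGE. A minimal sketch of the two setup paths (illustrative only, not the actual common.cpp code):

#include <curand.h>

// Illustrative only; error handling of the create calls omitted for brevity.
void curand_generator_setup_sketch() {
  curandGenerator_t gen_pseudo, gen_quasi;

  // Pseudo-random generators are the ones that take a seed:
  curandCreateGenerator(&gen_pseudo, CURAND_RNG_PSEUDO_DEFAULT);
  curandSetPseudoRandomGeneratorSeed(gen_pseudo, 1234ULL);    // valid here

  // Quasi-random (Sobol) generators take a dimension count instead:
  curandCreateGenerator(&gen_quasi, CURAND_RNG_QUASI_SOBOL64);
  curandSetQuasiRandomGeneratorDimensions(gen_quasi, 1);      // must be 1..20000
  // curandSetPseudoRandomGeneratorSeed(gen_quasi, 1234ULL) here would return
  // CURAND_STATUS_TYPE_ERROR (103), matching the value observed above.

  curandDestroyGenerator(gen_pseudo);
  curandDestroyGenerator(gen_quasi);
}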

My development environment: Ubuntu 14.04, NVIDIA Quadro K4200, driver version 346.46, CUDA 7.0.27.

I don't know how I should modify my environment or the programs to fix these problems. I would be most grateful if you could suggest solutions.

naibaf7 commented 8 years ago

OK, I will look into this later today. However, I think the segfault and the CURAND error might be two separate issues. What does GDB tell you (add 'gdb --args' in the sh file, type 'run', and after execution 'where')?

yuta-mizuno commented 8 years ago

Thank you for your response. The following is the output:

$ sh gdb_u.sh 
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./../../caffe_neural_tool/build/caffe_neural_tool...done.
(gdb) run
Starting program: /home/y-mizuno/caffe_neural_tool/build/caffe_neural_tool --gpu 2 --benchmark 0 --proto train_process_u_2.prototxt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffda73a700 (LWP 19506)]
[New Thread 0x7fffd9f39700 (LWP 19507)]
[New Thread 0x7fffd7738700 (LWP 19508)]
[New Thread 0x7fffd4f37700 (LWP 19509)]
[New Thread 0x7fffd2736700 (LWP 19510)]
[New Thread 0x7fffcff35700 (LWP 19511)]
[New Thread 0x7fffcd734700 (LWP 19512)]
[New Thread 0x7fffcaf33700 (LWP 19513)]
[New Thread 0x7fffc8732700 (LWP 19514)]
[New Thread 0x7fffc5f31700 (LWP 19515)]
[New Thread 0x7fffc3730700 (LWP 19516)]
[New Thread 0x7fffbeae3700 (LWP 19518)]
[New Thread 0x7fffbd458700 (LWP 19519)]
E1208 17:41:28.636894 19502 common.cpp:220] ! Cannot create Curand generator. Curand won't be available.

Program received signal SIGSEGV, Segmentation fault.
0x0000000000697310 in caffe::device::backend() const ()
(gdb) where
#0  0x0000000000697310 in caffe::device::backend() const ()
#1  0x000000000069b75c in caffe::Caffe::SetDevice(int) ()
#2  0x000000000045b380 in main (argc=7, argv=0x7fffffffdeb8)
    at src/caffe_neural_tool.cpp:95

naibaf7 commented 8 years ago

OK, it actually seems to be trying to use the wrong device.

What is the output of:

and does:

Also make sure to update all repositories to the latest versions.

yuta-mizuno commented 8 years ago

I really appreciate your help.

./../../caffe_neural_tool/build/caffe_neural_tool --gpu 2 --benchmark 0 --proto 'train_process_u_2.prototxt'
./build/caffe_neural_tool --device
I1209 13:38:40.204485 22018 common.cpp:323] Total devices: 2
I1209 13:38:40.204574 22018 common.cpp:324] CUDA devices: 1
I1209 13:38:40.204583 22018 common.cpp:325] OpenCL devices: 1
I1209 13:38:40.204856 22018 common.cpp:332] Device id:                     0
I1209 13:38:40.204869 22018 common.cpp:334] Device backend:                CUDA
I1209 13:38:40.204875 22018 common.cpp:336] Backend details:               CUDA
I1209 13:38:40.204881 22018 common.cpp:338] Device vendor:                 NVIDIA Corporation
I1209 13:38:40.204887 22018 common.cpp:340] Name:                          Quadro K4200
I1209 13:38:40.204895 22018 common.cpp:342] Total global memory:           4294639616
I1209 13:38:40.204902 22018 common.cpp:349] Device id:                     1
I1209 13:38:40.204910 22018 common.cpp:351] Device backend:                OpenCL
I1209 13:38:40.204929 22018 common.cpp:353] Backend details:               NVIDIA Corporation: OpenCL 1.1 CUDA 7.0.28
I1209 13:38:40.204941 22018 common.cpp:355] Device vendor:                 NVIDIA Corporation
I1209 13:38:40.204982 22018 common.cpp:357] Name:                          Quadro K4200
I1209 13:38:40.205032 22018 common.cpp:359] Total global memory:           4294639616
USE_CUDA := 1
USE_GREENTEA := 1
VIENNACL_DIR = ../ViennaCL
USE_CLBLAS := 0
CUDA_ARCH := -gencode arch=compute_20,code=sm_20 \
        -gencode arch=compute_20,code=sm_21 \
        -gencode arch=compute_30,code=sm_30 \
        -gencode arch=compute_35,code=sm_35 \
        -gencode arch=compute_50,code=sm_50 \
        -gencode arch=compute_50,code=compute_50
USE_CUDA := 1
USE_CUDNN := 0
USE_GREENTEA := 1
VIENNACL_DIR = ../ViennaCL
USE_CLBLAS := 0
USE_VIENNACLBLAS := 1
CUDA_DIR := /usr/local/cuda-7.0
[ RUN      ] NetTest/0.TestSharedWeightsUpdate
src/caffe/test/test_net.cpp:1164: Failure
Value of: shared_params.cpu_diff()[i]
  Actual: 82.940231
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 82.940186
src/caffe/test/test_net.cpp:1164: Failure
Value of: shared_params.cpu_diff()[i]
  Actual: 102.78288
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 102.78278
src/caffe/test/test_net.cpp:1164: Failure
Value of: shared_params.cpu_diff()[i]
  Actual: 108.55974
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 108.55963
src/caffe/test/test_net.cpp:1164: Failure
Value of: shared_params.cpu_diff()[i]
  Actual: 91.899338
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 91.899292
src/caffe/test/test_net.cpp:1164: Failure
Value of: shared_params.cpu_diff()[i]
  Actual: 131.06218
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 131.06207
src/caffe/test/test_net.cpp:1164: Failure
Value of: shared_params.cpu_diff()[i]
  Actual: 99.245407
Expected: ip1_weights->cpu_diff()[i] + ip2_weights->cpu_diff()[i]
Which is: 99.2453
[  FAILED  ] NetTest/0.TestSharedWeightsUpdate, where TypeParam = caffe::CPUDevice<float> (2 ms)
[  FAILED  ] 1 test, listed below:
[  FAILED  ] NetTest/0.TestSharedWeightsUpdate, where TypeParam = caffe::CPUDevice<float>

 1 FAILED TEST
  YOU HAVE 2 DISABLED TESTS

naibaf7 commented 8 years ago

OK, so what you should try is to change benchmark_u.sh to:

./../../caffe_neural_tool/build/caffe_neural_tool --gpu 0 --benchmark 0 --proto 'train_process_u_2.prototxt'

That runtest error is also strange, especially since the values are only a tiny bit off from what they should be. I was not able to reproduce it (NVIDIA Titan X and GTX 980). But at least it should work once you select the correct GPU in benchmark_u.sh.
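
For illustration: the --device listing above only shows ids 0 and 1, so --gpu 2 requests a device that does not exist, which is consistent with the crash inside caffe::device::backend() in the backtrace. A defensive sketch of the kind of check the calling code could do (a hypothetical helper, not the actual caffe_neural_tool source; num_devices is assumed to come from the same enumeration that --device prints):

#include <cstdlib>
#include <iostream>
#include "caffe/common.hpp"

// Hypothetical helper: reject out-of-range ids before handing them to Caffe.
void SelectDeviceOrDie(int gpu_id, int num_devices) {
  if (gpu_id < 0 || gpu_id >= num_devices) {
    std::cerr << "Invalid --gpu id " << gpu_id << "; valid ids are 0 to "
              << (num_devices - 1) << std::endl;
    std::exit(1);
  }
  caffe::Caffe::SetDevice(gpu_id);  // the call that segfaults above when gpu_id == 2
}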

naibaf7 commented 8 years ago

Please also note that 4 GB of memory is a bit tight for running these networks.

yuta-mizuno commented 8 years ago

Thanks to you, the run got as far as the "Forward pass". After that, the following message appeared:

ViennaCL: FATAL ERROR: Kernel start failed for '_norm_1_0'.
ViennaCL: Smaller work sizes could not solve the problem. 
terminate called after throwing an instance of 'viennacl::ocl::out_of_resources'
  what():  ViennaCL: FATAL ERROR: CL_OUT_OF_RESOURCES 
 ViennaCL tried to launch a compute kernel, but the device does not provide enough resources. Try changing the global and local work item sizes.
If you think that this is a bug in ViennaCL, please report it at viennacl-support@lists.sourceforge.net and supply at least the following information:
 * Operating System
 * Which OpenCL implementation (AMD, NVIDIA, etc.)
 * ViennaCL version
Many thanks in advance!
Aborted (core dumped)

I'll change the work item sizes and try again.

naibaf7 commented 8 years ago

If you selected device 0 then there should not be any OpenCL error. Only with device 1 should ViennaCL be used. Did you upgrade to the latest Caffe/Greentea branch?

Either way, you can also try to run benchmark_sk.sh with the -gpu 0 flag.

If that works (and it should, since essentially all runtests passed) but benchmark_u.sh and benchmark_usk.sh do not, then it simply means you are running out of GPU memory.

yuta-mizuno commented 8 years ago

I upgraded caffe to the latest version, but runtest hit the same error, with a core dump.

[  FAILED  ] 1 test, listed below:
[  FAILED  ] NetTest/0.TestSharedWeightsUpdate, where TypeParam = caffe::CPUDevice<float>

 1 FAILED TEST
*** Error in `.build_release/test/test_all.testbin': free(): invalid pointer: 0x0000000201106800 ***
*** Aborted at 1450069674 (unix time) try "date -d @1450069674" if you are using GNU date ***
PC: @     0x2b12bb5d2cc9 (unknown)
*** SIGABRT (@0x3e8000034cf) received by PID 13519 (TID 0x2b12b46add00) from PID 13519; stack trace: ***
    @     0x2b12bb5d2d40 (unknown)
    @     0x2b12bb5d2cc9 (unknown)
    @     0x2b12bb5d60d8 (unknown)
    @     0x2b12bb60f394 (unknown)
    @     0x2b12bb61b66e (unknown)
    @     0x2b12ba679898 caffe::SyncedMemory::~SyncedMemory()
    @           0x4ede12 boost::detail::sp_counted_impl_p<>::dispose()
    @           0x47d6da boost::detail::sp_counted_impl_p<>::dispose()
    @           0x4813fe boost::detail::sp_counted_base::release()
    @     0x2b12ba69c0a4 boost::detail::sp_counted_impl_p<>::dispose()
    @     0x2b12ba69b82b std::vector<>::~vector()
    @     0x2b12bb5d85ea (unknown)
    @     0x2b12ba49d123 (unknown)

benchmark_sk.sh ran successfully, but benchmark_u and benchmark_usk produced different error messages.

I1214 14:43:24.138767 14018 benchmark.cpp:147] Forward pass: 733.716552 ms
F1214 14:43:24.306931 14018 math_functions.cu:128] Check failed: status == CUBLAS_STATUS_SUCCESS (14 vs. 0)  CUBLAS_STATUS_INTERNAL_ERROR
*** Check failure stack trace: ***
    @     0x7ffff6b80daa  (unknown)
    @     0x7ffff6b80ce4  (unknown)
    @     0x7ffff6b806e6  (unknown)
    @     0x7ffff6b83687  (unknown)
    @           0x68be26  caffe::caffe_gpu_asum<>()
    @           0x65bd9e  caffe::SoftmaxWithLossLayer<>::Backward_gpu()
    @           0x5fb50e  caffe::Net<>::BackwardFromTo()
    @           0x425380  caffe_neural::Benchmark()
    @           0x451525  main
    @     0x7fffed095ec5  (unknown)
    @           0x424167  (unknown)
    @              (nil)  (unknown)

Program received signal SIGABRT, Aborted.
0x00007fffed0aacc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0  0x00007fffed0aacc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fffed0ae0d8 in __GI_abort () at abort.c:89
#2  0x00007ffff6b88ec3 in ?? () from /usr/lib/x86_64-linux-gnu/libglog.so.0
#3  0x00007ffff6b80daa in google::LogMessage::Fail() ()
   from /usr/lib/x86_64-linux-gnu/libglog.so.0
#4  0x00007ffff6b80ce4 in google::LogMessage::SendToLog() ()
   from /usr/lib/x86_64-linux-gnu/libglog.so.0
#5  0x00007ffff6b806e6 in google::LogMessage::Flush() ()
   from /usr/lib/x86_64-linux-gnu/libglog.so.0
#6  0x00007ffff6b83687 in google::LogMessageFatal::~LogMessageFatal() ()
   from /usr/lib/x86_64-linux-gnu/libglog.so.0
#7  0x000000000068be26 in void caffe::caffe_gpu_asum<float>(long, float const*, float*) ()
#8  0x000000000065bd9e in caffe::SoftmaxWithLossLayer<float>::Backward_gpu(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<bool, std::allocator<bool> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&) ()
#9  0x00000000005fb50e in caffe::Net<float>::BackwardFromTo(long, long) ()
#10 0x0000000000425380 in caffe_neural::Benchmark (tool_param=..., 
    settings=...) at src/benchmark.cpp:155
#11 0x0000000000451525 in main (argc=7, argv=0x7fffffffde68)
    at src/caffe_neural_tool.cpp:110
F1214 14:17:05.177160 13840 syncedmem.cpp:143] Check failed: error == cudaSuccess (2 vs. 0)  out of memory
*** Check failure stack trace: ***
    @     0x7f65609d8daa  (unknown)
    @     0x7f65609d8ce4  (unknown)
    @     0x7f65609d86e6  (unknown)
    @     0x7f65609db687  (unknown)
    @           0x5f2978  caffe::SyncedMemory::mutable_gpu_data()
    @           0x4f8ef2  caffe::Blob<>::mutable_gpu_data()
    @           0x6859c7  caffe::MergeCropLayer<>::Forward_gpu()
    @           0x5faf9d  caffe::Net<>::ForwardFromTo()
    @           0x42513c  caffe_neural::Benchmark()
    @           0x451525  main
    @     0x7f6556eedec5  (unknown)
    @           0x424167  (unknown)
    @              (nil)  (unknown)
Aborted (core dumped)

USK seems to be simply a memory shortage. Did U-Net also run out of memory?

P.S. Sorry, the make of caffe_neural_tool has not succeeded yet. I'll fix it; please wait.

yuta-mizuno commented 8 years ago

The full run of benchmark_u.sh didn't work, but when I ran only the training part by replacing "--benchmark 0" with "--train 0", it seemed to succeed. Is it possible to separate the train and process parts of the network? I want to train the weights in a .caffemodel file on my own data with U-Net.

naibaf7 commented 8 years ago

@yuta-mizuno If you just replace --benchmark 0 with --train 0 but don't provide input data, nothing will happen. So that will seem to work but it will have done absolutely nothing.

By the way, when you run benchmark_u.sh, what is the output of "nvidia-smi" in a second console shortly before it fails? This can be used to see if the GPU runs out of memory.

I really think it does run out of memory. 4 GB is not enough for training the U and USK models. You could, however, try to reduce the input/output size of the network to make it work. You will lose a lot of efficiency in the process, though.

yuta-mizuno commented 8 years ago

while "benchmark_u.sh" is running,

$ nvidia-smi
Wed Dec 16 13:10:00 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 346.46     Driver Version: 346.46         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K4200        Off  | 0000:03:00.0      On |                  N/A |
| 31%   49C    P0    96W / 110W |   2013MiB /  4095MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1268    G   /usr/bin/X                                     296MiB |
|    0      2229    G   compiz                                          69MiB |
|    0      4890    C   ...caffe_neural_tool/build/caffe_neural_tool  1632MiB |
+-----------------------------------------------------------------------------+

After trying it several times, the average usage was about 1700 MiB.

$ nvidia-smi
Wed Dec 16 13:12:51 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 346.46     Driver Version: 346.46         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K4200        Off  | 0000:03:00.0      On |                  N/A |
| 33%   54C    P0    84W / 110W |   3767MiB /  4095MiB |     99%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1268    G   /usr/bin/X                                     293MiB |
|    0      2229    G   compiz                                          69MiB |
|    0      4964    C   ...caffe_neural_tool/build/caffe_neural_tool  3389MiB |
+-----------------------------------------------------------------------------+

I also think it is caused by a memory shortage. Thank you so much for helping me with so many things.

naibaf7 commented 8 years ago

@yuta-mizuno Yes, you run out of memory.

If you have some patience I can assemble you a low-memory version of U-Net as soon as I have time. Are you interested / is it important to you?

It might be less throughput-efficient than the current U-Net, but it will fit into 4 GB, and the weights/learning parameters remain identical.

yuta-mizuno commented 8 years ago

I'm grateful for your kindness. If possible, I'd like to use it during January. Best regards.

naibaf7 commented 8 years ago

Here's the small network: https://github.com/naibaf7/caffe_neural_models/tree/master/net_u_small

Here's the benchmark to it: https://github.com/naibaf7/caffe_neural_models/blob/master/benchmark/benchmark_u_small.sh

You should update all 3 repositories (tool, models, caffe) and do a clean build. Afterwards it should work. All details are included.

I played it safe with this one; it should even work with 2 GB of GPU RAM. The weights are identical and re-usable on bigger networks if needed.

yuta-mizuno commented 8 years ago

It also ran safely on my hardware. Thank you for your support!