Closed E1EV1 closed 6 years ago
I just found a lead in the terminal output: there is a layer that is not created. That could explain the segmentation fault; I think the layer_pointer points to an undefined layer. I'll keep you informed.
Unfortunately, fixing the creation of the layer_flow_gt_aug_FlowAugmentation1_0_split layer didn't change anything. I'm attaching a gdb screenshot showing that Thread 6 received signal SIGSEGV.
Hi, is it possible that there is a difference between the libraries used at compile time and the ones used at runtime? For example, a "popular" error is that people have multiple Caffe installations which interfere with each other.
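One way to check for such a mismatch is to look at which shared libraries the dynamic loader actually resolves for the binary. Below is a minimal sketch that parses `ldd` output and keeps only the Caffe/CUDA-related entries. The helper names (`parse_ldd`, `libs_matching`, `run_ldd`) and the binary path in the usage part are assumptions for illustration, not part of FlowNet2 or Caffe.

```python
import re
import subprocess

# Matches lines like "libfoo.so.1 => /path/libfoo.so.1 (0x00007f...)"
# and "libfoo.so.1 => not found".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-f]+\))?\s*$")


def parse_ldd(output):
    """Map each shared-library name to the path the loader resolved it to."""
    resolved = {}
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            resolved[match.group(1)] = match.group(2)
    return resolved


def libs_matching(resolved, keywords=("caffe", "cudnn", "cudart")):
    """Keep only the libraries whose name contains one of the keywords."""
    return {
        name: path
        for name, path in resolved.items()
        if any(keyword in name.lower() for keyword in keywords)
    }


def run_ldd(binary):
    """Run `ldd` on a binary and return its text output (Linux only)."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Assumed location of the Caffe tool inside the FlowNet2 checkout.
    libs = libs_matching(parse_ldd(run_ldd(".build_release/tools/caffe")))
    for name, path in sorted(libs.items()):
        print(f"{name} -> {path}")
```

If a library resolves to an unexpected directory (for example a second Caffe checkout) or shows up as "not found", the build and the runtime environment probably disagree.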
Thank you for your reply. Normally there is no risk of that: for the time being, I've installed only FlowNet2 with your Caffe version on this computer to avoid interference.
For two days I have tried many things suggested on forums: I recompiled all of FlowNet2, modified the .bashrc, and modified Makefile.config, but without success. I just found something else to try: according to the Nvidia documentation, CUDA 8 doesn't work correctly with gcc versions above 5.3.1. My gcc is 5.4, so I will downgrade it as a test.
If somebody has an idea, I'm more than interested.
Changing the gcc version is useless, since gcc 5.4 is supported as of CUDA 8.0.61. I'm now trying to debug the thread directly; the debug output is below in case anyone finds an explanation.
Thread 6 "caffe" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffbef4f700 (LWP 18797)]
0x00007ffff747fe60 in void caffe::CustomDataLayerPrefetch
Thread 6 (Thread 0x7fffbef4f700 (LWP 18797)):
from /home/ewan/Documents/flownet2-master/.build_release/tools/../lib/libcaffe.so.1.0.0-rc3
at pthread_create.c:333
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 5 (Thread 0x7fffc5003700 (LWP 18795)):
at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
at pthread_create.c:333
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 4 (Thread 0x7fffc5804700 (LWP 18794)):
at pthread_create.c:333
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 3 (Thread 0x7fffc6005700 (LWP 18793)):
flags=524288) at ../sysdeps/unix/sysv/linux/accept4.c:40
at pthread_create.c:333
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 2 (Thread 0x7fffc75f2700 (LWP 18791)):
at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
Thread 1 (Thread 0x7ffff7f6db00 (LWP 18787)):
from /usr/lib/nvidia-390/libnvidia-fatbinaryloader.so.390.25
from /usr/lib/x86_64-linux-gnu/libcudnn.so.7
from /home/ewan/Documents/flownet2-master/.build_release/tools/../lib/libcaffe.so.1.0.0-rc3
from /home/ewan/Documents/flownet2-master/.build_release/tools/../lib/libcaffe.so.1.0.0-rc3
Your backtrace indicates that you are using CuDNN version 7. We've only ever used version 5. I know that it's relatively easy to make the code compatible with version 6, but I never tried 7.
Thank you for your reply. I will try downgrading my CuDNN version. Normally it should not change much, judging by https://github.com/lmb-freiburg/flownet2/issues/92, but you never know.
It's very strange: I can build and run FlowNet2 without problems, but I can't train or fine-tune.
Hm, that's strange, but it really might be a problem with CuDNN. But it might be worth asking the people in #92 whether they actually used training, or just testing :wink:
Thank you for all your help @nikolausmayer. Yes, that's why I downgraded my CuDNN version, but unfortunately I still get the same issue :( The error message is below.
My setup: Ubuntu 16.04, 980 Ti, CUDA 8.0.61, CuDNN 5.1, gcc 5.4, Python 3.5. If anyone has a suggestion, I'm really interested :)
Thread 6 "caffe" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc9f02700 (LWP 10153)]
0x00007ffff7481460 in void caffe::CustomDataLayerPrefetch
Thread 6 (Thread 0x7fffc9f02700 (LWP 10153)):
Thread 5 (Thread 0x7fffcbf47700 (LWP 10152)):
Thread 4 (Thread 0x7fffcc748700 (LWP 10151)):
Thread 3 (Thread 0x7fffccf49700 (LWP 10150)):
Thread 2 (Thread 0x7fffce536700 (LWP 10148)):
Thread 1 (Thread 0x7ffff7f6db00 (LWP 10144)):
from /home/ewan/Documents/flownet2-master/.build_release/tools/../lib/libcaffe.so.1.0.0-rc3
from /home/ewan/Documents/flownet2-master/.build_release/tools/../lib/libcaffe.so.1.0.0-rc3
OK, I found the cause of the error! Some images in my dataset did not have the same size as the others. Now all the images have the same size and I can fine-tune without SIGSEGV.
Thank you @nikolausmayer for your help
Nice job. I guess it would be good if the converters or data layers checked for this... :slightly_smiling_face:
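For anyone hitting the same crash, a check like this can be run on the raw files before converting to LMDB. This is only a sketch under assumptions: it assumes the dataset consists of binary `.ppm` images and Middlebury-style `.flo` flow files (the formats the FlowNet datasets use), and the function names are hypothetical, not part of the FlowNet2 tools. It reads just the file headers and reports every file whose size differs from the most common one.

```python
import struct
import sys
from collections import Counter
from pathlib import Path

FLO_MAGIC = 202021.25  # sanity-check float at the start of a .flo file


def flo_size(path):
    """Return (width, height) from a Middlebury .flo header."""
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError(f"{path}: truncated .flo header")
    magic, width, height = struct.unpack("<fii", header)
    if magic != FLO_MAGIC:
        raise ValueError(f"{path}: bad .flo magic number")
    return width, height


def ppm_size(path):
    """Return (width, height) from a binary PPM (P6) header."""
    tokens = []
    with open(path, "rb") as f:
        for raw in f:
            # Drop header comments, then collect whitespace-separated fields.
            tokens.extend(raw.split(b"#", 1)[0].split())
            if len(tokens) >= 3:
                break
    if len(tokens) < 3 or tokens[0] != b"P6":
        raise ValueError(f"{path}: not a binary PPM file")
    return int(tokens[1]), int(tokens[2])


READERS = {".flo": flo_size, ".ppm": ppm_size}


def find_size_outliers(root):
    """Return (most common size, [(path, size), ...] for files that differ)."""
    sizes = {}
    for path in sorted(Path(root).rglob("*")):
        reader = READERS.get(path.suffix.lower())
        if reader is not None:
            sizes[path] = reader(path)
    if not sizes:
        return None, []
    expected = Counter(sizes.values()).most_common(1)[0][0]
    outliers = [(p, s) for p, s in sizes.items() if s != expected]
    return expected, outliers


if __name__ == "__main__":
    expected, outliers = find_size_outliers(sys.argv[1])
    print("most common size:", expected)
    for path, size in outliers:
        print(f"MISMATCH {path}: {size}")
```

Run it with the dataset directory as the only argument; an empty mismatch list means all `.ppm` and `.flo` files share one size.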
Hi,
I'm trying to fine-tune FlowNet2 on my own dataset. I converted my data to LMDB and modified FlowNet2_train.prototxt to fit my problem.
Then, when I started training, I hit a "Segmentation Fault" in CustomDataLayerPrefetch, and I don't know where the error comes from.
Any suggestions?