naibaf7 / caffe

Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
http://caffe.berkeleyvision.org/

Train failed with "double free or corruption" on caffe::SyncedMemory::to_gpu() #22

Open inferrna opened 8 years ago

inferrna commented 8 years ago

Training on a large dataset (2392012 values) aborts with the error below. Backtrace from the core dump:

inferno@hmstr:~/Soft/ocr/neurowords$ gdb /usr/bin/python3.5 core
GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10
Reading symbols from /usr/bin/python3.5...(no debugging symbols found)...done.
[New LWP 12749]
[New LWP 12765]
[New LWP 12771]
[New LWP 12754]
[New LWP 12766]
[New LWP 12770]
[New LWP 12767]
[New LWP 12753]
[New LWP 12768]
[New LWP 12769]
[New LWP 12763]
[New LWP 12764]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python3.5 -i 01-learning-lenet_mmvb.py'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f332dc42267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
55  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f332e423700 (LWP 12749))]
(gdb) bt
#0  0x00007f332dc42267 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:55
#1  0x00007f332dc43eca in __GI_abort () at abort.c:89
#2  0x00007f332dc85bf3 in __libc_message (do_abort=do_abort@entry=1, fmt=fmt@entry=0x7f332dd9e168 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f332dc8dc09 in malloc_printerr (ptr=<optimized out>, str=0x7f332dd9e298 "double free or corruption (!prev)", action=1) at malloc.c:4965
#4  _int_free (av=<optimized out>, p=<optimized out>, have_lock=0) at malloc.c:3834
#5  0x00007f332dc9183c in __GI___libc_free (mem=<optimized out>) at malloc.c:2950
#6  0x00007f331e4c7157 in caffe::SyncedMemory::to_gpu() () from /home/inferno/.dev/caffe-cl/caffe/lib/libcaffe.so.1.0.0-rc3
#7  0x00007f331e4c18e9 in caffe::SyncedMemory::gpu_data() () from /home/inferno/.dev/caffe-cl/caffe/lib/libcaffe.so.1.0.0-rc3
#8  0x00007f331e5e7d52 in caffe::Blob<float>::gpu_data() const () from /home/inferno/.dev/caffe-cl/caffe/lib/libcaffe.so.1.0.0-rc3
#9  0x00007f331e4ab14d in caffe::Net<float>::ForwardFromTo(int, int) () from /home/inferno/.dev/caffe-cl/caffe/lib/libcaffe.so.1.0.0-rc3
#10 0x00007f331e4ab476 in caffe::Net<float>::Forward(float*) () from /home/inferno/.dev/caffe-cl/caffe/lib/libcaffe.so.1.0.0-rc3
#11 0x00007f331e67c190 in caffe::Solver<float>::Step(int) () from /home/inferno/.dev/caffe-cl/caffe/lib/libcaffe.so.1.0.0-rc3
#12 0x00007f331ead01b1 in caffe::Step_NoGIL(caffe::Solver<float>&, int) () from /home/inferno/Soft/ocr/neurowords/caffe/_caffe.so
#13 0x00007f331eaecb08 in boost::python::objects::caller_py_function_impl<boost::python::detail::caller<float (*)(caffe::Solver<float>&, int), boost::python::default_call_policies, boost::mpl::vector3<float, caffe::Solver<float>&, int> > >::operator()(_object*, _object*) () from /home/inferno/Soft/ocr/neurowords/caffe/_caffe.so
#14 0x00007f331dac6d7d in boost::python::objects::function::call(_object*, _object*) const () from /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0
#15 0x00007f331dac6f68 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0
#16 0x00007f331dacefb3 in boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const ()
   from /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0
#17 0x00007f331eae4658 in boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<std::exception, void (*)(std::exception)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(std::exception)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&)
    () from /home/inferno/Soft/ocr/neurowords/caffe/_caffe.so
#18 0x00007f331daced6d in boost::python::handle_exception_impl(boost::function0<void>) () from /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0
#19 0x00007f331dac4189 in ?? () from /usr/lib/x86_64-linux-gnu/libboost_python-py35.so.1.58.0
#20 0x00000000005ad28e in PyObject_Call ()
#21 0x0000000000520ba1 in PyEval_EvalFrameEx ()
#22 0x0000000000552648 in ?? ()
#23 0x00000000005faf1f in PyEval_EvalCode ()
#24 0x00000000005f56a2 in ?? ()
#25 0x00000000005f730a in PyRun_FileExFlags ()
#26 0x00000000005f7f6c in PyRun_SimpleFileExFlags ()
#27 0x000000000062b512 in Py_Main ()
#28 0x00000000004cbe6f in main ()

train.prototxt:

layer {
  name: "data"
  type: "Data"
  top: "data"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "float_words_img"
    batch_size: 2392
    backend: LMDB
  }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "float_words_lbl"
    batch_size: 2392
    backend: LMDB
  }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "data"
  top: "ip1"
  inner_product_param {
    num_output: 48
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "r1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 8
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "ip3"
  type: "InnerProduct"
  bottom: "ip2"
  top: "ip3"
  inner_product_param {
    num_output: 48
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "r2"
  type: "ReLU"
  bottom: "ip3"
  top: "ip3"
}
layer {
  name: "ip4"
  type: "InnerProduct"
  bottom: "ip3"
  top: "ip4"
  inner_product_param {
    num_output: 28
    weight_filler {
      type: "xavier"
    }
  }
}
layer {
  name: "loss"
  type: "EuclideanLoss"
  bottom: "ip4"
  bottom: "label"
  top: "loss"
}
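
For a sense of scale, here is a rough sketch of how large the intermediate blobs in this net are at batch_size 2392. The batch size and num_output values are taken from the prototxt above, and the float size from `caffe::Blob<float>` in the backtrace. The helper names are hypothetical, the input dimension of "data" is not stated here and is left out, and reading "2392012 values" as a sample count is only an assumption. The estimate also assumes each top blob stores a data array and a diff array of the same size.

```python
# Sizes come from train.prototxt above; the helpers are illustrative only.
BATCH_SIZE = 2392        # data_param.batch_size
BYTES_PER_FLOAT = 4      # the backtrace shows caffe::Blob<float>

# num_output of each InnerProduct layer, in network order.
# The ReLU layers run in place (bottom == top), so they add no blobs.
INNER_PRODUCT_OUTPUTS = {"ip1": 48, "ip2": 8, "ip3": 48, "ip4": 28}


def top_blob_bytes(batch_size, num_output, with_diff=True):
    """Bytes for one top blob of shape (batch_size, num_output).

    Assumption: the blob keeps a diff array as large as its data array.
    """
    data_bytes = batch_size * num_output * BYTES_PER_FLOAT
    return 2 * data_bytes if with_diff else data_bytes


def batches_per_pass(num_samples, batch_size):
    """Return (full_batches, leftover_samples) for one pass over the data."""
    return divmod(num_samples, batch_size)


if __name__ == "__main__":
    for name, num_output in INNER_PRODUCT_OUTPUTS.items():
        size = top_blob_bytes(BATCH_SIZE, num_output)
        print(f"{name}: {size} bytes (data + diff)")
    # Assumption: "2392012 values" means 2392012 samples.
    print(batches_per_pass(2392012, BATCH_SIZE))
```

By this estimate every top blob is well under 1 MB, so the activations of the layers listed here look small. The estimate does not cover the "data" and "label" blobs, whose per-sample size is not given.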