tmbdev / clstm

A small C++ implementation of LSTM networks, focused on OCR.
Apache License 2.0

Segmentation fault when running clstmocr on pre-trained model #139

Open drdrsh opened 7 years ago

drdrsh commented 7 years ago

I am trying to build clstm on an Ubuntu 16.04 machine. My installation steps are:

sudo apt-get install scons libprotobuf-dev libprotobuf9v5 protobuf-compiler libpng-dev libeigen3-dev swig
scons
sudo scons install
./run_tests

This has a happy ending where the build succeeds and all tests pass.

Now onto the problem

I start by downloading a pre-trained model and a random image from the web (I don't really care about the recognition results for now):

wget https://www.alislam.org/quran/search2/verses/017-002.png
wget https://raw.githubusercontent.com/mittagessen/kraken-models/master/clstm/arabic-buldan/arabic-buldan.clstm
echo "./017-002.png" > files
load=arabic-buldan.clstm ./clstmocr files

The last line results in a segmentation fault.

I ran protobuf on the clstm file against clstm.proto and the NetworkProto message was read successfully.

I ran the core dump through gdb and got:

#0 0x00000000004195be in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ocropus::String>, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ocropus::String> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, ocropus::String> > >::_M_begin (this=0xa8) at /usr/include/c++/5/bits/stl_tree.h:652
652 { return static_cast<_Link_type>(this->_M_impl._M_header._M_parent); }

I first encountered this problem in the Python module and tracked it down to the clstm.load_net() call. The call doesn't segfault if I supply an invalid path or an invalid model; it fails only when I supply a working model.

I am running Ubuntu 16.04 with 4 GB of RAM.

drdrsh commented 7 years ago

I managed to get a debugging environment going and localized the error to clstmhl.h, line 107.

The saved model has a layer of kind "NPLSTM_SigmoidTanhTanh". Apparently that kind is not registered with the factory, so the factory returns an empty pointer, which is then dereferenced and causes the crash.
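
To illustrate the failure mode I think I am seeing, here is a minimal sketch of a factory that returns an empty pointer for an unregistered kind, plus a checked variant that fails with a readable error instead of crashing. All names here (Layer, registry, register_layer, make_layer, make_layer_checked) are hypothetical stand-ins for illustration and are not clstm's actual API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a network layer; not clstm's real class.
struct Layer {
  std::string kind;
  std::map<std::string, std::string> attr;  // string attributes of the layer
};

using LayerFactory = std::function<std::shared_ptr<Layer>()>;

// Hypothetical registry mapping layer kind names to constructors.
inline std::map<std::string, LayerFactory>& registry() {
  static std::map<std::string, LayerFactory> r;
  return r;
}

// Register a constructor for the given kind name.
inline void register_layer(const std::string& kind) {
  registry()[kind] = [kind] {
    auto layer = std::make_shared<Layer>();
    layer->kind = kind;
    return layer;
  };
}

// Unchecked lookup: returns an empty pointer for unknown kinds.
// Dereferencing that result without a check is undefined behaviour
// and typically shows up as a segmentation fault.
inline std::shared_ptr<Layer> make_layer(const std::string& kind) {
  auto it = registry().find(kind);
  if (it == registry().end()) return nullptr;
  return it->second();
}

// Checked lookup: turns the empty pointer into an explicit error
// that names the missing kind.
inline std::shared_ptr<Layer> make_layer_checked(const std::string& kind) {
  auto layer = make_layer(kind);
  if (!layer) throw std::runtime_error("unknown layer kind: " + kind);
  return layer;
}
```

With a check like this, loading a model that contains an unregistered kind would report the offending name rather than segfault. The small `this=0xa8` value in the gdb frame above is consistent with a std::map member being accessed through a null object pointer, although I have not confirmed that this is exactly what happens in clstm.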

What can I do to address this problem? Should I read the model outside clstm, modify it, and reserialize it, or should I build an earlier version of the code in which that layer still existed?