ufal / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
Mozilla Public License 2.0

Segfault from training UD_Finnish 2 #25

Closed: flammie closed this issue 7 years ago

flammie commented 7 years ago

I just tried training UD_Finnish from version 2 out of the box, and the only result I can get is a segfault. I tried removing spaces with sed to work around issue #21, but this had no effect. I tried training both with ./udpipe --train UD_Finnish-2.0.udpipe fi-ud-train.conllu and with cat fi-ud-train.conllu | ./udpipe --train fi-ud-2.0.udpipe, but that made no difference either. It always ends in:

Epoch 99, logprob: -2.4581e+03, training acc: 99.84%
Epoch 100, logprob: -2.4962e+03, training acc: 99.84%
Creating morphological dictionary for tagger model 1.
Training tagger model 1.
Segmentation fault (core dumped)

flammie commented 7 years ago

Gdb says:

Program received signal SIGSEGV, Segmentation fault.
0x000055555559db88 in ufal::udpipe::morphodita::morpho_dictionary<ufal::udpipe::morphodita::generic_lemma_addinfo>::load(ufal::udpipe::utils::binary_decoder&) ()
(gdb) bt
#0  0x000055555559db88 in ufal::udpipe::morphodita::morpho_dictionary<ufal::udpipe::morphodita::generic_lemma_addinfo>::load(ufal::udpipe::utils::binary_decoder&) ()
#1  0x0000555555597398 in ufal::udpipe::morphodita::generic_morpho::load(std::istream&) ()
#2  0x000055555559edca in ufal::udpipe::morphodita::morpho::load(std::istream&) ()
#3  0x000055555566af13 in ufal::udpipe::morphodita::tagger_trainer<ufal::udpipe::morphodita::perceptron_tagger_trainer<ufal::udpipe::morphodita::feature_sequences<ufal::udpipe::morphodita::conllu_elementary_features<ufal::udpipe::morphodita::training_elementary_feature_map>, ufal::udpipe::morphodita::training_feature_sequence_map> > >::train(int, int, int, std::istream&, bool, std::istream&, bool, std::istream&, std::istream&, bool, std::ostream&) ()
#4  0x000055555564f477 in ufal::udpipe::trainer_morphodita_parsito::train_tagger_model(std::vector<ufal::udpipe::sentence, std::allocator<ufal::udpipe::sentence> > const&, std::vector<ufal::udpipe::sentence, std::allocator<ufal::udpipe::sentence> > const&, unsigned int, unsigned int, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#5  0x0000555555651758 in ufal::udpipe::trainer_morphodita_parsito::train_tagger(std::vector<ufal::udpipe::sentence, std::allocator<ufal::udpipe::sentence> > const&, std::vector<ufal::udpipe::sentence, std::allocator<ufal::udpipe::sentence> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#6  0x0000555555651c7f in ufal::udpipe::trainer_morphodita_parsito::train(std::vector<ufal::udpipe::sentence, std::allocator<ufal::udpipe::sentence> > const&, std::vector<ufal::udpipe::sentence, std::allocator<ufal::udpipe::sentence> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#7  0x0000555555648553 in ufal::udpipe::trainer::train(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<ufal::udpipe::sentence, std::allocator<ufal::udpipe::sentence> > const&, std::vector<ufal::udpipe::sentence, std::allocator<ufal::udpipe::sentence> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) ()
#8  0x000055555555da89 in main ()
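A backtrace like the one above can also be captured non-interactively. The following is a minimal sketch, not part of the original report: backtrace_of is a hypothetical helper name, and it assumes gdb is installed and that the command to debug is passed as arguments.

```shell
#!/bin/sh
# Hypothetical helper (not from the original report): run a command under gdb
# in batch mode and print a backtrace after the program stops.
# -batch     exit gdb once the listed commands have run
# -ex CMD    execute a gdb command (here: run, then bt)
# --args     treat everything that follows as the program and its arguments
backtrace_of() {
    gdb -batch -ex run -ex bt --args "$@"
}
```

For the training command from this report, the call would be backtrace_of ./udpipe --train UD_Finnish-2.0.udpipe fi-ud-train.conllu. If the program exits normally, gdb reports that there is no stack instead of printing frames.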
foxik commented 7 years ago

This looks like #24: one of the recent additions seems to be miscompiled by g++ 6. I tried to work around it in f13aff560, so please try the current HEAD.
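Trying the current HEAD could look roughly like the sketch below. It is an illustration under stated assumptions, not a procedure given in this thread: retrain_on_head is a hypothetical helper, the first argument is assumed to be an existing git checkout of the repository, the binary is assumed to be built by running make in its src directory, and the default model name is simply the one used in the report.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): update a checkout to the current
# HEAD, rebuild, and rerun the training command from the report.
# Assumptions: $1 is an existing git checkout, the udpipe binary is built by
# running make in its src/ directory, $2 is the training CoNLL-U file, and
# $3 (optional) is the output model name.
retrain_on_head() {
    repo=$1
    train_file=$2
    model=${3:-fi-ud-2.0.udpipe}

    git -C "$repo" pull --ff-only || return 1   # pick up f13aff560 or later
    make -C "$repo/src" || return 1             # rebuild the udpipe binary
    "$repo/src/udpipe" --train "$model" "$train_file"
}
```

A call such as retrain_on_head ~/udpipe fi-ud-train.conllu would then repeat the original training run against the rebuilt binary; the checkout path here is only an example.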

flammie commented 7 years ago

Aha, thanks, this seems to have worked. Sorry for missing the other bug report and filing a duplicate.