ufal / udpipe

UDPipe: Trainable pipeline for tokenizing, tagging, lemmatizing and parsing Universal Treebanks and other CoNLL-U files
Mozilla Public License 2.0

Compilation on Windows #18

Closed chrismoor closed 7 years ago

chrismoor commented 7 years ago

Hello,

When I compile the latest release on Windows x64 (gcc 6.3.0 from MinGW-w64) and launch it, the program shows the help and then crashes without any further error message! However, the downloaded pre-compiled binary (1.0 release) works perfectly.

Have you ever had this kind of problem?

foxik commented 7 years ago

No, that does not sound familiar.

What exactly does "crash" mean? If you run just "udpipe" from the command line, what exactly happens? Does this also happen if you run "udpipe --version"?

chrismoor commented 7 years ago

The program receives signal SIGILL whatever arguments I give. It seems to be a linker issue, because this behavior appears when LTO is enabled together with a statically linked C++ standard library (-static-libstdc++), although I don't understand why...


Another thing: I ran your merge_sources program and got udpipe.cpp (good so far), but I cannot compile the latter because there are lots of "redefinition" errors (I didn't look any deeper). I mention it just in case, because I use the library and it works well!

foxik commented 7 years ago

The SIGILL should mean the binary contains an illegal instruction -- maybe the linker and the statically linked libc bring in some precompiled code which uses an instruction your processor does not have? But that is not very probable; more likely the binary is somehow corrupt... Did you try an older compiler, maybe http://tdm-gcc.tdragon.net/download ? BTW, we are using Visual Studio to compile the Windows binary (it produces smaller executables and it makes us write more multi-platform programs).

As for the udpipe.cpp -- I use that regularly, it is even tested automatically on Travis (and should work with g++ 6 too). It should be used as a stand-alone UDPipe source, see for example tests/udpipe_bundle.cpp.

chrismoor commented 7 years ago

No, I didn't try another compiler, but I will.

To be precise, I tried to compile tests/udpipe_bundle.cpp, but that doesn't work because it needs src_lib_only/udpipe.cpp, which doesn't compile... The problem comes from udpipe.cpp I guess (and thus from merge_sources.cpp, which builds udpipe.cpp). Or maybe the problem comes from my compiler?

foxik commented 7 years ago

As for src_lib_only/udpipe.cpp -- ah, I see an issue with directory separators (the same file could get into udpipe.cpp multiple times with different directory separators) -- just pushed a fix.
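
To illustrate the kind of problem described above, here is a minimal sketch of deduplicating source paths after normalizing directory separators. It is not the actual change pushed to merge_sources.cpp; the helper names normalize_separators and unique_sources and the example paths are hypothetical, chosen only for illustration.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: turn Windows-style backslashes into forward slashes,
// so that "utils\\common.h" and "utils/common.h" compare equal.
std::string normalize_separators(std::string path) {
  std::replace(path.begin(), path.end(), '\\', '/');
  return path;
}

// Hypothetical helper: keep only the first occurrence of each file,
// comparing paths after separator normalization. The order of first
// occurrences is preserved.
std::vector<std::string> unique_sources(const std::vector<std::string>& paths) {
  std::set<std::string> seen;
  std::vector<std::string> result;
  for (const auto& path : paths) {
    std::string normalized = normalize_separators(path);
    if (seen.insert(normalized).second)  // true only for a path not seen before
      result.push_back(normalized);
  }
  return result;
}
```

With such a check, a header reached once as "utils/common.h" and once as "utils\common.h" would be merged into the bundle only once, which is what avoids the "redefinition" errors.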

chrismoor commented 7 years ago

That fixes the issue!


Out of curiosity I tried with the Visual C++ 2015 tools, but I get some errors (even in "normal" mode):

  make exe BITS=32 PLATFORM=win-vs

It's not a problem for me, as it works fine now with gcc ;)

foxik commented 7 years ago

Oh, the VC++ 2015 error surprises me -- I just recompiled it here (both 32bit and 64bit) and it works without an error. We are using Visual Studio 2015 Community (and to be honest we compile under Wine; but that should hardly make a compiler error go away :-)

The error message also does not seem very helpful. The error points to the else in the following snippet:

  for (auto&& embedding_ids : embedding_ids_sequences)
    for (unsigned i = 0; i < embeddings.size(); i++)
      if (embedding_ids && (*embedding_ids)[i] >= 0) {
        const float* embedding = embeddings[i].weight((*embedding_ids)[i]);
        for (unsigned dimension = embeddings[i].dimension; dimension; dimension--, embedding++, index++)
          if (w.input_dropout.empty() || !w.input_dropout[index])
            for (auto&& j : w.hidden_kept)
              w.hidden_layer[j] += *embedding * network.weights[0][index][j];
      } else {
        index += embeddings[i].dimension;
      }

But the code seems fine and the else follows the obvious if -- in case you can see anything wrong with the code, please let me know.

chrismoor commented 7 years ago

OK noted.

Off topic: what is the meaning of the message "Should encode value 65543 in one byte!" during tagger training? I got this error when training on the German UD corpus was starting.

foxik commented 7 years ago

That is an error of one of the components -- MorphoDiTa -- which happens when there are too many entries in the morphological dictionary. Usually lowering guesser_enrich_dictionary helps.
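
The exact check inside MorphoDiTa is not shown in this thread, but an error of this shape typically comes from a range-checked binary encoder. The following is only a sketch under that assumption; the function add_one_byte and its signature are hypothetical and not MorphoDiTa's API.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical encoder: appends `value` as a single byte, or throws if the
// value does not fit into 8 bits. The message mirrors the one quoted above.
void add_one_byte(std::vector<std::uint8_t>& data, unsigned value) {
  if (value > 0xFF)
    throw std::range_error("Should encode value " + std::to_string(value) + " in one byte!");
  data.push_back(static_cast<std::uint8_t>(value));
}
```

In such a scheme, a dictionary that grows past what the fixed-width field can index triggers the exception at encoding time, which would be consistent with a smaller guesser_enrich_dictionary making the error disappear.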

chrismoor commented 7 years ago

Thank you! That solved my problem.

foxik commented 7 years ago

The limit on the size of the dictionary has been relaxed in UDPipe 1.1. Also, the Visual C++ 2015 Update 3 compilation error (which is really a VC++ bug) has been fixed in UDPipe 1.1.