kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

compile for higher order LMs #140

Status: Closed (mohamad-hasan-sohan-ajini closed this issue 6 years ago)

mohamad-hasan-sohan-ajini commented 6 years ago

Hi, I need to compile KenLM to train a 10-gram language model. As recommended in the README, I added the following #ifndef block to kenlm/util/have.hh:

#ifndef KENLM_MAX_ORDER
int KENLM_MAX_ORDER = 10;
#endif

but I get the following error during compilation:

Scanning dependencies of target kenlm_util
[ 1%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum-dtoa.cc.o
[ 1%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum.cc.o
[ 2%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/cached-powers.cc.o
[ 3%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/diy-fp.cc.o
[ 4%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/double-conversion.cc.o
[ 5%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fast-dtoa.cc.o
[ 6%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fixed-dtoa.cc.o
[ 6%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/strtod.cc.o
[ 7%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/chain.cc.o
[ 8%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/count_records.cc.o
[ 9%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/io.cc.o
[ 10%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/line_input.cc.o
[ 10%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/multi_progress.cc.o
[ 11%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/rewindable_stream.cc.o
[ 12%] Building CXX object util/CMakeFiles/kenlm_util.dir/bit_packing.cc.o
[ 13%] Building CXX object util/CMakeFiles/kenlm_util.dir/ersatz_progress.cc.o
[ 14%] Building CXX object util/CMakeFiles/kenlm_util.dir/exception.cc.o
[ 14%] Building CXX object util/CMakeFiles/kenlm_util.dir/file.cc.o
[ 15%] Building CXX object util/CMakeFiles/kenlm_util.dir/file_piece.cc.o
[ 16%] Building CXX object util/CMakeFiles/kenlm_util.dir/float_to_string.cc.o
[ 17%] Building CXX object util/CMakeFiles/kenlm_util.dir/integer_to_string.cc.o
[ 18%] Building CXX object util/CMakeFiles/kenlm_util.dir/mmap.cc.o
[ 18%] Building CXX object util/CMakeFiles/kenlm_util.dir/murmur_hash.cc.o
[ 19%] Building CXX object util/CMakeFiles/kenlm_util.dir/parallel_read.cc.o
[ 20%] Building CXX object util/CMakeFiles/kenlm_util.dir/pool.cc.o
[ 21%] Building CXX object util/CMakeFiles/kenlm_util.dir/read_compressed.cc.o
[ 22%] Building CXX object util/CMakeFiles/kenlm_util.dir/scoped.cc.o
[ 22%] Building CXX object util/CMakeFiles/kenlm_util.dir/spaces.cc.o
[ 23%] Building CXX object util/CMakeFiles/kenlm_util.dir/string_piece.cc.o
[ 24%] Building CXX object util/CMakeFiles/kenlm_util.dir/usage.cc.o
[ 25%] Linking CXX static library ../lib/libkenlm_util.a
[ 25%] Built target kenlm_util
Scanning dependencies of target string_stream_test
[ 26%] Building CXX object util/CMakeFiles/string_stream_test.dir/string_stream_test.cc.o
[ 27%] Linking CXX executable ../tests/string_stream_test
[ 27%] Built target string_stream_test
Scanning dependencies of target tokenize_piece_test
[ 28%] Building CXX object util/CMakeFiles/tokenize_piece_test.dir/tokenize_piece_test.cc.o
[ 29%] Linking CXX executable ../tests/tokenize_piece_test
../lib/libkenlm_util.a(exception.cc.o):(.data+0x0): multiple definition of `KENLM_MAX_ORDER'
CMakeFiles/tokenize_piece_test.dir/tokenize_piece_test.cc.o:(.data+0x38): first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [tests/tokenize_piece_test] Error 1
make[1]: *** [util/CMakeFiles/tokenize_piece_test.dir/all] Error 2
make: *** [all] Error 2

Any idea how to solve this error?
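Why the linker complains: int KENLM_MAX_ORDER = 10; in a header is a non-const variable definition with external linkage, so every .cc file that includes have.hh emits its own copy, and ld sees two of them (exception.cc.o and tokenize_piece_test.cc.o in the log). The following is a minimal sketch, assuming plain C++11, of two header-safe alternatives: an object-like macro behind an #ifndef guard (the form the KenLM error messages later in this thread treat KENLM_MAX_ORDER as), or a constexpr constant, which has internal linkage at namespace scope. The names kMaxOrderConstexpr, StateWords and AltWords are hypothetical, for illustration only, and this is not KenLM's actual header.

```cpp
// Hypothetical header contents, not kenlm's real util/have.hh.
#include <array>
#include <cassert>
#include <cstddef>

// Option 1: object-like macro behind a guard. No '=' and no ';': the
// preprocessor substitutes the text "10" wherever the name appears, and no
// linker symbol is created, so any number of .cc files may include it.
#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 10
#endif

// Option 2 (C++11): a namespace-scope constexpr variable is implicitly const,
// so it has internal linkage; each translation unit gets its own private copy
// and the linker never sees a duplicate definition.
constexpr std::size_t kMaxOrderConstexpr = 10;

// Both are constant expressions, so either can size an array, e.g. the
// context words of an n-gram state (order - 1 words). Hypothetical types.
struct StateWords {
  std::array<unsigned int, KENLM_MAX_ORDER - 1> words;
};
using AltWords = std::array<unsigned int, kMaxOrderConstexpr - 1>;

// By contrast, "int KENLM_MAX_ORDER = 10;" in a header is an externally
// linked definition: one per .cc file, hence "multiple definition".
static_assert(KENLM_MAX_ORDER == 10, "macro form");
static_assert(kMaxOrderConstexpr == 10, "constexpr form");
```

Either form compiles cleanly when the header is included from many source files; the macro form additionally lets a build system override the value with a -D flag.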

mohamad-hasan-sohan-ajini commented 6 years ago

I solved the compilation issue with:

#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 10  /* object-like macro: no '=' */
#endif

but I still get the error:

/home/sobhe/kenlm/lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because `counts.size() > 6'.
This model has order 10 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. In the KenLM tarball or Moses, use e.g. `bjam --max-kenlm-order=6 -a'. Otherwise, edit lm/max_order.hh.
Byte: 146
ERROR
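What the message is checking, as a hedged sketch: the model's order is the number of n-gram counts read from the file (an ARPA \data\ header lists one "ngram N=..." line per order), and loading fails when that number exceeds the compile-time maximum. CheckOrder below is a hypothetical stand-in written for illustration; it is not KenLM's CheckCounts, and it uses std::runtime_error rather than KenLM's FormatLoadException.

```cpp
// Hypothetical sketch of a load-time order check (not kenlm's actual code).
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 6  // default used when the build passes no -D flag
#endif

// counts[i] is the number of (i+1)-grams in the model, so counts.size()
// equals the model's order (10 entries for a 10-gram model).
void CheckOrder(const std::vector<std::uint64_t>& counts) {
  const std::size_t max_order = static_cast<std::size_t>(KENLM_MAX_ORDER);
  if (counts.size() > max_order) {
    throw std::runtime_error("This model has order " +
                             std::to_string(counts.size()) +
                             " but this build supports up to " +
                             std::to_string(max_order));
  }
}
```

With the default of 6, a 10-gram model yields a counts vector of size 10 and the check throws; the limit only moves if the code containing the check is recompiled with a larger value.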

mohamad-hasan-sohan-ajini commented 6 years ago

ouchhhh

KENLM_MAX_ORDER is set to 6 in lm/CMakeLists.txt. Setting it to 10 there and recompiling solved the issue.
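Why editing have.hh had no effect, as a minimal sketch: assume lm/CMakeLists.txt hands its value to the compiler as -DKENLM_MAX_ORDER=6 (consistent with the "cmake -DKENLM_MAX_ORDER=10 .." hint in the later error message). A command-line definition exists before any header is read, so an #ifndef KENLM_MAX_ORDER block inside a header is skipped. Below, the first #define stands in for that compiler flag, and CompiledMaxOrder is a hypothetical helper.

```cpp
#include <cassert>

// Stand-in for the compiler flag -DKENLM_MAX_ORDER=6 (assumed to come from
// lm/CMakeLists.txt). Command-line macros are defined before any source or
// header text is processed.
#define KENLM_MAX_ORDER 6

// A header-level default like the block added to have.hh. The macro is
// already defined, so the preprocessor skips this block entirely; even a
// malformed "#define KENLM_MAX_ORDER = 10" here would never take effect.
#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 10
#endif

constexpr int CompiledMaxOrder() { return KENLM_MAX_ORDER; }

static_assert(CompiledMaxOrder() == 6,
              "the build-system value wins over the header default");
```

So the value has to be changed where the build system sets it, and presumably every build that compiles the KenLM sources separately has to be redone with the new value.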

mohamad-hasan-sohan-ajini commented 6 years ago

The problem persists when I load the 10-gram model in Python. Any suggestions?

Traceback (most recent call last):
  File "chars/test.py", line 32, in <module>
    model = AzBarChars('resources/chars.klm')
  File "/home/sobhe/azbar/chars/language.py", line 10, in __init__
    self.model = kenlm.LanguageModel(klm_file)
  File "kenlm.pyx", line 117, in kenlm.Model.__init__ (python/kenlm.cpp:2656)
IOError: Cannot read model 'resources/chars.klm' (lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because `counts.size() > 6'.
This model has order 10 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. In the KenLM tarball or Moses, use e.g. `bjam --max-kenlm-order=6 -a'. Otherwise, edit lm/max_order.hh.)

kpu commented 6 years ago

For the Python module, I presume you compiled it with setup.py. Edit line 21 of setup.py and reinstall.

yz1019117968 commented 4 years ago

Hi, sorry, I got this error today when converting an ARPA file to a binary file for a 7-gram model. I tried the methods above, but it still doesn't work:

/home/mark/dependency/kenlm/lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because `counts.size() > 6'.
This model has order 7 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile.
With cmake: cmake -DKENLM_MAX_ORDER=10 ..
With Moses: bjam --max-kenlm-order=10 -a
Otherwise, edit lm/max_order.hh.
Byte: 219
ERROR

I also ran cmake -DKENLM_MAX_ORDER=10 .. in the kenlm/build folder and got this output:

-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   program_options
--   system
--   thread
--   unit_test_framework
--   chrono
--   date_time
--   atomic
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mark/dependency/kenlm/build

Then I tried the ARPA-to-binary conversion again, and it still gave the same error as above. Do you know how to fix this? Thank you very much!