compile for higher order LMs

mohamad-hasan-sohan-ajini commented 6 years ago

Hi, I need to compile KenLM to train a 10-gram language model. As recommended in the README, I added the following #ifndef block to kenlm/utils/have.hh:

#ifndef KENLM_MAX_ORDER
int KENLM_MAX_ORDER = 10;
#endif

but I get the following error during compilation:

Scanning dependencies of target kenlm_util
[ 1%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum-dtoa.cc.o
[ 1%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/bignum.cc.o
[ 2%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/cached-powers.cc.o
[ 3%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/diy-fp.cc.o
[ 4%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/double-conversion.cc.o
[ 5%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fast-dtoa.cc.o
[ 6%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/fixed-dtoa.cc.o
[ 6%] Building CXX object util/CMakeFiles/kenlm_util.dir/double-conversion/strtod.cc.o
[ 7%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/chain.cc.o
[ 8%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/count_records.cc.o
[ 9%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/io.cc.o
[ 10%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/line_input.cc.o
[ 10%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/multi_progress.cc.o
[ 11%] Building CXX object util/CMakeFiles/kenlm_util.dir/stream/rewindable_stream.cc.o
[ 12%] Building CXX object util/CMakeFiles/kenlm_util.dir/bit_packing.cc.o
[ 13%] Building CXX object util/CMakeFiles/kenlm_util.dir/ersatz_progress.cc.o
[ 14%] Building CXX object util/CMakeFiles/kenlm_util.dir/exception.cc.o
[ 14%] Building CXX object util/CMakeFiles/kenlm_util.dir/file.cc.o
[ 15%] Building CXX object util/CMakeFiles/kenlm_util.dir/file_piece.cc.o
[ 16%] Building CXX object util/CMakeFiles/kenlm_util.dir/float_to_string.cc.o
[ 17%] Building CXX object util/CMakeFiles/kenlm_util.dir/integer_to_string.cc.o
[ 18%] Building CXX object util/CMakeFiles/kenlm_util.dir/mmap.cc.o
[ 18%] Building CXX object util/CMakeFiles/kenlm_util.dir/murmur_hash.cc.o
[ 19%] Building CXX object util/CMakeFiles/kenlm_util.dir/parallel_read.cc.o
[ 20%] Building CXX object util/CMakeFiles/kenlm_util.dir/pool.cc.o
[ 21%] Building CXX object util/CMakeFiles/kenlm_util.dir/read_compressed.cc.o
[ 22%] Building CXX object util/CMakeFiles/kenlm_util.dir/scoped.cc.o
[ 22%] Building CXX object util/CMakeFiles/kenlm_util.dir/spaces.cc.o
[ 23%] Building CXX object util/CMakeFiles/kenlm_util.dir/string_piece.cc.o
[ 24%] Building CXX object util/CMakeFiles/kenlm_util.dir/usage.cc.o
[ 25%] Linking CXX static library ../lib/libkenlm_util.a
[ 25%] Built target kenlm_util
Scanning dependencies of target string_stream_test
[ 26%] Building CXX object util/CMakeFiles/string_stream_test.dir/string_stream_test.cc.o
[ 27%] Linking CXX executable ../tests/string_stream_test
[ 27%] Built target string_stream_test
Scanning dependencies of target tokenize_piece_test
[ 28%] Building CXX object util/CMakeFiles/tokenize_piece_test.dir/tokenize_piece_test.cc.o
[ 29%] Linking CXX executable ../tests/tokenize_piece_test
../lib/libkenlm_util.a(exception.cc.o):(.data+0x0): multiple definition of `KENLM_MAX_ORDER'
CMakeFiles/tokenize_piece_test.dir/tokenize_piece_test.cc.o:(.data+0x38): first defined here
collect2: error: ld returned 1 exit status
make[2]: [tests/tokenize_piece_test] Error 1
make[1]: [util/CMakeFiles/tokenize_piece_test.dir/all] Error 2
make: *** [all] Error 2

Any idea how to solve this error?
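
A note on the linker error: `#ifndef` tests preprocessor macros, not variables, so that guard is always taken, and `int KENLM_MAX_ORDER = 10;` then defines a global variable in every source file that includes the header. The linker sees one copy per object file and reports a multiple definition. A minimal sketch of the macro form, assuming the header is only meant to supply a default value (the helper `CompiledMaxOrder` is hypothetical and not part of KenLM):

```cpp
// Default for the maximum n-gram order. A macro produces no linker symbol,
// so including this in many .cc files cannot cause a duplicate definition.
// Note: a #define takes no '=' and no ';'.
#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 10
#endif

// Hypothetical helper (not part of KenLM) exposing the compiled-in limit.
constexpr unsigned CompiledMaxOrder() {
  return KENLM_MAX_ORDER;
}
```

With this form, `CompiledMaxOrder()` returns 10 unless the build already defined `KENLM_MAX_ORDER` to something else before the header was read.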

mohamad-hasan-sohan-ajini commented 6 years ago

The compilation issue was solved by:

#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 10
#endif

but I still get the error:

/home/sobhe/kenlm/lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because `counts.size() > 6'.
This model has order 10 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. In the KenLM tarball or Moses, use e.g. `bjam --max-kenlm-order=6 -a'. Otherwise, edit lm/max_order.hh.
Byte: 146
ERROR

mohamad-hasan-sohan-ajini commented 6 years ago

ouchhhh

KENLM_MAX_ORDER is set to 6 in lm/CMakeLists.txt. Setting it to 10 and recompiling solved the issue.
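
This would also explain why the header edit alone changed nothing. Assuming the CMake build passes the value to the compiler as a definition (something like `-DKENLM_MAX_ORDER=6`), the macro already exists when the header is read, so the `#ifndef` default is skipped. A self-contained sketch of that interaction; the `#define` at the top only stands in for the compiler flag, and `OrderSupported` is a hypothetical stand-in for the check quoted in the error message:

```cpp
#include <cstddef>

// Stand-in for a compiler flag such as -DKENLM_MAX_ORDER=6 (an assumption
// about what the build passes; written as a #define only to keep this
// sketch self-contained).
#define KENLM_MAX_ORDER 6

// Header-style default. The macro is already defined, so this block is
// skipped and the value 10 never takes effect.
#ifndef KENLM_MAX_ORDER
#define KENLM_MAX_ORDER 10
#endif

// Hypothetical stand-in for the load-time check in the error message
// (counts.size() > KENLM_MAX_ORDER): true when a model of the given
// order fits within the compiled-in limit.
constexpr bool OrderSupported(std::size_t model_order) {
  return model_order <= KENLM_MAX_ORDER;
}
```

Here `OrderSupported(10)` is false even though the header asks for 10, which matches the "compiled to support up to 6" message; the limit has to be changed where the build defines it.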

mohamad-hasan-sohan-ajini commented 6 years ago

The problem persists when I try to load the 10-gram model in Python. Any suggestions?

Traceback (most recent call last):
  File "chars/test.py", line 32, in <module>
    model = AzBarChars('resources/chars.klm')
  File "/home/sobhe/azbar/chars/language.py", line 10, in __init__
    self.model = kenlm.LanguageModel(klm_file)
  File "kenlm.pyx", line 117, in kenlm.Model.__init__ (python/kenlm.cpp:2656)
IOError: Cannot read model 'resources/chars.klm' (lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because `counts.size() > 6'. This model has order 10 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile. In the KenLM tarball or Moses, use e.g. `bjam --max-kenlm-order=6 -a'. Otherwise, edit lm/max_order.hh.)

kpu commented 6 years ago

In the case of the Python module, I presume you compiled it with setup.py. Edit line 21 of setup.py and reinstall.

yz1019117968 commented 4 years ago

Hi, sorry, I got this error today when converting an ARPA file to a binary file for a 7-gram model. I tried the methods above, but it still does not work:

/home/mark/dependency/kenlm/lm/model.cc:49 in void lm::ngram::detail::{anonymous}::CheckCounts(const std::vector<long unsigned int>&) threw FormatLoadException because counts.size() > 6.
This model has order 7 but KenLM was compiled to support up to 6. If your build system supports changing KENLM_MAX_ORDER, change it there and recompile.
With cmake: cmake -DKENLM_MAX_ORDER=10 ..
With Moses: bjam --max-kenlm-order=10 -a
Otherwise, edit lm/max_order.hh.
Byte: 219
ERROR

I also tried cmake -DKENLM_MAX_ORDER=10 .. in the kenlm/build folder and got the following response:

-- Boost version: 1.65.1
-- Found the following Boost libraries:
-- program_options
-- system
-- thread
-- unit_test_framework
-- chrono
-- date_time
-- atomic
-- Configuring done
-- Generating done
-- Build files have been written to: /home/mark/dependency/kenlm/build

Then I tried the conversion from ARPA to binary again, and it still gave the same error as the first one. Do you know how to fix it? Thank you very much!

kpu / kenlm

compile for higher order LMs #140