kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

Memory error on program closing with C++ example #430

Closed: DanBmh closed this issue 1 year ago.

DanBmh commented 1 year ago

Hi, I'm trying to use KenLM in a C++ program and started with a minimal example (following the official example). The scoring seems to work, but the program can't exit cleanly. It always fails with:

/kenlm/util/mmap.cc:138 in void util::SyncOrThrow(void*, size_t) threw ErrnoException because `length && msync(start, length, 4)'.
Cannot allocate memory Failed to sync mmap
Aborted (core dumped)

My program looks as follows:

#include <iostream>
#include <string>
#include <vector>
#include "lm/model.hh"

int main()
{
    using namespace lm::ngram;

    // Load the small test model that ships with the KenLM sources.
    Model model("/kenlm/lm/test.arpa");
    std::vector<std::string> words = {"language", "modeling", "is", "fun"};

    // Start from the begin-of-sentence context; out_state receives the updated context.
    State state(model.BeginSentenceState()), out_state;
    const Vocabulary &vocab = model.GetVocabulary();

    for (const std::string &word : words)
    {
        // Score() returns the log10 probability of word given the current state.
        std::cout << word << " " << model.Score(state, vocab.Index(word), out_state) << '\n';
        state = out_state;
    }

    std::cout << "--finished--" << '\n';
    return 0;
}

I compiled it with:

g++ test_langmodel_minimal.cpp -Wall -DKENLM_MAX_ORDER=5 -I/kenlm/ -L/kenlm/build/lib/ -lkenlm -lkenlm_util -lz -llzma -lbz2 -o test_langmodel_minimal.exe
./test_langmodel_minimal.exe

Full output:

./test_langmodel_minimal.exe
Loading the LM will be faster if you build a binary file.
Reading /kenlm/lm/test.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
language -1.99564
modeling -1.99564
is -1.68787
fun -1.99564
--finished--
/kenlm/util/mmap.cc:138 in void util::SyncOrThrow(void*, size_t) threw ErrnoException because `length && msync(start, length, 4)'.
Cannot allocate memory Failed to sync mmap
Aborted (core dumped)

Any idea how to fix this?

kpu commented 1 year ago

Platform? I see the .exe extension but also / in the paths. At the same time, this was compiled without _WIN32 or _WIN64 defined; otherwise it would have called FlushViewOfFile instead of msync.

DanBmh commented 1 year ago

Sorry for the confusion: the platform is Ubuntu 20.04 in Docker, using the latest master branch. Here is my Dockerfile:

FROM docker.io/ubuntu:20.04

ARG DEBIAN_FRONTEND=noninteractive
ENV LANG=C.UTF-8
ENV LC_ALL=C.UTF-8

RUN apt-get update && apt-get upgrade -y
RUN apt-get update && apt-get install -y --no-install-recommends wget curl nano git

# Install python
RUN apt-get update && apt-get install -y --no-install-recommends python3 python3-pip python3-dev
RUN pip3 install --upgrade --no-cache-dir pip
RUN python3 -V && pip3 --version

# Install swig
RUN apt-get update && apt-get install -y --no-install-recommends build-essential
RUN apt-get update && apt-get install -y --no-install-recommends swig

# Install kenlm
RUN apt-get update && apt-get install -y --no-install-recommends \
 cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev \
 libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev
RUN git clone --depth 1 https://github.com/kpu/kenlm.git
RUN mkdir -p /kenlm/build
RUN cd /kenlm/build && cmake ..
RUN cd /kenlm/build && make -j 4
RUN pip3 install --no-cache-dir https://github.com/kpu/kenlm/archive/master.zip

WORKDIR /
CMD ["/bin/bash"]

kpu commented 1 year ago

I'm confused by this, but does it work if you delete the msync line in question? It's unnecessary when only reading a language model, and maybe the kernel is objecting to it.
And just so we're clear: this is Ubuntu 20.04 on real Linux, not WSL?

DanBmh commented 1 year ago

Deleting the line gives me another error:

Loading the LM will be faster if you build a binary file.
Reading /kenlm/lm/test.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
language -1.99564
modeling -1.99564
is -1.68787
fun -1.99564
</s> -1.02949
--finished--
/kenlm/util/mmap.cc:146 in void util::UnmapOrThrow(void*, size_t) threw ErrnoException because `munmap(start, length)'.
Invalid argument munmap failed with 0x0 for length 18446744070488326144
Aborted (core dumped)

There seems to be something wrong with the length in this error: 18446744070488326144 is suspiciously large...

Deleting that line as well gives me this error (I think this one is expected):

[...]
--finished--
Could not close file 32573
Aborted (core dumped)


> And so we're clear Ubuntu 20.04 on real linux, not WSL?

Yes, the host is Ubuntu 22.04.

hieuhoang commented 1 year ago

What is the underlying filesystem? KenLM needs to memory-map the file, which some filesystems do not support. You may need to move the file to a temp directory before opening it.

DanBmh commented 1 year ago

The host uses ext4, and Docker is using its default storage setup.

hieuhoang commented 1 year ago

The only other thing I can think of is that the server doesn't have enough memory and doesn't allow mmap to allocate virtual memory: https://bobcares.com/blog/mmap-failed-cannot-allocate-memory/

Hieu Hoang https://hieuhoang.github.io/

(In reply to DanBmh's comment of Wed, 31 May 2023, 12:11.)

kpu commented 1 year ago

It's got corrupt values in scoped_mmap, so this doesn't seem to be an OS issue, unless something weird there is causing the corrupt values. I need to understand how those values got there. Can you get a stack trace?

kpu commented 1 year ago

And there really shouldn't be a file number 32573. This all sounds like memory corruption.

DanBmh commented 1 year ago

@hieuhoang The amount of memory shouldn't be the issue; I'm using the example ARPA file, which is only 3 KB.

@kpu can you replicate the issue on your own computer?


> Need to understand how those values came to be there. Stack trace?

How do I get one?

kpu commented 1 year ago

The default KENLM_MAX_ORDER is 6, and you built the library with cmake's default, but your program is compiled with -DKENLM_MAX_ORDER=5.

DanBmh commented 1 year ago

Thanks for your help; with -DKENLM_MAX_ORDER=6 it's working.

DanBmh commented 1 year ago

Now it also matches the scores from the Python example (which would otherwise have been my next question). The new scores are:

language -2.41061
modeling -15
is -23.6879
fun -2.29666
</s> -21.0295
--finished--


I'm not an expert in this, but shouldn't the scores be the same for the same ARPA file, or was the difference just a side effect of the wrong KENLM_MAX_ORDER value?

kpu commented 1 year ago

You had random memory corruption due to the structs not having the same definition in the compiled library and executable. All bets are off.