parlance / ctcdecode

PyTorch CTC Decoder bindings
MIT License

RAM leak with KenLM #111

Open adamnsandle opened 5 years ago

adamnsandle commented 5 years ago

Hello! For some reason our 3 GB Russian KenLM ARPA model (binarized) uses ~50 GB of RAM during CTCBeamDecoder initialization and estimation (beam width 100). When using the KenLM Python module with this model everything is fine! The model was trained on a big Russian corpus (37 labels).
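For reference, loading the same binary directly with the kenlm Python module (a minimal sketch; the path and test sentence are illustrative) looks like this:

import kenlm

# Loading the binarized LM directly: memory use stays close to the file size
model = kenlm.Model('web_all_norm.arpa.bin')

# Log10 probability of a sample sentence
print(model.score('пример предложения', bos=True, eos=True))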

snakers4 commented 5 years ago

@SeanNaren would be really great if you could help out with this

@adamnsandle Which tokens do you use, how much data do you use to train the model, how do you train the KenLM model, and how do you initialize the class?

adamnsandle commented 5 years ago

What we tried to do:

Overall, RAM consumption drops with a smaller number of sentences / shorter sentences, but stays far too large: ~20 GB with a 100 MB model.

Labels used: 'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ2_ * ' (the plain Russian alphabet; 2 is a special symbol for a repeated letter, * is a string-end symbol)

CTC class initialization:

dcdr = CTCBeamDecoder(labels, lm_path, alpha=0.3, beta=0.4, cutoff_top_n=20, cutoff_prob=1, beam_width=100, num_processes=6, blank_id=labels.index('_'))

(We tried different num_processes and beam_width values; it did not help.)
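For completeness, the decoder is then called roughly like this (a minimal sketch reusing the labels and dcdr from above; the tensor shape is illustrative):

import torch

# Acoustic model output: (batch, seq_len, num_labels), softmax-normalized
# because log_probs_input defaults to False
probs = torch.rand(1, 50, len(labels)).softmax(dim=2)

beam_results, beam_scores, timesteps, out_lens = dcdr.decode(probs)

# Best hypothesis for the first utterance
best = ''.join(labels[int(n)] for n in beam_results[0][0][:out_lens[0][0]])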

SeanNaren commented 5 years ago

Did you guys try turning this ARPA file into a trie file? Check the examples here.

Never mind, just saw the comment above. Realistically you should always build a binary; using the raw ARPA is overkill.

Also, you should definitely look into pruning specific n-grams when building the LM if you haven't done so already. This can be seen here.

snakers4 commented 5 years ago

@adamnsandle Also, which KenLM command did you use to train the LM?

@SeanNaren Which command do you use for your models?

adamnsandle commented 5 years ago

@SeanNaren We used this command to train the model: bin/lmplz -o 4 -S 50% -T temp/ --prune 0 30 60 130 --discount_fallback < web_all_norm.txt > web_all_norm.arpa

And to binarize it, one of:

./build_binary -S 5G trie web_all_norm.arpa web_all_norm.arpa.bin
./build_binary -S 5G trie -q 8 web_all_norm.arpa web_all_norm.arpa.bin
./build_binary web_all_norm.arpa web_all_norm.arpa.bin

SeanNaren commented 5 years ago

How big is the output trie?

adamnsandle commented 5 years ago

2.05 GB

When we try a smaller model (~200 MB as a trie), the RAM leak is still present (~30 GB).

CXiaoDing commented 5 years ago

Hi, I want to know what the KenLM model is based on. Word-based or character-based? Thank you very much! @adamnsandle @SeanNaren

SwapnilDreams100 commented 4 years ago

+1 this issue. Even using the DeepSpeech 1 LM binary causes massive RAM use.

buriy commented 4 years ago

I believe this is caused by an internal trie created on model loading, which then stays in memory and consumes a lot of RAM. Mozilla's version saves this trie to an external file and doesn't generate it on the fly.
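For comparison, the standalone kenlm Python bindings can mmap the binary lazily rather than populating everything up front (a sketch assuming a kenlm build that exposes Config and LoadMethod; the path is illustrative):

import kenlm

config = kenlm.Config()
# LAZY maps the file and pages it in on demand instead of
# reading the whole file into RAM at load time
config.load_method = kenlm.LoadMethod.LAZY
model = kenlm.Model('web_all_norm.arpa.bin', config)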

baicaitongee commented 4 years ago

Loading model...
Traceback (most recent call last):
  File "examples/demo-server.py", line 10, in <module>
    import beamdecode
  File "/home/pi/masr/examples/../beamdecode.py", line 29, in <module>
    blank_index,
  File "/home/pi/.local/lib/python3.7/site-packages/ctcdecode/__init__.py", line 18, in __init__
    self._num_labels)
RuntimeError: third_party/kenlm/util/mmap.cc:122 in void util::MapOrThrow(std::size_t, bool, int, bool, int, uint64_t) threw ErrnoException because `(ret = mmap(__null, size, protect, flags, fd, offset)) == ((void *) -1)'.
Cannot allocate memory
mmap failed for size 2953349384 at offset 0

baicaitongee commented 4 years ago

I have a similar problem; see issue #137.

jonatasgrosman commented 3 years ago

+1 this issue.

tobiolatunji commented 2 years ago

+1. Is there a fix?