parlance / ctcdecode

PyTorch CTC Decoder bindings
MIT License

segmentation fault when using kenlm language model #128

Open rajeevbaalwan opened 4 years ago

rajeevbaalwan commented 4 years ago

I get the error below when running test.py with a 2.5 GB .arpa LM file:

Loading the LM will be faster if you build a binary file.
Reading /home/rajeev/Documents/agent-lm/agent_lm_44_updated.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100


0%| | 0/2499 [00:00<?, ?it/s]
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

When I run the script with a 2.5 GB .binary KenLM language model file, I get only this:

0%| | 0/2499 [00:00<?, ?it/s]
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)

With a 284 MB .binary file there is no crash, but decoding a single batch takes about 12 seconds, which is far too slow:

0%| | 1/2499 [00:12<8:35:51, 12.39s/it]

When I run the script without a language model, it works perfectly.

Can anyone help explain why the segmentation fault occurs and why decoding takes so long?
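
To narrow down a report like this, it can help to first confirm that each LM file is what its extension says before handing it to the decoder. The following is only a sketch using the Python standard library. The helper name `describe_lm_file` is hypothetical. The magic string for KenLM binary files and the `\data\` header for ARPA files are assumptions about the file formats; neither is taken from this thread or from the ctcdecode API.

```python
import os

# Assumption: KenLM binary files start with this ASCII magic string,
# and ARPA text files start with a "\data\" section header.
KENLM_BINARY_MAGIC = b"mmap lm http://kheafield.com/code format version"
ARPA_HEADER = b"\\data\\"


def describe_lm_file(path):
    """Return (format, size_in_bytes) for a language model file.

    format is "binary", "arpa", or "unknown". This is a hypothetical
    helper for illustration; it only inspects the first bytes of the file.
    """
    size = os.path.getsize(path)  # raises OSError if the file is missing
    with open(path, "rb") as f:
        head = f.read(256)
    if head.startswith(KENLM_BINARY_MAGIC):
        return "binary", size
    # Tolerate leading blank lines before the ARPA header.
    if head.lstrip().startswith(ARPA_HEADER):
        return "arpa", size
    return "unknown", size
```

A result of "unknown", or a size much smaller than expected, would point to a truncated or mislabeled file. A well-formed file does not rule out a crash inside the decoder, so this check only eliminates one possible cause.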

2000ZRL commented 3 years ago

I also ran into the segmentation fault issue.