parlance / ctcdecode

PyTorch CTC Decoder bindings
MIT License

Support for PyTorch 0.4 #68

Closed. miguelvr closed this issue 6 years ago.

miguelvr commented 6 years ago

I'm trying to decode using a KenLM language model with pytorch 0.4 and I'm getting a seg fault (core dumped), probably because of the new tensor syntax.

What are the plans for pytorch 0.4 support?

Best, Miguel

ryanleary commented 6 years ago

Can you post the full dump? This will support 0.4.

miguelvr commented 6 years ago

There's no stack trace.

This is all I got:

Loading the LM will be faster if you build a binary file.
Reading models/lm_csr_64k_vp_3gram.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Segmentation fault (core dumped)

This is how I got the error btw (see issue): https://github.com/SeanNaren/deepspeech.pytorch/pull/294#issuecomment-387379405
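When a native extension crashes, Python usually prints nothing beyond `Segmentation fault (core dumped)`. The standard-library `faulthandler` module can at least dump the Python-level traceback of every thread when SIGSEGV arrives. You can turn it on with `python -X faulthandler ...`, with the `PYTHONFAULTHANDLER=1` environment variable, or with `faulthandler.enable(all_threads=True)` at the top of the script. Below is a minimal sketch; `run_with_faulthandler` is a hypothetical helper, and `ctypes.string_at(0)` only stands in for the crash inside the decoder. It shows Python frames only, so frames inside the compiled extension still need a debugger such as gdb.

```python
import subprocess
import sys


def run_with_faulthandler(code: str) -> subprocess.CompletedProcess:
    """Run `code` in a fresh interpreter with faulthandler enabled via -X faulthandler."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    # Reading address 0 triggers a SIGSEGV in the child process; it stands in
    # for a crash inside a native extension such as the CTC decoder.
    result = run_with_faulthandler("import ctypes; ctypes.string_at(0)")
    # On POSIX, a child killed by signal N reports return code -N (-11 for SIGSEGV on Linux).
    print("return code:", result.returncode)
    # stderr holds "Fatal Python error: Segmentation fault" followed by the Python traceback.
    print(result.stderr)
```

The same `-X faulthandler` flag can be added to whatever command launches transcription. It costs nothing until a fatal signal arrives.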

SeanNaren commented 6 years ago

Could you try compressing the arpa into a trie file? Something like this:

kenlm/bin/build_binary -q 8 -b 8 -a 22 trie models/lm_csr_64k_vp_3gram.arpa models/lm_csr_64k_vp_3gram.trie

and then using the trie file? The error you're getting is from kenlm, not the binding itself.
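Before you point the decoder at a model file, you can look at its first bytes to rule out a truncated download or a mislabeled file. The sketch below uses a hypothetical helper, `inspect_lm`, which is not part of ctcdecode or KenLM. It treats a file as ARPA text if its first non-blank line is `\data\`, and it reads the `ngram N=count` header lines that follow. It treats a file as a KenLM binary if it starts with `mmap lm `. That prefix is an assumption about KenLM's binary header, so check it against your KenLM version.

```python
from __future__ import annotations

import re
from pathlib import Path

# ASSUMPTION: KenLM binary files begin with this magic prefix; verify against your KenLM build.
KENLM_BINARY_MAGIC = b"mmap lm "

# Header lines inside the ARPA \data\ block look like "ngram 3=123456".
_NGRAM_RE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)$")


def inspect_lm(path: str | Path) -> tuple[str, dict[int, int] | None]:
    """Return ("kenlm-binary", None), ("arpa", {order: count}) or ("unknown", None)."""
    path = Path(path)
    with path.open("rb") as f:
        if f.read(len(KENLM_BINARY_MAGIC)) == KENLM_BINARY_MAGIC:
            return "kenlm-binary", None

    counts: dict[int, int] = {}
    seen_data = False
    with path.open("r", encoding="utf-8", errors="replace") as f:
        for raw in f:
            line = raw.strip()
            if not seen_data:
                if not line:
                    continue  # ARPA files often start with a blank line
                if line != "\\data\\":
                    return "unknown", None
                seen_data = True
                continue
            match = _NGRAM_RE.match(line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line or counts:
                break  # a section header or the blank line after the counts ends the block
    return ("arpa", counts) if counts else ("unknown", None)
```

For the 3-gram model in this thread, a healthy ARPA file should report orders 1 through 3. An "unknown" result points at the file itself rather than at the bindings.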

miguelvr commented 6 years ago

Will try that out (I originally just downloaded the .arpa file)

miguelvr commented 6 years ago

I compiled KenLM but can't get lmutils to work... Am I missing something?

miguelvr commented 6 years ago

Finally managed to do it with: kenlm/build/bin/build_binary -q 8 -b 8 -a 22 trie models/lm_csr_64k_vp_3gram.arpa models/lm_csr_64k_vp_3gram.trie

Anyway, I still get the same segmentation fault when transcribing.
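The working command above differs from the first suggestion only in where `build_binary` lives: in this thread it ended up under `kenlm/build/bin`. If you script the conversion, a missing executable fails early instead of surfacing as a confusing error later. The sketch below uses hypothetical helpers, `build_binary_cmd` and `build_trie`. The flag comments reflect my reading of KenLM's `build_binary` usage text and should be checked against `build_binary --help` for your version: `-q` sets the probability quantization bits, `-b` sets the backoff quantization bits (it requires `-q`), and `-a` caps the bits used for pointer compression.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

DEFAULT_BUILD_BINARY = "kenlm/build/bin/build_binary"  # path that worked in this thread


def build_binary_cmd(
    arpa: str | Path,
    out: str | Path,
    build_binary: str | Path = DEFAULT_BUILD_BINARY,
    prob_bits: int = 8,
    backoff_bits: int = 8,
    pointer_bits: int = 22,
) -> list[str]:
    """Assemble the build_binary invocation used in this thread."""
    return [
        str(build_binary),
        "-q", str(prob_bits),     # quantize probabilities to this many bits
        "-b", str(backoff_bits),  # quantize backoff weights (requires -q)
        "-a", str(pointer_bits),  # maximum bits for pointer (array) compression
        "trie",                   # use the trie data structure
        str(arpa),
        str(out),
    ]


def build_trie(arpa: str | Path, out: str | Path, build_binary: str | Path = DEFAULT_BUILD_BINARY) -> None:
    """Convert an ARPA model to a KenLM trie binary, failing early if the tool is missing."""
    exe = Path(build_binary)
    if not exe.is_file():
        raise FileNotFoundError(f"{exe} not found; did KenLM's CMake build finish?")
    subprocess.run(build_binary_cmd(arpa, out, exe), check=True)
```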

miguelvr commented 6 years ago

Turns out that because I was running the code in a docker container I wasn't getting the full stack trace. Now that I set the right flags for the container, I'm getting this:

Thread 60 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff655f5700 (LWP 265)]
PathTrie::get_path_trie (this=this@entry=0x7fff655f4be0, new_char=new_char@entry=1, new_timestep=new_timestep@entry=0, reset=reset@entry=true)
    at /tmp/pip-req-build-0lq7p010/ctcdecode/src/path_trie.cpp:56
56  /tmp/pip-req-build-0lq7p010/ctcdecode/src/path_trie.cpp: No such file or directory.

miguelvr commented 6 years ago

@ryanleary the problem is solved. Turns out the binary was faulty. Feel free to close the issue