kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

Is there any literature or reference about this implementation? #16

Closed (lyjzsyzlt closed this issue 2 years ago)

lyjzsyzlt commented 3 years ago

The code you contributed does not seem to be the CTC prefix beam search algorithm. Is there any literature or reference for this shallow fusion implementation?

gkucsko commented 3 years ago

Hi, thanks for asking. The implementation is pretty similar to the standard method of CTC shallow fusion with a language model used by PaddlePaddle, DeepSpeech, Facebook, etc. There are a few modifications and a number of additions that we found to give good results, as well as new features. We haven’t written up anything formal, but if there is interest from the community it’s definitely something we could consider. In the meantime, is it working for you or have you run into any issues?
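For readers unfamiliar with the term, shallow fusion usually means ranking each beam by a weighted sum of the acoustic (CTC) log-probability, the language model log-probability, and a word-count bonus. The sketch below shows only that generic form. The names `alpha` and `beta`, the helper functions, and the toy numbers are illustrative assumptions, and nothing here is confirmed to match what pyctcdecode computes internally.

```python
import math


def shallow_fusion_score(ctc_logp, lm_logp, n_words, alpha=0.5, beta=1.0):
    """Generic shallow fusion: acoustic score + weighted LM score + word bonus.

    alpha (LM weight) and beta (word insertion bonus) are assumed names
    following common convention, not taken from this thread.
    """
    return ctc_logp + alpha * lm_logp + beta * n_words


# Hypothetical beams: (text, CTC log-prob, LM log-prob). The numbers are made up.
beams = [
    ("i scream", math.log(0.40), math.log(0.001)),
    ("ice cream", math.log(0.35), math.log(0.020)),
]


def rank(beams, alpha, beta):
    """Return beam texts ordered from best to worst fused score."""
    scored = [
        (shallow_fusion_score(ctc, lm, len(text.split()), alpha, beta), text)
        for text, ctc, lm in beams
    ]
    return [text for _, text in sorted(scored, reverse=True)]


print(rank(beams, alpha=0.0, beta=1.0))  # LM ignored: acoustic score decides
print(rank(beams, alpha=0.5, beta=1.0))  # LM weighted in: order can change
```

With `alpha=0.0` the ranking depends only on the acoustic scores, so a changed LM weight that never alters the ranking on any input is a hint that the LM term is not reaching the score.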

lyjzsyzlt commented 3 years ago

Thanks for your response! I am currently applying it to Chinese speech recognition decoding, with part of the code modified. This implementation is much faster than standard CTC prefix beam search decoding, but so far adding an LM has no effect. There may be a problem with my code modification, so I need to study this implementation further. This is very good work! I hope there will be more detailed explanations of the theory and implementation in the future, so that novices can learn from it.

poneill commented 3 years ago

Thanks for your kind words. When you say you see no effect from adding an LM, do you mean A) "the results are not meaningfully different" or B) "the results are exactly identical"? We may not be able to help with A, but B would be surprising. It would be helpful if you could post a minimal working example.
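One way to tell case A from case B is to decode the same logits under several LM weights, including zero, and compare the outputs. The sketch below assumes a hypothetical wrapper `decode_fn(logits, alpha)` that returns the decoded string for a given LM weight. That wrapper and the helper name `compare_decodes` are illustrative and not part of any library.

```python
def compare_decodes(decode_fn, logits, alphas):
    """Decode the same logits under several LM weights.

    decode_fn is a hypothetical callable (logits, alpha) -> str supplied by
    the caller. Returns the outputs per weight and whether all are identical.
    """
    outputs = {alpha: decode_fn(logits, alpha) for alpha in alphas}
    all_identical = len(set(outputs.values())) == 1
    return outputs, all_identical


# Stand-in decoders for illustration only: one ignores the LM weight entirely,
# the other changes its output once the weight is large enough.
def decoder_ignoring_lm(logits, alpha):
    return "i scream"


def decoder_using_lm(logits, alpha):
    return "ice cream" if alpha >= 0.5 else "i scream"


_, identical = compare_decodes(decoder_ignoring_lm, None, [0.0, 0.5, 2.0])
print("identical across weights:", identical)

_, identical = compare_decodes(decoder_using_lm, None, [0.0, 0.5, 2.0])
print("identical across weights:", identical)
```

If the outputs stay identical even between a weight of zero and a very large weight, on several utterances, that points toward the LM score not being applied at all rather than merely being too weak to matter.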

lyjzsyzlt commented 2 years ago

The two results are exactly identical. I only changed the vocabulary from the English alphabet to Chinese characters, and is_bpe is set to false. Changing the LM weight has no effect. I will send one CTC logits matrix, the vocabulary, and a 3-gram LM to your mailbox.

gkucsko commented 2 years ago

Thanks. Yes, we haven’t tested with Chinese yet, but we would love to investigate the best way to do this (similar to byte mode in the DeepSpeech decoder). Sending over some files would help, thanks!

lyjzsyzlt commented 2 years ago

I have sent some files to your mailbox (@.***).


gkucsko commented 2 years ago

Hmm, haven’t received anything yet. Maybe try my first name ‘georg’ at kensho.com

lyjzsyzlt commented 2 years ago

You mean the mailbox @.***?


gkucsko commented 2 years ago

yes

ivangtorre commented 2 years ago

I am extensively testing this library for speech recognition in some paper implementations, and I am very impressed by the performance. First of all, congratulations! My previous implementation was based on parlance-ctc, and this seems to be far superior. Could you provide more technical details?

Adel-Moumen commented 1 year ago

> Hi, thanks for asking. The implementation is pretty similar to the standard method of CTC shallow fusion with a language model used by PaddlePaddle, DeepSpeech, Facebook, etc. There are a few modifications and a number of additions that we found to give good results, as well as new features. We haven’t written up anything formal, but if there is interest from the community it’s definitely something we could consider. In the meantime, is it working for you or have you run into any issues?

Hi @gkucsko, the pyctcdecode implementation differs from the others because it does not compute the blank/non-blank probabilities. Could you please explain why you chose not to do that? For instance, the Word Beam Search paper (see: https://repositum.tuwien.at/retrieve/1835) computes the blank/non-blank probabilities. The same is true of Prefix Beam Search (see: https://arxiv.org/pdf/1408.2873.pdf).

Many thanks!
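For context on the question above, here is a minimal sketch of what tracking blank/non-blank probabilities means in textbook CTC prefix beam search, without a language model. Each prefix keeps two log-probabilities, one for paths ending in blank and one for paths ending in a non-blank symbol, so that repeated symbols are merged or kept correctly. This is an illustration of the standard technique only, not pyctcdecode's code. The function names and the toy input are assumptions.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def prefix_beam_search(log_probs, beam_size=8, blank=0):
    """log_probs: list of frames, each a list of per-symbol log-probabilities."""
    # Each prefix maps to (log p ending in blank, log p ending in non-blank).
    beams = [((), (0.0, NEG_INF))]
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams:
            for s, lp in enumerate(frame):
                if s == blank:
                    # Blank keeps the prefix; the path now ends in blank.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_b + lp, p_nb + lp), nb)
                    continue
                new_prefix = prefix + (s,)
                b, nb = nxt[new_prefix]
                if prefix and s == prefix[-1]:
                    # A repeat extends the prefix only after a blank ...
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp))
                    # ... otherwise it collapses into the same prefix.
                    sb, snb = nxt[prefix]
                    nxt[prefix] = (sb, logsumexp(snb, p_nb + lp))
                else:
                    nxt[new_prefix] = (b, logsumexp(nb, p_b + lp, p_nb + lp))
        # Keep the beam_size prefixes with the highest total probability.
        beams = sorted(
            nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True
        )[:beam_size]
    best_prefix, (p_b, p_nb) = beams[0]
    return list(best_prefix), logsumexp(p_b, p_nb)


# Toy input: two frames over the vocabulary {0: blank, 1: "a"}.
frames = [[math.log(0.6), math.log(0.4)]] * 2
prefix, logp = prefix_beam_search(frames)
print(prefix, math.exp(logp))
```

In this toy input the single most likely path is blank-blank (probability 0.36), yet the prefix `[1]` wins because the three paths a-blank, blank-a, and a-a sum to 0.64. Summing over alignments in this way is what the two-probability bookkeeping makes possible.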