speechbrain / speechbrain

A PyTorch-based Speech Toolkit
http://speechbrain.github.io
Apache License 2.0
8.94k stars 1.4k forks source link

[Bug]: Transducer beamsearch decoder can never finish #1969

Open Pupiera opened 1 year ago

Pupiera commented 1 year ago

Describe the bug

I was looking into customizing the transducer decoder beamsearch to support multitask as in this paper when i realized that in the case of degenerate output from the beamsearch, it may never finish. This behavior can also be seen in the doctest of the decoder/transducer.py file. (If you run it without training, it is stuck in an infinite loop) This may be because the implementation in speechbrain is slightly different from the one in the original paper by Alex Graves.

Screenshot from 2023-05-03 13-08-09

In the original paper, y*+blank will always be added to B. Whereas in the speechbrain implementation, it will only be added if the blank token is in the topK probabilities. Meaning that if an output degenerate on a bigram for exemple, the blank index may never be in topK probabilities, the beam in A will never be added to B, and it will never finish. Screenshot from 2023-05-03 13-08-32 Maybe adding a safeguard against degenerate sentence can be done ? Based on the number of token generated in one step, for example.

There are ways to bypass this kind of problem simply by training the model more or maybe by using a bigger beam, but I figured I had let you know of this behavior since it can be significantly different from the expected one.

Expected behaviour

The beamsearch should end even if it returns a degenerate output.

To Reproduce

No response

Versions

No response

Relevant log output

No response

Additional context

No response

Adel-Moumen commented 1 year ago

hi @Pupiera, thanks for reporting this issue.

I haven't had the chance to review our Transducer Beam Search yet. However, I believe that we should always have a strict matching between our implementations and the related official papers. If you suspect that there may be an error in our implementation compared to the official paper, then we should take immediate steps to address it.

Would you like to open a PR to fix it please? it would be a great way to contribute to SpeechBrain... and to credits the authors ideas as well. :-)

BTW, we are currently working on improving Beam Search decoding in SpeechBrain, we have many things that will come soon (hopefully!). And we are looking for people to help and work closely (freely unfortunately) with us on improving Transducer Beam Search in SpeechBrain. If you want, we can find ways to integrate your current work/ or other Transducer Beam Search in SpeechBrain.

Thanks again!

Pupiera commented 1 year ago

I am writing a fix, I will test to see if I have significant performance change and do a PR then if my result are similar to the previous one. Is there any "nice" way to unit test beamsearch ? I can't seem to find the unit test for the already implemented one (if there is some ?)

Adel-Moumen commented 1 year ago

Nop but feel free to add a new test. always great to test code :-)

Pupiera commented 1 year ago

Small erratum, the implemented one, is in fact this one from this paper : Screenshot from 2023-05-04 10-30-34 This does not change much since my point was on the line "add y* to B" which is the same in the two paper. I ran a quick experiment overnight and observed a huge hit to the performance. I think it may be linked to the while break condition (B contains less than W elements more probable than the most probable in A). In the current implementation, here is how it is done : if len(beam_hyps) >= self.beam_size: break which clearly is not the same than the paper (and maybe cause early exiting ?). So i'm fixing this too, and rerunning my test.

Adel-Moumen commented 1 year ago

The link of the paper is not working. Could you please give the exact reference?

Pupiera commented 1 year ago

https://arxiv.org/pdf/1911.01629.pdf updated the link in my post

Adel-Moumen commented 1 year ago

Ok, ty.

I think we will have to support the « official » first beam search of the Transducer (the one by Alex Graves). Doesn't make sense to not do it.

Adel-Moumen commented 1 year ago

Hey, any news about this issue please ? Thanks :)

Pupiera commented 1 year ago

Hey, sorry, I was busy with a lot of stuff. I wrote something working for me. It has slightly worse performance on my corpus (it's a very noisy one), but is around 1/3 faster (30 minutes decoding vs 45 for the currently implemented one). I am opening a PR this afternoon. I did not remove the first implementation, to not break any backward compatibility.

Adel-Moumen commented 1 year ago

Ok cool! We will try your PR on our side :-)

Pupiera commented 1 year ago

Here is the pull request : https://github.com/speechbrain/speechbrain/pull/1978