Pupiera opened this issue 1 year ago
hi @Pupiera, thanks for reporting this issue.
I haven't had the chance to review our Transducer Beam Search yet. However, I believe that we should always have a strict matching between our implementations and the related official papers. If you suspect that there may be an error in our implementation compared to the official paper, then we should take immediate steps to address it.
Would you like to open a PR to fix it, please? It would be a great way to contribute to SpeechBrain... and to credit the authors' ideas as well. :-)
BTW, we are currently working on improving Beam Search decoding in SpeechBrain; many things will come soon (hopefully!). We are also looking for people to help and work closely (for free, unfortunately) with us on improving Transducer Beam Search in SpeechBrain. If you want, we can find ways to integrate your current work or other Transducer Beam Search variants into SpeechBrain.
Thanks again!
I am writing a fix. I will test whether there is a significant performance change and open a PR if my results are similar to the previous ones. Is there any "nice" way to unit test the beam search? I can't seem to find unit tests for the already implemented one (if there are any?).
Nope, but feel free to add a new test. It's always great to test code :-)
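For what it's worth, here is the kind of test I had in mind: a termination check on an untrained model. This is only a sketch, it assumes the pytest-timeout plugin is installed, `build_tiny_searcher()` is a hypothetical helper that would wire the searcher under test to small randomly initialised networks, and the exact return format of the searcher may differ:

```python
import pytest
import torch

@pytest.mark.timeout(60)  # fail the test instead of hanging forever
def test_beam_search_terminates_on_untrained_model():
    # build_tiny_searcher is hypothetical: it should return the beam
    # searcher under test, wired to small randomly initialised networks.
    searcher = build_tiny_searcher(vocab_size=8, blank_id=0, beam_size=4)
    enc_out = torch.randn(2, 7, 16)  # (batch, time, encoder_dim), tiny on purpose
    hyps = searcher(enc_out)
    # one hypothesis (or n-best list) per utterance; adjust to the real
    # return format of the searcher being tested
    assert len(hyps) == 2
```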
Small erratum: the implemented one is in fact the one from this paper:
This does not change much, since my point was about the line "add y* to B", which is the same in the two papers.
I ran a quick experiment overnight and observed a huge hit to performance.
I think it may be linked to the while-loop break condition (B contains fewer than W elements more probable
than the most probable element in A).
In the current implementation, here is how it is done:
`if len(beam_hyps) >= self.beam_size: break`
which is clearly not the same as in the paper (and may cause early exiting?). So I'm fixing this too and rerunning my test.
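For reference, here is a minimal sketch of the stopping condition as I read it in the paper; the function and variable names are mine, not SpeechBrain's, and scores are assumed to be log-probabilities:

```python
def should_stop(A_scores, B_scores, beam_size):
    """Paper-style termination: stop once B already holds `beam_size`
    hypotheses that are each more probable than the best hypothesis
    still waiting in A. All scores are log-probabilities."""
    if not A_scores:  # nothing left to extend
        return True
    best_in_A = max(A_scores)
    better_in_B = sum(1 for s in B_scores if s > best_in_A)
    return better_in_B >= beam_size
```

Compared to `len(beam_hyps) >= self.beam_size`, this only stops once the hypotheses queued in A can no longer beat what is already in B.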
The link to the paper is not working. Could you please give the exact reference?
https://arxiv.org/pdf/1911.01629.pdf (I updated the link in my post.)
Ok, ty.
I think we will have to support the "official" first beam search of the Transducer (the one by Alex Graves). It doesn't make sense not to do it.
Hey, any news about this issue, please? Thanks :)
Hey, sorry, I was busy with a lot of stuff. I wrote something that works for me. It has slightly worse performance on my corpus (a very noisy one), but it is around 1/3 faster (30 minutes of decoding vs. 45 for the currently implemented one). I am opening a PR this afternoon. I did not remove the first implementation, so as not to break backward compatibility.
Ok cool! We will try your PR on our side :-)
Here is the pull request : https://github.com/speechbrain/speechbrain/pull/1978
Describe the bug
I was looking into customizing the transducer decoder beam search to support multitask decoding as in this paper when I realized that, in the case of degenerate output from the beam search, it may never finish. This behavior can also be seen in the doctest of the decoder/transducer.py file (if you run it without training, it gets stuck in an infinite loop). This may be because the implementation in SpeechBrain is slightly different from the one in the original paper by Alex Graves.
In the original paper, y* + blank is always added to B, whereas in the SpeechBrain implementation it is only added if the blank token is among the top-K probabilities. This means that if an output degenerates on a bigram, for example, the blank index may never appear in the top-K probabilities, the beam in A is never added to B, and the search never finishes. Maybe a safeguard against degenerate sentences could be added, based, for example, on the number of tokens generated in one time step (see the sketch below).
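To make both points concrete, here is a rough sketch of one decoding frame with the blank extension always moved to B and a cap on the number of expansions per frame. All names are mine, `scores_fn` stands in for the prediction + joint networks, and this is purely illustrative, not the SpeechBrain code:

```python
def beam_step(scores_fn, beams, beam_size, blank_id, max_expansions=10):
    """One time frame of a Graves-style transducer beam search (sketch only).

    beams: list of (prefix_tuple, logp) hypotheses carried over from the
    previous frame. scores_fn(prefix) returns a list of per-token
    log-probabilities for this frame (stand-in for prediction + joint nets).
    """
    A = list(beams)  # hypotheses that may still be extended
    B = []           # hypotheses that ended this frame with a blank

    for _ in range(max_expansions):  # safeguard against degenerate loops
        if not A:
            break
        A.sort(key=lambda h: h[1])
        prefix, logp = A.pop()       # most probable hypothesis in A
        log_probs = scores_fn(prefix)

        # The point of this issue: the blank extension of y* always goes
        # to B, whether or not blank survives any top-K pruning.
        B.append((prefix, logp + log_probs[blank_id]))

        # Non-blank extensions go back into A for further expansion.
        for tok, lp in enumerate(log_probs):
            if tok != blank_id:
                A.append((prefix + (tok,), logp + lp))

    B.sort(key=lambda h: h[1], reverse=True)
    return B[:beam_size]
```

In a real searcher the loop would also use the paper's stopping condition from my earlier comment; the `max_expansions` cap only exists so a degenerate model cannot stall the decoder forever.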
There are ways to bypass this kind of problem, simply by training the model more or maybe by using a bigger beam, but I figured I would let you know about this behavior since it can be significantly different from the expected one.
Expected behaviour
The beam search should end even if it returns a degenerate output.
To Reproduce
No response
Versions
No response
Relevant log output
No response
Additional context
No response