NickleDave opened this issue 2 years ago
Good catch, that's actually a mistake;
wrap_ctc is indeed called before CTCLoss, which itself expects the log probabilities.
That part of the code is also not expected to differ from the base code provided to be filled in.
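For reference, `nn.CTCLoss` expects log probabilities of shape (T, N, C), i.e. the output of log_softmax applied to the logits; here is a minimal sketch with made-up sizes, not the lab code itself:

```python
import torch
import torch.nn as nn

# nn.CTCLoss consumes log probabilities of shape (T, N, C),
# i.e. the output of log_softmax, not raw logits.
T, N, C = 50, 4, 28                      # time steps, batch size, vocab (incl. blank)
logits = torch.randn(T, N, C)
log_probs = logits.log_softmax(dim=2)

targets = torch.randint(1, C, (N, 10))   # index 0 is reserved for the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```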
@NickleDave I'm reopening this issue; I realize that this lab is not working perfectly.
Tested on the French corpus of CommonVoice at least, my code fails to overfit a single minibatch and also fails to overfit the training set. However, when training on a larger corpus, it ends up producing recognitions that look like the ground truth, so there must still be a bug somewhere. I spent some time going through the whole code but was not able to catch anything wrong.
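For context, the single-minibatch sanity check I'm running is roughly the following (a self-contained sketch with a toy model and made-up shapes, not the actual lab code):

```python
import torch
import torch.nn as nn

# Sanity check: with a correct model and loss, the CTC loss should fall
# close to zero when training repeatedly on one fixed minibatch.
T, N, F_in, C = 50, 4, 13, 28                  # frames, batch, features, vocab
inputs = torch.randn(T, N, F_in)               # one fixed minibatch
targets = torch.randint(1, C, (N, 10))         # index 0 is the blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 10, dtype=torch.long)

rnn = nn.GRU(F_in, 64)
proj = nn.Linear(64, C)
ctc = nn.CTCLoss(blank=0)
optimizer = torch.optim.Adam(
    list(rnn.parameters()) + list(proj.parameters()), lr=1e-3
)

for step in range(500):
    optimizer.zero_grad()
    hidden, _ = rnn(inputs)
    log_probs = proj(hidden).log_softmax(dim=2)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    loss.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, loss.item())
```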
Since you were digging deep into the code, did you possibly discover other issues? Did it work when you tried it, maybe on languages other than French?
Thank you for your insights.
Hi @jeremyfix, thank you for letting me know about this.
Long story short, we are extending a previously designed model for annotating birdsong:
https://github.com/yardencsGitHub/tweetynet
But I am in the middle of a big revamp of the framework we use to run experiments:
https://github.com/vocalpy/vak/tree/version-1.0
I expect to be back to running experiments by the end of Feb.
Mainly I was looking at your code since it's one of the only good, detailed examples I could find of using the `torch.nn.utils.rnn` API for a model that is not pure NLP.
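For anyone who finds this later, the pattern I mean is packing variable-length feature sequences (e.g. spectrogram frames rather than tokens) before the RNN; a rough sketch with made-up shapes:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Pack padded, variable-length feature sequences before an RNN so the
# recurrence skips the padding frames.
batch = torch.randn(3, 100, 13)           # (batch, max_frames, features)
lengths = torch.tensor([100, 80, 60])     # true length of each example

packed = pack_padded_sequence(batch, lengths, batch_first=True,
                              enforce_sorted=False)
rnn = torch.nn.GRU(13, 32, batch_first=True)
packed_out, _ = rnn(packed)
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(out.shape, out_lengths)             # (3, 100, 32), tensor([100, 80, 60])
```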
I haven't discovered any other issues but I will definitely tell you if I do.
Hi again @jeremyfix
I noticed in this solution that the comment on this line appears to contain another line of code that maybe should not be commented out:
https://github.com/jeremyfix/deeplearning-lectures/blob/b3862d6dd1af45bea1a99f9b26a0c8baa1520422/LabsSolutions/02-pytorch-asr/main_ctc.py#L42
shouldn't it actually be something like this (I'm guessing the variable names from context):
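```python
# my guess at the intended line; variable names assumed from context
log_probs = torch.nn.functional.log_softmax(logits, dim=2)
```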
so that you transform the "logits" to log softmax?
If you're deliberately not converting to log softmax for some reason, I'd be curious to know.