gooofy / zamia-speech

Open tools and data for cloudless automatic speech recognition
GNU Lesser General Public License v3.0

Question: Decoding with Zamia Speech's German wav2letter model using wav2letter Decoder executable #104

Open realbaker1967 opened 4 years ago

realbaker1967 commented 4 years ago

First of all, really nice work!

I am interested in your German acoustic model for a benchmark.

Based on what is written here, I assume you suggest using wav2letter's Decoder executable to decode audio with your German acoustic model.

If we use that executable from wav2letter, we need to tune a certain set of parameters, which they mention in their Decoder executable explanation.

If possible, could you please share with us your tuned parameters for decoding?

Or should we use the parameters in w2l_run_decode.sh.template?

Regards

gooofy commented 4 years ago

check out

data/src/speech/w2l_run_decode.sh.template

The script I used to run the decoder was based on this template. Please be aware that w2l has quite likely moved on from the state it was in back when I trained and used that model; the command line and/or file formats may have changed since then.

realbaker1967 commented 4 years ago

Thank you for your fast answer!

Do you remember which commit of wav2letter you used when you trained and tested the model?

One more thing: do you remember which language model you used? I am assuming you used the larger model of order 6 with less pruning.

I am trying to test your model on my audio files with the exact configuration you used to achieve the result you reported here.

lagidigu commented 4 years ago

@realbaker1967 I used the same decoder configuration as in the template file, as well as the order 6 LM. Unfortunately, I cannot reproduce the reported WER of 3.97%.

It's probably due to the update of w2l, I guess...

realbaker1967 commented 4 years ago

In my case, the model decoded well except at the beginnings and endings of the audio files. For example:

Annotation: Sie pflegten die Kranken und verbanden die Verwundeten.

Hypothesis: pflegt eine kranken und verwandten die verwundeten en

Note that the word Sie is omitted and en is added.

I observe these two problems very frequently, especially the addition of non-existent words at the end.
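To put a number on the example above, the word error rate (WER) for that single utterance can be computed with a word-level edit distance. This is only a minimal sketch, not the scoring script behind the reported 3.97% figure: the function names are made up for illustration, and the normalization (lower-casing and stripping punctuation) is an assumption that may differ from what was used for the published result.

```python
import re


def normalize(text):
    """Lower-case and drop punctuation, then split into words.

    Assumed normalization; the reported WER may have used a different one.
    """
    return re.sub(r"[^\w\s]", "", text.lower()).split()


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance; every edit costs 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)


reference = "Sie pflegten die Kranken und verbanden die Verwundeten."
hypothesis = "pflegt eine kranken und verwandten die verwundeten en"
print(f"WER: {wer(reference, hypothesis):.3f}")
```

Under these assumptions the pair above yields 5 errors over 8 reference words (one deletion, three substitutions, one insertion), i.e. a WER of 0.625 for this utterance. A corpus-level WER would sum the errors and the reference word counts over all files instead of averaging per-file values.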

Did you observe similar problems @lagidigu ?

lagidigu commented 4 years ago

@realbaker1967 I get the same results after applying the template. This is strange. I will have to look into how the beam search decoder works exactly and will report back whether I made any progress.

lagidigu commented 4 years ago

@realbaker1967 unfortunately I couldn't troubleshoot the issue. @gooofy do you know what might have changed with the decoder? The WER is a lot higher than 3.97%, unfortunately :/

gooofy commented 4 years ago

@lagidigu no idea what exactly has changed, but as I mentioned earlier, I am not surprised wav2letter has moved on from the state it was in when I made my experiments. Actually, I suspect it is good news that wav2letter continues to be developed and improved.

If you're serious about wav2letter I would suggest you train your own model from scratch using their current codebase - all training material from zamia speech is freely available as are the scripts used to train the model so that should give you a head start.

realbaker1967 commented 4 years ago

@gooofy I am only interested in running a benchmark, so there is no need to train from scratch.

For that, it would be really helpful if you could tell us which commit of wav2letter you used, if that is possible of course.

In that case, I could safely run the decoder with your given template.

Thanks