maetshju / flux-ctc-grad

Test CTC functionality in Flux.jl
MIT License

running with public dataset #2

Open matthijsvk opened 3 years ago

matthijsvk commented 3 years ago

Hi, I saw you contributed the CTC loss function in https://github.com/FluxML/Flux.jl/pull/1287. Thanks for all that work :). There you mentioned you had an example with a publicly available speech corpus. Which one was it, and would you be willing to upload the code for it?

maetshju commented 3 years ago

It was a subset of the Massive Auditory Lexical Decision database, which the lab I work in released in 2019. I think the full data set is over 3 hours of isolated English words recorded by a single young male speaker, plus nearly 10,000 recorded fake English words. During testing I switched from it to TIMIT, because I had no a priori idea of what accuracy to expect on this data set, so I couldn't use it to tell whether the CTC loss function was working correctly.

The code is actually in this repo already here. The 00-data.jl script should download and extract the subset I used, which was 10,000 (about a third) of the real English words. The MFCCs have already been extracted, but the labels are given in onehot matrix form because I had originally used this subset with a cross-entropy loss. The model code in 01-model.jl is a bit outdated and expects the onehot matrix, while the final committed version of the CTC loss function expects a onecold vector.
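To illustrate the onehot-to-onecold conversion mentioned above, here is a minimal Python sketch of what `Flux.onecold` does in Julia: for each column of a one-hot matrix it returns the label of the largest entry. The helper name `onecold`, the list-of-rows layout, and the 1-based default labels are assumptions chosen to mirror Flux's convention (one column per observation, one row per class); this is not code from the repo. Whether a CTC target for this data set also needs repeated frame labels collapsed depends on how the original labels were encoded, which isn't stated here.

```python
# Hypothetical helper mirroring Flux.onecold; not taken from the repo.
# Assumes Flux's layout: one row per class, one column per observation.

def onecold(onehot, labels=None):
    """Return, for each column of `onehot`, the label of its largest entry.

    `onehot` is a list of rows (classes x observations). If `labels` is
    None, 1-based class indices are returned, as in Julia/Flux.
    """
    n_classes = len(onehot)
    n_cols = len(onehot[0]) if n_classes else 0
    if labels is None:
        labels = list(range(1, n_classes + 1))
    if len(labels) != n_classes:
        raise ValueError("need exactly one label per row")
    result = []
    for j in range(n_cols):
        column = [onehot[i][j] for i in range(n_classes)]
        # max() returns the first maximal index on ties, like Julia's argmax
        best = max(range(n_classes), key=lambda i: column[i])
        result.append(labels[best])
    return result


# Usage: 3 classes, 4 observations (one per column)
y_onehot = [
    [0, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
print(onecold(y_onehot))  # [2, 1, 3, 2]
```

Returning 1-based indices keeps the output directly comparable with what `Flux.onecold` would give on the same matrix in Julia.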

If you are interested in the full data set, it is available here. The transcriptions are given as TextGrid files for use with the Praat program. If you don't already have a library to extract the transcriptions from these files, you may want to use the textgrid (https://pypi.org/project/TextGrid/) Python library.

At some point, I may try to update the code and possibly submit it to the Flux model zoo, but our semester started recently, so I'll be low on spare time for a while. Let me know if you have any questions, though!

maetshju commented 3 years ago

Oh, the actual model file is missing. Let me see if I can track it down; I'll try to update it now anyway.

maetshju commented 3 years ago

@matthijsvk I have been able to re-create the code I was using for these demos and put it here. The data set is a bit funky for CTC because its labels are onehot encoded. I am planning to make something for the model zoo, for which I will re-extract the inputs and outputs in a form more appropriate to a CTC-style recognition system. Hopefully the code in this repo is still somewhat helpful until then.

matthijsvk commented 3 years ago

Great, thanks! I think it would be awesome to have an easy-to-reproduce example of a real-life application like speech recognition with RNNs in Julia. So far, the model zoo contains mostly toy examples.