Reproducing paper results

I'm unable to train a working monolingual embedding model. Using the provided script (train_monolingual_embedding.py) with the top 165 English words yields the following results at the end of training: loss: 0.7145 - accuracy: 0.7711 - val_loss: 7.6774 - val_accuracy: 0.0586

Based on the paper, I was expecting something in the range of 70's for validation accuracy. Is it dependent on choosing the "right" words?

Could you please post a tutorial or maybe some of the missing files (e.g. train_files.txt, val_files.txt, test_files.txt, commands.txt) for reproducing the embedding?

I also notice that the file references seem to be to common voice rather than MSW. I'm using the English clips download from MSW which I'm assuming are the same. I've converted these to 16KHz, 16bit wav files using pydub which I guess is ffmpeg under the hood.

harvard-edge / multilingual_kws

Reproducing paper results #39