michaelklachko opened this issue 6 years ago
I can put all training .npy files into one directory, but the real problem is that the model would have to fit the largest sample in the whole dataset: if the largest sample is 4000 time steps, then every sample would need to be padded to this size. This would make training extremely slow.
Take a look at the https://github.com/fordDeepDSP/deepSpeech code for a better solution (bucketing inputs sorted by length).
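To illustrate the idea, here is a minimal sketch of length bucketing (not the fordDeepDSP implementation): sort utterances by length, group neighbours into batches, and pad each batch only to its own longest sample instead of the global maximum. `load_sample` and the feature layout are hypothetical placeholders.

```python
import numpy as np

def make_buckets(sample_lengths, batch_size):
    """Group sample indices into batches of similar length (shortest first)."""
    order = np.argsort(sample_lengths)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

def pad_batch(samples):
    """Pad a list of (time, features) arrays to the longest sample in this batch only."""
    max_len = max(s.shape[0] for s in samples)
    feat_dim = samples[0].shape[1]
    batch = np.zeros((len(samples), max_len, feat_dim), dtype=samples[0].dtype)
    for i, s in enumerate(samples):
        batch[i, :s.shape[0], :] = s
    # return the padded batch plus the true lengths, e.g. for CTC sequence_length
    return batch, np.array([s.shape[0] for s in samples])

# Hypothetical usage:
# lengths = [load_sample(p).shape[0] for p in paths]
# for idx in make_buckets(lengths, batch_size=32):
#     batch, seq_lens = pad_batch([load_sample(paths[i]) for i in idx])
```

This way only the handful of longest batches ever get padded anywhere near 4000 time steps, so the average padded length stays close to the true sample lengths.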
Looking at L171, I think the logic is to restore the saved parameters trained on the previous folders, so I guess it's not training 8 separate models if the keep option is set to True:
```python
if keep == True:
    ckpt = tf.train.get_checkpoint_state(savedir)
    if ckpt and ckpt.model_checkpoint_path:
        model.saver.restore(sess, ckpt.model_checkpoint_path)
        print('Model restored from:' + savedir)
```
I got this repo to work, but it took a lot of effort and many bug fixes. In the end, it's just not worth it: this repo has pretty much been abandoned, and there are better repos available (fordDeepDSP, Mozilla, or SeanNaren's excellent PyTorch implementation). Also, DeepSpeech is pretty old; there are better architectures now, for example Jasper or transducer-based ones. Don't waste your time on this one.
I kinda agree after trying this repo on LibriSpeech, and thank you for the pointers. I also checked fordDeepDSP and SeanNaren's DeepSpeech2 PyTorch implementation, but I still see people having trouble getting a reasonable WER/CER there without getting any responses. I just want to train on LibriSpeech, so I might have to try Kaldi now.
The LibriSpeech dataset (e.g. train-clean-100) is split into multiple directories during preprocessing. Then, during training, the code iterates through these directories: https://github.com/zzw922cn/Automatic_Speech_Recognition/blob/master/speechvalley/main/libri_train.py#L159
The problem is that for each directory, a new model is created according to the maxTimeSteps value of the inputs in that directory. So if train-clean-100 is split into 8 directories, we are training 8 separate models that don't share their weights (in fact, every time a model saves a checkpoint, it overwrites the checkpoint saved by the previous model).
This means we are effectively training only one model out of 8, and only on the data in the last directory, i.e. on 1/8 of the dataset.
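To make the fix concrete, here is a hedged sketch of one way to restructure the loop so a single graph, sized to the global maximum over all directories, is built once and shared across them, with the checkpoint restored so training continues instead of restarting. `build_model`, `load_dir`, and `feed_from` are hypothetical helpers, not functions from this repo; the TF1-style session/saver calls mirror the snippet quoted above.

```python
import tensorflow as tf

def train_all_dirs(dirs, savedir, num_epochs):
    # Size the placeholders to the global maximum, not the per-directory one,
    # so the same graph can consume every directory.
    max_steps = max(load_dir(d).max_time_steps for d in dirs)
    model = build_model(max_time_steps=max_steps)  # graph built exactly once

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ckpt = tf.train.get_checkpoint_state(savedir)
        if ckpt and ckpt.model_checkpoint_path:
            model.saver.restore(sess, ckpt.model_checkpoint_path)

        for epoch in range(num_epochs):
            for d in dirs:  # all 8 directories update the same weights
                for feed in feed_from(load_dir(d)):
                    sess.run(model.train_op, feed_dict=feed)
            model.saver.save(sess, savedir + '/model.ckpt', global_step=epoch)
```

Combined with per-batch (bucketed) padding instead of a fixed maxTimeSteps, this would also avoid the slowdown from padding everything to the longest utterance in the dataset.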