This is normal behavior. Kaldi uses MFCC features to train the GMM. The GMM is then used to align the transcription to the speech. The DNN is then trained to map the fbank features to the alignments. If the DNN training does not converge, it probably has some other cause.
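To illustrate that last step: each frame of fbank features gets as its target the pdf-id that the GMM alignment assigned to it, and the DNN is trained with frame-level cross-entropy. Here is a minimal TF 1.x sketch of that idea (not Nabu's actual code; the dimensions and layer sizes are assumptions):

```python
import tensorflow as tf  # TF 1.x, as used by Nabu at the time

NUM_FBANK = 40    # assumed fbank dimension
NUM_PDFS = 3000   # assumed number of pdf-ids in the GMM alignment

# a batch of frames in, one pdf-id target per frame
features = tf.placeholder(tf.float32, [None, NUM_FBANK])
targets = tf.placeholder(tf.int32, [None])  # pdf-ids from the alignment

# a small feed-forward DNN
hidden = tf.layers.dense(features, 512, activation=tf.nn.relu)
hidden = tf.layers.dense(hidden, 512, activation=tf.nn.relu)
logits = tf.layers.dense(hidden, NUM_PDFS)

# frame-level cross-entropy against the GMM alignments
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=targets, logits=logits))
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```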
I use the aishell task and the default DNN structure. Should I change the structure?
To be honest, I don't really know. If you are using a standard DNN structure, I suggest you try using Kaldi. If you then want to move to more exotic structures, you could first try to reproduce the result using Nabu with a similar architecture before making changes to the model.
Do you still have the original experimental results? Eventually I want to build a CLDNN structure, but first I want to make sure that my experimental process does not have problems.
What do you mean by retaining the original experimental results?
Kaldi's experimental result.
OK, and what do you mean by retaining that result? If you use Kaldi, it is stored in the egs folder you are running it from. For Nabu, all intermediate steps are stored in the expdir.
Yes. I copied run.sh from egs into the Kaldi script train_gmm.sh, ran align_data.sh on the dev set data, then compute_prior.py, then ./run data to create the expdir, and finally ./run train to train the NN. I want to see your experiment steps and results. Are they the same as my steps?
Have you looked at the README? The steps are explained there. I don't really understand why you copied run.sh to train_gmm.sh. As far as I can tell, the alignment process is the same for the aishell task.
The rest of the steps seem correct.
I read the README carefully. run.sh and train_gmm.sh are not the same, but train_gmm.sh contains most of the code of run.sh before train_tdnn: train_mono and train_tri through train_5a, plus the alignment to pdfs. I did this because I wanted to use your script rather than run.sh; it looks like it gets rid of Kaldi, and I do not want to switch between projects. So could you show me your convergence results?
They are indeed not the same, because run.sh contains the Kaldi data prep as well. I have no results on this task, so I'm afraid I cannot help you with it.
Do you have results for config/recipes/DNN/WSJ?
It's been a while, let me check :)
@vrenkens Do you have any good results? I use 3 layers with 512 hidden units, and dropout set to 1. That also does not converge.
I do not have any results anymore, I'm running the experiments now to see what happens. I will let you know when I get the results :)
Hey fanlu, there was a bug in the decoding script, so you should probably pull the new code. I trained a DNN on WSJ. I just used the recipe as is, which is probably not very good. In particular, I was using layer normalization and dropout, which is probably not a good idea.
Here you can see the plots for training.
I got a WER of 11% in the end, which is far from the state of the art. I am going to run some more experiments with different recipes.
Awesome, I will try aishell later, thanks. Could you share your experiment results when you finish?
I will :)
Hi @vrenkens, can you share some TF debugging tricks with me? How do you find the reason for non-convergence?
I actually did not have problems with convergence. What I normally do for debugging is first look at the computational graph using tensorboard. Just to check if everything looks the way I expect it to (correct shapes, connections, ...).
Then I will typically look at the histograms evolving over time.
As a last resort I use tf.Print statements to really look at the individual values and see if everything makes sense.
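Putting those three tricks together in a minimal TF 1.x sketch (a toy graph standing in for the real model; all names and shapes here are assumptions):

```python
import numpy as np
import tensorflow as tf  # TF 1.x

# toy graph standing in for the real model
x = tf.placeholder(tf.float32, [None, 40], name='features')
w = tf.get_variable('w', [40, 10])
logits = tf.matmul(x, w)

# last resort: print actual tensor values whenever logits is evaluated
logits = tf.Print(logits, [tf.reduce_min(logits), tf.reduce_max(logits)],
                  message='logits min/max: ')
loss = tf.reduce_mean(tf.square(logits))

# histograms of all trainable variables, to watch them evolve over time
for var in tf.trainable_variables():
    tf.summary.histogram(var.op.name, var)
summaries = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # writing the graph lets tensorboard render it for inspection
    writer = tf.summary.FileWriter('logdir', sess.graph)
    _, summ = sess.run([loss, summaries],
                       {x: np.random.randn(8, 40).astype(np.float32)})
    writer.add_summary(summ, 0)
# then inspect with: tensorboard --logdir logdir
```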
You could also look at your input features. Do they look normal? Are they normalized correctly (zero mean and unit variance)?
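A quick way to check that, assuming you can get the features of one utterance into a [num_frames, num_dims] numpy array (the file name here is hypothetical):

```python
import numpy as np

# hypothetical dump of one utterance's fbank features
feats = np.load('utterance_feats.npy')  # shape: [num_frames, num_dims]

# per-dimension statistics; should be roughly 0 and 1 if normalized
print('mean per dim:', feats.mean(axis=0))
print('std per dim: ', feats.std(axis=0))

# if they are not, apply mean and variance normalization
normed = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-10)
```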
Kaldi uses MFCC features as input, and in this project I found fbank in config/recipes/DNN/WSJ/feature_processor.cfg. Could these two different feature types lead to the DNN training not converging?
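(For reference, the two feature types are closely related: MFCCs are essentially a DCT applied to the log mel filterbank (fbank) features. A quick way to inspect both, using librosa rather than the Kaldi/Nabu extraction code, with a hypothetical wav file:)

```python
import librosa

y, sr = librosa.load('utterance.wav', sr=16000)  # hypothetical file

# log mel filterbank (fbank) features, e.g. 40 bands
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40)
fbank = librosa.power_to_db(mel)

# MFCCs: a DCT on top of the log mel spectrum
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

print(fbank.shape, mfcc.shape)  # (40, num_frames), (13, num_frames)
```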