Open tz301 opened 2 years ago
I use --max-duration=20
What if you change it to --max-duration=1 so that there is only one utterance in a batch when using decode.py?
There is a convolutional layer in the conformer model and the padding in a batch may affect the result.
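To illustrate the point about padding, here is a minimal pure-Python sketch (not the conformer code itself). It assumes a toy 3-tap kernel, a made-up 4-frame utterance, and a hypothetical fill value for padded frames; the real model's kernel size, masking, and padding value may differ. It only shows that a convolution near the end of an utterance can read padded frames when the utterance is batched with longer ones.

```python
def conv1d_same(x, kernel):
    """1-D cross-correlation with zero 'same' padding (odd-length kernel)."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


kernel = [0.25, 0.5, 0.25]   # toy kernel, an assumption for illustration
utt = [1.0, 2.0, 3.0, 4.0]   # made-up "features" of one short utterance
PAD_VALUE = -23.0            # hypothetical fill value for padded frames

# Decoded alone: frames beyond the end are the convolution's own zeros.
alone = conv1d_same(utt, kernel)

# Decoded in a batch: the utterance is padded to the longest one, so the
# last valid frame's window now covers a padded frame instead of a zero.
batched = conv1d_same(utt + [PAD_VALUE] * 3, kernel)[: len(utt)]

diff = [abs(a - b) for a, b in zip(alone, batched)]
print(diff)  # only frames within kernel reach of the padding differ
```

With a wider kernel or stacked layers, more frames near the end are affected, which is one way small batch-dependent differences could arise.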
I see the feature extraction is not the same, will this be the reason?
Both feature extractors use the same parameters and should produce the same features.
I have used --max-duration=1 for decoding, but I hit the MemoryError below. I also printed the batch data for the failing batch.
Can you try doing export K2_SYNC_KERNELS=1 and rerunning? Error might be earlier.
I exported K2_SYNC_KERNELS=1 and ran it again; see below.
Hm, I think we're not quite drilling down into the error yet. Looks like the error may have occurred in _k2.index, which goes to C++ code. See if you can find it by running with gdb; you may need to do 'catch throw', e.g.:
  gdb --args python3 something.py --opt1 foo ... etc.
  (gdb) catch throw
  (gdb) r
...you may have to "continue":
  (gdb) c
if there are previous exceptions that are ignored by the program. Once you get to where the exception is raised, see if you can print out any relevant-looking local variables.
I have trained my own model and tested it on my datasets. The first step is to decode with many parameter settings, like a tuning pass (as in decode.py), and save the best parameters. I use --max-duration=20. I also save the decoding results obtained with the best parameters on all datasets (not just one).
Then I use these best parameters to decode the wave files (as in pretrained.py), one by one, on the same datasets.
All my datasets show a slightly worse CER with pretrained.py. CER comparison below.
decode.py:     3.190 12.802 17.995 9.569 14.478 10.299 16.242 7.329 20.695
pretrained.py: 3.203 13.029 18.177 9.662 14.610 10.447 16.463 7.333 20.911
Is this normal? I see the feature extraction is not the same, will this be the reason?
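To put a number on the gap, here is a small sketch that computes the absolute and relative CER difference per dataset from the figures reported above. The variable names are only for illustration, and the datasets are indexed by position since their names are not given.

```python
# CER values copied from the comparison in this post, in the same order.
decode_cer = [3.190, 12.802, 17.995, 9.569, 14.478, 10.299, 16.242, 7.329, 20.695]
pretrained_cer = [3.203, 13.029, 18.177, 9.662, 14.610, 10.447, 16.463, 7.333, 20.911]

# Absolute gap (CER points) and relative gap (percent of the decode.py CER).
abs_diff = [p - d for d, p in zip(decode_cer, pretrained_cer)]
rel_diff = [100.0 * a / d for a, d in zip(abs_diff, decode_cer)]

for i, (a, r) in enumerate(zip(abs_diff, rel_diff)):
    print(f"dataset {i}: +{a:.3f} absolute, +{r:.2f}% relative")

print(f"largest relative gap: {max(rel_diff):.2f}%")
```

pretrained.py is worse on every dataset, but the relative gap stays under 2% in each case.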