Closed by TheisTrue 4 years ago
For how many epochs did you train? I believe you need to train for around 100 epochs to get satisfactory results. I have not yet trained the model for that many epochs because of limited compute.
The only difference between this code and ESPnet's is that the parameters used to compute the STFT might differ. If you can get the exact STFT configuration from ESPnet and change the values in the config to match, their pre-trained model should work as is with this code. Otherwise, you can always load their model and fine-tune it.
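To illustrate why the STFT settings matter, here is a minimal sketch using only NumPy. The function `stft_magnitude` and both parameter sets are hypothetical and are not taken from this repository or from ESPnet. It shows that changing the FFT size or hop length changes both the number of frames and the number of frequency bins, so a model trained on one configuration would see differently shaped (and differently scaled) inputs under another.

```python
import numpy as np

def stft_magnitude(signal, n_fft, hop_length, win_length):
    """Magnitude STFT with a Hann window and no padding (illustrative only)."""
    window = np.hanning(win_length)
    n_frames = 1 + (len(signal) - win_length) // hop_length
    frames = np.stack([
        signal[i * hop_length : i * hop_length + win_length] * window
        for i in range(n_frames)
    ])
    # rfft zero-pads each frame to n_fft and returns n_fft // 2 + 1 bins.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

sample_rate = 16000
signal = np.random.default_rng(0).standard_normal(sample_rate)  # 1 s of noise

# Hypothetical settings, NOT the actual values of either code base.
config_a = dict(n_fft=512, hop_length=160, win_length=400)
config_b = dict(n_fft=400, hop_length=128, win_length=400)

shape_a = stft_magnitude(signal, **config_a).shape
shape_b = stft_magnitude(signal, **config_b).shape
print(shape_a, shape_b)  # (frames, frequency bins) for each configuration
```

If the two shapes differ, the first layer of a pre-trained network would most likely not accept the features at all; if only the window or hop differs slightly, the network may run but produce poor transcriptions.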
I needed the ASR model to improve my speech separation model, which works even with a high CER. Hence I have not trained the model to state-of-the-art convergence.
Thank you very much. I'll test with the ESPnet model.
Excuse me for the abrupt question. I tested with your model and the CER is 0.78, which is too high. Is the problem with the model itself or with specific_config.py? The result is as follows:
Avg. Loss: 265.5176 | Avg Loss_Att: 241.9316 | Avg Loss_CTC: 320.5516 | CER: 0.7817
specific_config.py:
import os

path_to_download = './librispeech'
base_model_path = './BaseModel'

os.makedirs(path_to_download, exist_ok=True)
os.makedirs(base_model_path, exist_ok=True)

test_model = './model/ASR.pth'
cache_dir = './unigram'

resume = {'restart': False, 'model_path': ''}

use_cuda = False

if not use_cuda:
    num_cuda = '0'
    os.environ["CUDA_VISIBLE_DEVICES"] = num_cuda
else:
    num_cuda = '0'
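For context on the CER figure reported above, here is a minimal sketch of how character error rate is commonly defined: the Levenshtein edit distance between the reference and the hypothesis, divided by the reference length. The helper names `edit_distance` and `cer` are hypothetical, and the evaluation code in this repository may differ (for example in how it treats spaces or sub-word tokens).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)

# One substitution (e -> a) and one deletion (l) over 11 reference characters.
example = cer("hello world", "hallo word")
print(f"{example:.4f}")
```

Under this definition, a CER of 0.78 means roughly 78 character edits per 100 reference characters, which suggests the hypotheses share little with the references.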