mayank-git-hub / ETE-Speech-Recognition

Implementation of Hybrid CTC/Attention Architecture for End-to-End Speech Recognition in pure python and PyTorch
Apache License 2.0

CER result problem #6

Closed TheisTrue closed 4 years ago

TheisTrue commented 4 years ago

Excuse me for the abrupt question. I used your model for testing and got a CER of 0.78, which is too high. Is the problem with the model I used or with specific_config.py? The result is as follows:

Avg. Loss: 265.5176 | Avg Loss_Att: 241.9316 | Avg Loss_CTC: 320.5516 | CER: 0.7817

specific_config.py:

import os

path_to_download = './librispeech'
base_model_path = './BaseModel'

os.makedirs(path_to_download, exist_ok=True)
os.makedirs(base_model_path, exist_ok=True)

test_model = './model/ASR.pth'
cache_dir = './unigram'

resume = {
    'restart': False,
    'model_path': ''
}

use_cuda = False

if not use_cuda:
    num_cuda = '0'
    os.environ["CUDA_VISIBLE_DEVICES"] = num_cuda
else:
    num_cuda = '0'

mayank-git-hub commented 4 years ago

For how many epochs did you train? I believe you need to train for around 100 epochs to get satisfactory results. I have not yet trained the model for that many epochs due to limited computation.

The only difference between this code and ESPnet's is that the parameters for calculating the STFT might differ. If you can get the exact STFT configuration from ESPnet and change the values in the config, their pre-trained model should work as-is in this code. Otherwise, you can always load their model and fine-tune.
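A minimal sketch of the two options above (not taken from either repository). The STFT values are placeholders for 16 kHz audio and must be replaced with whatever the ESPnet recipe actually uses; the key remapping step is also an assumption, since ESPnet's state_dict names will generally not match this repo's modules one-to-one.

# Option 1: make the feature front-end match ESPnet's STFT settings
import torch
import torchaudio

stft_params = {
    'n_fft': 512,        # assumed; read the real value from the ESPnet feature config
    'hop_length': 160,   # assumed: 10 ms hop at 16 kHz
    'win_length': 400,   # assumed: 25 ms window at 16 kHz
    'window_fn': torch.hann_window,
}
fbank = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_mels=80, **stft_params)

# Option 2: load an ESPnet checkpoint into this repo's model and fine-tune.
# `model` stands for the hybrid CTC/attention network built by this code;
# strict=False tolerates keys that do not line up, but a proper key remapping
# is usually still needed before the weights are actually reused.
# checkpoint = torch.load('espnet_model.pth', map_location='cpu')
# model.load_state_dict(checkpoint, strict=False)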

I needed the ASR to improve my speech separation model, which works even with a relatively high CER, so I have not trained the model to state-of-the-art convergence.

TheisTrue commented 4 years ago

Thank you very much. I'll test with the ESPnet model instead.