Hi, I have the same problem! I used the trained model to predict on the training data, and the predictions are completely incorrect.
@thu-spmi @aky15 Is there a problem with the model? Thank you!
Sorry for the late reply. As we stated in the paper, training can be unstable in the initial stage, and it usually helps to increase the weight of the CTC loss (lamb in train.py) to aid convergence. In my experience, a CTC weight of 0.3 should ensure convergence for LibriSpeech even if the batch size is small (e.g., 64 or 32), though the result may be slightly worse. You can lower the CTC weight manually after the first few training steps for better results; I am also preparing an automatic way to gradually reduce the CTC weight as training progresses.
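For concreteness, here is a minimal sketch of what such a schedule could look like, assuming lamb is just a Python float recomputed before each training step; the function name decay_lamb, the constants, and the loss call in the comment are illustrative and not part of the repo:

def decay_lamb(step, lamb_init=0.3, lamb_final=0.01, decay_steps=10000):
    # Linearly anneal the CTC weight from lamb_init to lamb_final over
    # the first decay_steps training steps, then hold it fixed.
    if step >= decay_steps:
        return lamb_final
    frac = step / decay_steps
    return lamb_init + frac * (lamb_final - lamb_init)

# Hypothetical usage inside the training loop:
# for step in range(total_steps):
#     lamb = decay_lamb(step)
#     loss = ctc_crf_loss(netout, labels, input_lengths, label_lengths, lamb)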
Thank you! My loss has dropped (VGGBLSTM, LibriSpeech 1000h), but I cannot reach 4.09% on the test-clean set:
decode_dev_clean/lattice %WER 6.71 [ 3650 / 54402, 273 ins, 843 del, 2534 sub ]
decode_dev_other/lattice %WER 15.17 [ 7730 / 50948, 615 ins, 1525 del, 5590 sub ]
decode_test_clean/lattice %WER 6.69 [ 3515 / 52576, 270 ins, 832 del, 2413 sub ]
decode_test_other/lattice %WER 15.79 [ 8264 / 52343, 592 ins, 1784 del, 5888 sub ]
The training loss is about 20; the cv loss is:
mean_cv_loss: 17.023623735476765 mean_cv_partial_loss-84.86912908309546
mean_cv_loss: 16.849719316531452 mean_cv_partial_loss-84.9270972227439
mean_cv_loss: 16.71052746895032 mean_cv_partial_loss-84.97349450527093
mean_cv_loss: 16.590796837439903 mean_cv_partial_loss-85.0134047157744
mean_cv_loss: 16.582249959309895 mean_cv_partial_loss-85.01625367515108
mean_cv_loss: 16.675133142715847 mean_cv_partial_loss-84.98529261401576
Params: lr: 0.00001, lamb: 0.01. Is there any way to achieve the accuracy reported in the paper? Looking forward to your reply.
Hi @FzuGsr, we updated the LibriSpeech script, removing the pruning of trigram contexts for the denominator graph (line 85), and the training has become more stable. You can try the following configuration to get a desirable result: model: BLSTM, hdim: 512, lr: 0.001, lamb: 0.01, batch_size: 128. Note that we use 4-gram language model rescoring (as detailed in the script) to get the result in the paper.
Thank you for your reply! I will try it later. Does this affect the loss calculation and lead to unstable training?
Yes, the denominator graph will affect the loss calculation.
Hi @aky15, I use a different model (e.g., a Transformer), but the loss looks strange:
time: 8393485.422398832, partial_loss -318.4996032714844,tr_real_loss: -87.59403991699219, lr: 0.01
training epoch: 1, step: 1
time: 0.8945837002247572, partial_loss -273.60565185546875,tr_real_loss: -33.11537170410156, lr: 0.01
training epoch: 1, step: 2
time: 0.8515760116279125, partial_loss -310.4080810546875,tr_real_loss: -65.38531494140625, lr: 0.01
training epoch: 1, step: 3
time: 0.845015412196517, partial_loss -242.25860595703125,tr_real_loss: -3.054351806640625, lr: 0.01
training epoch: 1, step: 4
time: 0.8450672645121813, partial_loss -233.09873962402344,tr_real_loss: 5.82489013671875, lr: 0.01
The loss is negative and keeps getting smaller. Is costs_alpha_den << (1+lamb)*costs_ctc?
Looking forward to your reply.
CTC_CRF_LOSS input:
>>>labels.size()
torch.Size([201])
>>>input_lengths.size()
torch.Size([2])
>>>input_lengths
tensor([123, 95], dtype=torch.int32)
>>>label_lengths
tensor([109, 92], dtype=torch.int32)
>>>netout.size()
torch.Size([2, 131, 347])
Is that right?
I also found that gpu_ctc has a blank_label argument but gpu_den does not.
The blank index of ctc_crf_base.gpu_ctc defaults to 0, but in the token file the blank index is 1:
<eps> 0
<blk> 1
<NSN> 2
<SPN> 3
AA0 4
AA1 5
Is this a bug? Looking forward to your reply.
In our default models (e.g., BLSTM, LSTM), the length of the neural network output is equal to the number of input frames. If the lengths are changed in your model (e.g., you use down/up-sampling or other techniques that change the feature lengths; in your example the network output length is 131, while the original input has at most 123 frames), you should pass the corresponding lengths of the network output to the CTC_CRF loss.
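As an illustration, a minimal sketch of adjusting the lengths for a model that downsamples time by a fixed factor; the factor of 4 and the helper name downsampled_lengths are assumptions for the example, not the repo's API:

import torch

def downsampled_lengths(input_lengths, factor=4):
    # Map the original frame counts to the lengths the model actually
    # produces after downsampling time by `factor` (ceiling division).
    return (input_lengths + factor - 1) // factor

# With the lengths from the log above:
orig = torch.tensor([123, 95], dtype=torch.int32)
print(downsampled_lengths(orig))  # tensor([31, 24], dtype=torch.int32)
# Pass the lengths of the actual network outputs (after any down/up-sampling),
# not the original frame counts, as input_lengths to the CTC_CRF loss.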
About the blank index, please refer to #11 .
Thank you for your reply. Blank index = 1 makes the phone indices in lexicon_number.txt and token.txt inconsistent. Can I change the blank index to something else in the gpu_ctc function?
Yes, I use downsampling and downsample the input_lengths as well, but I get a loss of -inf:
training epoch: 1, step: 1
netout torch.Size([8, 133, 365]) , labels torch.Size([833]), , input_lengths tensor([133, 131, 128, 116, 113, 113, 83, 77]), label_lengths tensor([121, 124, 105, 100, 149, 103, 70, 61], dtype=torch.int32)
netout torch.Size([8, 133, 365]) , labels torch.Size([869]), , input_lengths tensor([132, 128, 125, 121, 115, 103, 102, 66]), label_lengths tensor([111, 154, 105, 87, 100, 122, 124, 66], dtype=torch.int32)
time: 58.328730165958405, partial_loss -inf,tr_real_loss: -inf, lr: 0.001
Is this the right input for the CTC_CRF loss?
ctc_crf_base.gpu_den returns -inf:
>>>logits.size()
torch.Size([3, 163, 365])
>>>input_lengths
tensor([163, 160, 149])
>>>label_lengths
tensor([7, 6, 9], dtype=torch.int32)
>>>labels.size()
torch.Size([22])
>>>costs_ctc
tensor([-151.3362, 0.0000, -193.9635])
>>>costs_alpha_den
tensor([-3642.3738, -inf, -3498.7480], device='cuda:1')
>>>costs_beta_den
tensor([-3642.3711, -inf, -3498.7468], device='cuda:1')
Looking forward to your reply.
The input_length should be at least 2*label_length - 1 (roughly). To be exact, the input_length should be at least ctc_length(labels), where ctc_length is the following function:
def ctc_length(labels):
    needed_blank_count = 0
    for i in range(1, len(labels)):
        if labels[i] == labels[i-1]:
            needed_blank_count += 1
    return len(labels) + needed_blank_count
For example, if the labels are "A A B B", the input feature should have at least 6 frames. If the input_length is shorter than ctc_length(labels), the input cannot traverse all the labels and the result is wrong: the gpu_ctc loss will be 0 and the gpu_den loss will be -inf.
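As a usage illustration, a small sanity check one could run before calling the loss; the helper name check_lengths is made up for this example, and ctc_length is the function above:

def check_lengths(input_lengths, label_seqs):
    # Return indices of utterances whose (possibly downsampled) input
    # length is too short to traverse their label sequence.
    bad = []
    for i, (T, labels) in enumerate(zip(input_lengths, label_seqs)):
        if T < ctc_length(labels):
            bad.append(i)
    return bad

# "A A B B" needs ctc_length == 6 frames, so 5 frames is too short:
print(ctc_length(["A", "A", "B", "B"]))            # 6
print(check_lengths([5], [["A", "A", "B", "B"]]))  # [0]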
You can change the blank_index of gpu_ctc, but you can't change the blank index of gpu_den. If you only use gpu_ctc, you can choose your own blank index and let the neural network output at that index be the probability of the blank label. If you want to use gpu_ctc and gpu_den together, the blank index can't be changed.
Thank you very much for your reply! Does gpu_ctc give the same result as torch.nn.functional.ctc_loss? Does gpu_ctc use CUDA to speed up the CTC computation?
Looking forward to your reply.
gpu_ctc is modified from Baidu's warp-ctc (https://github.com/baidu-research/warp-ctc). We changed warp-ctc's input from logits (without log_softmax) to log_softmax. torch.nn.functional.ctc_loss is PyTorch's implementation of CTC; it may use cuDNN's CTC implementation internally (its input is also log_softmax). In any case, warp-ctc and PyTorch's CTC are different implementations of the same loss, so they should give similar results. Both support computation on CUDA. I have not compared their speed, and I think the CTC computation accounts for only a small proportion of the total computation (including the neural network forward/backward).
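For reference, a minimal sketch of calling PyTorch's built-in CTC loss on shapes like those in the logs above, which could serve as a comparison point; the random tensors are purely illustrative, and the corresponding gpu_ctc call is omitted here since its exact signature is not shown in this thread:

import torch
import torch.nn.functional as F

# Fake network output: T=163 frames, N=3 utterances, C=365 output units.
T, N, C = 163, 3, 365
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)  # F.ctc_loss expects (T, N, C)
# A batch-major netout of shape (N, T, C) would need .transpose(0, 1) first.
input_lengths = torch.tensor([163, 160, 149])
target_lengths = torch.tensor([7, 6, 9])
targets = torch.randint(1, C, (int(target_lengths.sum()),))  # flat labels, no blanks

# blank defaults to index 0; reduction='none' gives one cost per utterance.
loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=0, reduction='none')
print(loss)  # three per-utterance CTC losses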
Thank you very much for your reply! You are so excellent.
Hi, I ran the libr/run.sh demo, but the loss is still very large and the model can't converge. Can you help me? Is it possible to release model configs or trained models? My environment: PyTorch 1.5, CUDA 10.1, Python 3.7. I run:
python3 steps/train.py --lr=0.001 --output_unit=72 --lamb=0.001 --data_path=$dir --batch_size=256
Looking forward to your reply. Thank you!