Closed ahban closed 7 months ago
Could you please try 1best decoding first to see whether the model is converging well? According to your logs, it did not reach the attention-decoder stage at all. I suspect your lattice is too large. Try using a smaller output_beam and max_active_states.
1best works well, but 98% of GPU memory has been allocated. I am trying to shrink max_active_states.
I meant the CER: is it as good as expected? 1best decoding should not consume that much memory. Anyway, try decreasing search_beam, output_beam, and max_active_states.
What are the final loss values, e.g. the last line in your log file that says "Epoch xx, batch xxx"?
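For reference, these decoding knobs are set inside `get_params()` in icefall's conformer_ctc decode.py rather than passed on the command line. A hedged sketch of shrinking them (the default values below are from memory and may differ in your checkout, so verify against your local copy):

```python
# Defaults loosely modeled on icefall's conformer_ctc decode.py get_params();
# check your local file for the actual values.
default_params = {
    "search_beam": 20,          # pruning beam used during intersection
    "output_beam": 8,           # beam used when pruning the output lattice
    "min_active_states": 30,
    "max_active_states": 10000,
}

# Smaller beams / fewer active states -> smaller lattice -> less GPU memory,
# usually at some cost in CER.
reduced_params = dict(default_params,
                      search_beam=15, output_beam=6, max_active_states=7000)
```

Lower the values gradually; shrinking them too aggressively will prune good paths and hurt CER.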
CER for 1best:
2022-04-06 03:22:15,814 INFO [utils.py:407] [test-no_rescore] %WER 6.74% [7060 / 104765, 112 ins, 1473 del, 5475 sub ]
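As a sanity check on that line, the error counts add up to the reported rate (plain arithmetic, no icefall code involved):

```python
ins, dels, subs = 112, 1473, 5475   # counts from the decode log above
ref_words = 104765                  # reference token count from the same line

errors = ins + dels + subs
wer = 100.0 * errors / ref_words
print(f"%WER {wer:.2f}% [{errors} / {ref_words}]")  # matches the 6.74% reported
```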
final loss values:
2022-04-05 20:55:37,598 INFO [train.py:512] (1/2) Epoch 89, batch 12160, loss[ctc_loss=0.05017, att_loss=0.1513, loss=0.1209, over 1739.00 frames.], tot_loss[ctc_loss=0.07525, att_loss=0.2027, loss=0.1644, over 334052.84 frames.], batch size: 8
2022-04-05 20:55:37,600 INFO [train.py:512] (0/2) Epoch 89, batch 12160, loss[ctc_loss=0.06977, att_loss=0.1519, loss=0.1273, over 1600.00 frames.], tot_loss[ctc_loss=0.07497, att_loss=0.2029, loss=0.1645, over 334191.86 frames.], batch size: 7
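If you want to track convergence programmatically, the tot_loss fields can be pulled out of such lines with a small regex (the pattern below assumes the exact log format shown above):

```python
import re

# One of the log lines above, abbreviated and with the wrapped value rejoined.
line = ("2022-04-05 20:55:37,598 INFO [train.py:512] (1/2) Epoch 89, batch 12160, "
        "tot_loss[ctc_loss=0.07525, att_loss=0.2027, loss=0.1644, "
        "over 334052.84 frames.], batch size: 8")

# Capture the three running-average loss values from the tot_loss[...] field.
m = re.search(r"tot_loss\[ctc_loss=([\d.]+), att_loss=([\d.]+), loss=([\d.]+)", line)
ctc_loss, att_loss, tot_loss = (float(g) for g in m.groups())
print(ctc_loss, att_loss, tot_loss)  # 0.07525 0.2027 0.1644
```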
The loss values seem normal, but the CER for 1best decoding is a little worse than expected; it should be around:
%WER = 4.99
Errors: 53 insertions, 350 deletions, 4825 substitutions, over 104765 reference words (99590 correct)
Search below for sections starting with PER-UTT DETAILS:, SUBSTITUTIONS:, DELETIONS:, INSERTIONS:, PER-WORD STATS:
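The counts in that expected result are internally consistent: under the usual scoring convention, correct = reference - deletions - substitutions (insertions do not reduce the correct count):

```python
ref_words = 104765
ins, dels, subs = 53, 350, 4825   # counts from the expected result above

correct = ref_words - dels - subs
wer = 100.0 * (ins + dels + subs) / ref_words
print(correct, f"{wer:.2f}")  # 99590 correct, 4.99% WER, as quoted above
```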
Did you try smaller beams and active states?
Smaller beams and active states give worse CER results.
@pkufool Did your "WER = 4.99" result use only the Aishell training data?
My model was trained only on the Aishell dataset.
Yes, only the Aishell dataset. Someone has reproduced the result with our recipe; see https://github.com/k2-fsa/icefall/issues/112#issuecomment-975146755.
@pkufool
I have reproduced the results with
lhotse=1.0.0
k2 version: 1.13
Build type: Release
Git SHA1: 47c4b754bb418b2a40c3ee0f24ca5ed12b08997f
Git date: Sat Jan 29 09:39:32 2022
Cuda used to build k2: 11.1
cuDNN used to build k2: 8.0.4
Python version used to build k2: 3.8
OS used to build k2: Ubuntu 18.04.6 LTS
CMake version: 3.18.4
GCC version: 7.5.0
CMAKE_CUDA_FLAGS: --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 --expt-extended-lambda -gencode arch=compute_80,code=sm_80 --expt-extended-lambda -gencode arch=compute_86,code=sm_86 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow
CMAKE_CXX_FLAGS: -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-strict-overflow
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 11.1
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
GPU: RTX 3090 24G
OS: Ubuntu 18.04
conda install -c k2-fsa -c pytorch -c conda-forge k2=1.13.dev20220129 python=3.8 cudatoolkit=11.1 pytorch=1.8.1 torchaudio=0.8.1
python3 conformer_ctc/train.py --bucketing-sampler True \
--max-duration 200 \
--start-epoch 0 \
--num-epochs 90 \
--world-size 4 > train.log
python3 conformer_ctc/decode.py --nbest-scale 0.5 \
--epoch 84 \
--avg 25 \
--method attention-decoder \
--max-duration 20 \
--num-paths 100
# best CER = 4.26%
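Note that --avg 25 with --epoch 84 averages the parameters of the last 25 epoch checkpoints before decoding. A minimal stand-in sketch of that averaging, using plain dicts in place of real torch state_dicts:

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of parameter values across checkpoints."""
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in state_dicts[0]}

# Toy stand-ins for two checkpoints' state_dicts.
ckpts = [{"w": 1.0, "b": 0.0}, {"w": 3.0, "b": 2.0}]
avg = average_checkpoints(ckpts)
print(avg)  # {'w': 2.0, 'b': 1.0}
```

The averaged model usually decodes a little better than any single epoch, which is why the recipe sweeps --epoch and --avg for the best CER.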
Thanks!!!
Egs: Aishell
After several tries, I still fail to run conformer_ctc/decode.py with the following error.
It seems that there is not enough GPU memory, even though I have set --max-duration to a very small value. I think the attention-decoder consumes a lot of GPU memory for rescoring. Would it be solved by moving the rescoring procedure to the CPU? If so, how?
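Whether the whole rescoring step can run on CPU depends on which k2 operations it uses, but another common way to tame peak GPU memory is to score the n-best paths in fixed-size chunks instead of all at once. A generic sketch of that idea, where `score_batch` is a hypothetical stand-in for the expensive attention-decoder scoring:

```python
def score_batch(paths):
    # Hypothetical stand-in for attention-decoder scoring of a batch of paths.
    return [len(p) for p in paths]

def rescore_in_chunks(paths, chunk_size):
    """Score paths chunk by chunk so peak memory scales with chunk_size,
    not with the total number of n-best paths."""
    scores = []
    for i in range(0, len(paths), chunk_size):
        scores.extend(score_batch(paths[i:i + chunk_size]))
    return scores

paths = [["a"], ["a", "b"], ["a", "b", "c"]]
print(rescore_in_chunks(paths, chunk_size=2))  # [1, 2, 3]
```

Chunking gives the same scores as one big batch while bounding the memory used by any single scoring call.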