k2-fsa / icefall

https://k2-fsa.github.io/icefall/
Apache License 2.0

Why are some greedy decoding outputs all <oov>? #1209

Closed ziyu123 closed 1 year ago

ziyu123 commented 1 year ago

I am training an English zipformer2 model, but for some audio files the decoded output is all <oov>, like this:
000086 ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ' ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ' ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇ ⁇
When I concatenate the affected audio files together, the output is normal. When I trim the first few seconds of the audio, some files decode normally and some do not.
What is the cause of this?

ziyu123 commented 1 year ago

As a workaround, I forced logits[:, 2] += -999999.0 when decoding.
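A minimal, framework-free sketch of this workaround, for readers who want to see its effect. It assumes index 2 is the unknown-token ID (as in logits[:, 2] above) and uses made-up toy logits stored as nested lists; in a real decoding script the same penalty would be applied to the model's logits tensor before taking the argmax.

```python
UNK_ID = 2  # assumption: index 2 is the unknown token, as in logits[:, 2] above


def mask_unk(logits, unk_id=UNK_ID, penalty=-999999.0):
    """Return a copy of the per-frame logits with the unk column pushed down."""
    masked = [list(row) for row in logits]
    for row in masked:
        row[unk_id] += penalty
    return masked


def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)


# Toy logits: 2 frames x 4 tokens (values are invented for illustration).
logits = [
    [0.1, 0.2, 3.0, 0.5],
    [0.0, 1.5, 2.0, 0.3],
]

before = [argmax(row) for row in logits]            # unk wins in both frames
after = [argmax(row) for row in mask_unk(logits)]   # next-best token wins instead
```

Note that this only hides the symptom: the model still assigns high probability to the unknown token, so the next-best token is emitted in its place.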

csukuangfj commented 1 year ago

> I force to set logits[:,2] += -999999.0 when decoding.

Does it fix your issue?

csukuangfj commented 1 year ago

If yes, you can modify the code to treat unk_id as blank_id.

ziyu123 commented 1 year ago

> If yes, you can modify the code to treat unk_id as blank_id.

Thanks a lot. Setting logits[:, 2] solved my problem. I think treating unk_id as blank_id would also work.

csukuangfj commented 1 year ago

We already treat unk_id as blank_id: https://github.com/k2-fsa/icefall/blob/00256a766921dd34a267012b0e2b8ff7d538f0e6/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L631

https://github.com/k2-fsa/icefall/blob/00256a766921dd34a267012b0e2b8ff7d538f0e6/egs/librispeech/ASR/pruned_transducer_stateless2/beam_search.py#L746

Which decoding method and which file are you using?
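As a rough illustration of what "treating unk_id as blank_id" means, here is a simplified sketch. It is not the icefall code linked above; the function name emit_tokens and the default IDs (blank_id=0, unk_id=2) are assumptions for the example, and the input is just a list of per-step best token IDs.

```python
def emit_tokens(best_ids, blank_id=0, unk_id=2):
    """Keep only the token IDs that are neither blank nor unk.

    best_ids: the highest-scoring token ID at each decoding step.
    The function name and default IDs are hypothetical.
    """
    hyp = []
    for y in best_ids:
        if y in (blank_id, unk_id):
            continue  # emit nothing for this step, exactly as for blank
        hyp.append(y)
    return hyp


# Steps where blank (0) or unk (2) wins contribute nothing to the hypothesis.
example = emit_tokens([0, 5, 2, 2, 7, 0])
```

Unlike adding a large negative value to the unk logit, this does not substitute the next-best token: an utterance whose steps are all unk would decode to an empty hypothesis.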

danpovey commented 1 year ago

Maybe some training utts were all unk due to some encoding issue or language mismatch?

ziyu123 commented 1 year ago

> Maybe some training utts were all unk due to some encoding issue or language mismatch?

I'm very sorry, it was an error in my training data. Some utterances in my training text are lowercase, and the lowercase English letters cause the whole sentence to be all <oov>.
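A small sketch of how such transcripts could be found and normalized before training. It assumes each line of the training text has the form "utt_id transcript" and that the token vocabulary was built from uppercase text; both are assumptions, and the helper names are hypothetical.

```python
def find_lowercase_utts(lines):
    """Return the IDs of utterances whose transcript contains lowercase letters.

    Assumes each line looks like "<utt_id> <transcript>".
    """
    bad = []
    for line in lines:
        utt_id, _, text = line.strip().partition(" ")
        if any(ch.islower() for ch in text):
            bad.append(utt_id)
    return bad


def normalize(line):
    """Uppercase the transcript, leaving the utterance ID untouched."""
    utt_id, _, text = line.strip().partition(" ")
    return f"{utt_id} {text.upper()}"


# Toy transcripts, invented for illustration.
lines = ["000085 HELLO WORLD", "000086 it's all lowercase"]
bad_ids = find_lowercase_utts(lines)
fixed = [normalize(line) for line in lines]
```

Running a check like this before training would flag the affected utterances, so the text can be normalized to match the vocabulary's case.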