pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

I have some questions about RNNT loss. #3750

Open girlsending0 opened 7 months ago

girlsending0 commented 7 months ago

Hello, I would like to ask a question that may be somewhat trivial. The shape of the logits for the RNN-T loss is (batch, max_seq_len, max_target_len + 1, class). Why is it max_target_len + 1 here? Shouldn't the +1 apply to the number of classes instead, i.e. the total vocab size plus one, because blank is included? I don't understand at all. Can anyone help?

https://pytorch.org/audio/main/generated/torchaudio.functional.rnnt_loss.html

csukuangfj commented 7 months ago

max_target_len+1 is not the vocab size. They are two different things.

You can find my implementation at https://github.com/csukuangfj/optimized_transducer/blob/master/optimized_transducer/csrc/cpu.cc#L83

girlsending0 commented 7 months ago

@csukuangfj Thank you.

I phrased that in a misleading way.

What I'm curious about is why target_length + 1 is needed as the third dimension of the RNNT loss's logits input. Looking at your code, I understood that you wrote target length + 1 because it includes a blank label.

Isn't the blank label already counted in n_class? (I think n_class should be set to len(vocab) + 1, similar to CTC loss.)

I don't quite understand.

csukuangfj commented 7 months ago

You need to differentiate between target length and number of classes.

The transcript of an utterance is converted to tokens. The target length is the number of tokens in the transcript; it is not the number of classes. Assuming the blank has ID 0, each token ID lies in the range [1, num_of_classes-1].
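To make the distinction concrete, here is a minimal sketch in plain Python. The toy vocabulary, the helper names `encode` and `logits_shape`, and the choice of ID 0 for blank are all assumptions made for illustration; the shape layout follows the (batch, max_seq_len, max_target_len + 1, class) convention quoted from the torchaudio documentation above.

```python
# Hypothetical toy vocabulary; reserving ID 0 for blank is an assumption.
BLANK_ID = 0
vocab = ["a", "b", "c"]                                   # 3 real tokens
token_to_id = {tok: i + 1 for i, tok in enumerate(vocab)}  # IDs 1..3
num_classes = len(vocab) + 1                              # +1 for blank: last logits dim


def encode(transcript):
    """Convert a transcript into token IDs in [1, num_classes - 1]."""
    return [token_to_id[ch] for ch in transcript]


def logits_shape(batch_transcripts, max_seq_len):
    """Return the logits shape the loss would expect for this batch."""
    targets = [encode(t) for t in batch_transcripts]
    max_target_len = max(len(t) for t in targets)         # depends on the transcripts
    # The third dim is max_target_len + 1; the fourth dim is num_classes.
    return (len(targets), max_seq_len, max_target_len + 1, num_classes)


shape = logits_shape(["ab", "cabba"], max_seq_len=20)
print(shape)  # (2, 20, 6, 4)
```

Here the third dimension changes with the longest transcript in the batch (5 tokens, so 6), while the fourth dimension depends only on the vocabulary (3 tokens plus blank, so 4).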

girlsending0 commented 7 months ago

So the number of classes should be len(vocab)? I understand now. I had misunderstood the mechanism of the RNN-Transducer. Since the model starts from a blank label, that dimension should be target_length + 1.
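One way to see where the extra position comes from is to write out the forward recursion over the transducer lattice. The sketch below is a plain-Python version of the standard forward algorithm, not torchaudio's or optimized_transducer's actual implementation; the function names and the default `blank=0` are assumptions. Node (t, u) means "at frame t, having already emitted u target tokens", so u runs over 0, 1, ..., U, which gives U + 1 positions.

```python
import math


def logsumexp2(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_neg_log_likelihood(log_probs, target, blank=0):
    """log_probs has shape (T, U + 1, V): log_probs[t][u][k] is the log-probability
    of symbol k at frame t after u target tokens have been emitted."""
    T, U = len(log_probs), len(target)
    assert all(len(row) == U + 1 for row in log_probs), "second dim must be U + 1"
    alpha = [[-math.inf] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0  # start: first frame, nothing emitted yet
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Reach (t, u) by emitting blank at (t - 1, u): move to the next frame.
            stay = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else -math.inf
            # Reach (t, u) by emitting target[u - 1] at (t, u - 1): same frame.
            emit = alpha[t][u - 1] + log_probs[t][u - 1][target[u - 1]] if u > 0 else -math.inf
            alpha[t][u] = logsumexp2(stay, emit)
    # Finish with a final blank from the last node (T - 1, U).
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])


# Toy check: T = 2 frames, U = 1 target token, V = 3 classes, uniform probabilities.
T, target, V = 2, [1], 3
uniform = [[[math.log(1.0 / V)] * V for _ in range(len(target) + 1)] for _ in range(T)]
loss = rnnt_neg_log_likelihood(uniform, target)
print(loss)  # two alignments, each with probability (1/3)**3
```

Note that the final blank is read from row u = U. Without that extra row there would be no distribution to consult after the last target token has been emitted.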

csukuangfj commented 7 months ago

Great to hear it resolves your issue.

girlsending0 commented 7 months ago

@csukuangfj Thank you for your kindness.