pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio

Provide Attention scores from Transformer #2931

Open sdeva14 opened 1 year ago

sdeva14 commented 1 year ago

🚀 The feature

Thanks for your amazing contributions.

As far as I understand, the Transformer encoder used in torchaudio does not provide attention scores in its outputs. If I am mistaken, please let me know and ignore this thread.

The attention weights computed at the following line could be saved as attention scores and then returned in the return statement at line 326.

https://github.com/pytorch/audio/blob/1717edaa8cddf5068df97e30404d85654f0b55f4/torchaudio/models/wav2vec2/components.py#L317

Instead, the current implementation returns only the vector representations and discards the attention weights. Line 326: return output, None
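
To make the request concrete, here is a minimal sketch (not the actual torchaudio code; the function name is hypothetical) of how the softmax-normalized weights computed inside self-attention could be kept and returned as the second element instead of None:

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch: a generic scaled dot-product attention where the
# softmax-normalized weights are returned instead of being discarded.
def self_attention_forward(query, key, value, scale):
    # scores / weights have shape (batch, num_heads, seq_len, seq_len)
    scores = torch.matmul(query, key.transpose(-2, -1)) * scale
    weights = F.softmax(scores, dim=-1)       # the tensor worth exposing
    output = torch.matmul(weights, value)
    # returning `weights` here would replace the current `return output, None`
    return output, weights
```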

Motivation, pitch

The attention scores of the Transformer encoder are very valuable for designing more advanced models. The Hugging Face implementation exposes them through its configuration, which lets researchers explore new directions such as model predictions that take attention scores into account, or loss functions defined over attention scores (see the example below).
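
For reference, this is roughly how the Hugging Face API behaves (a usage illustration, not torchaudio code): attention weights are requested per call and returned as one tensor per encoder layer.

```python
import torch
from transformers import Wav2Vec2Model

model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
waveform = torch.randn(1, 16000)  # dummy 1-second waveform at 16 kHz

with torch.no_grad():
    outputs = model(waveform, output_attentions=True)

# One (batch, num_heads, seq_len, seq_len) tensor per encoder layer.
print(len(outputs.attentions), outputs.attentions[0].shape)
```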

Alternatives

No response

Additional context

No response

mthrok commented 1 year ago

Hi @sdeva14

Thanks for the suggestion; I think this is a good addition. We need to think about how to actually enable this, preferably while keeping backward compatibility.
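
One possible direction (just a sketch under assumed names, not a design decision): add an opt-in flag that defaults to the current behaviour, so existing callers keep getting (output, None).

```python
from typing import Optional, Tuple

import torch
from torch import Tensor


class EncoderLayerSketch(torch.nn.Module):
    """Hypothetical wrapper illustrating a backward-compatible signature."""

    def __init__(self, attention: torch.nn.Module):
        super().__init__()
        self.attention = attention

    def forward(
        self,
        x: Tensor,
        attention_mask: Optional[Tensor] = None,
        return_attention: bool = False,  # new flag; False preserves old behaviour
    ) -> Tuple[Tensor, Optional[Tensor]]:
        # assumes the wrapped attention module returns (output, weights)
        output, weights = self.attention(x, attention_mask)
        if not return_attention:
            weights = None  # matches the current `return output, None`
        return output, weights
```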