pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

AV-HuBERT integration with torchaudio.pipelines.Wav2Vec2FABundle #3717

Status: Open. bejjani opened this issue 6 months ago.

bejjani commented 6 months ago

🚀 The feature

How would someone go about configuring AV-HuBERT to work with torchaudio.pipelines.Wav2Vec2FABundle? Currently, MMS_FA is the only model it supports.

Motivation, pitch

Currently, the torchaudio.pipelines.Wav2Vec2FABundle forced aligner supports only MMS_FA. This is a request to add support for an audio-visual speech recognition (AV-ASR) model, namely AV-HuBERT. Alternatively, the feature could be a tutorial on how to extend the list of supported models to multimodal (speech + video) models.
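As I understand it, the MMS_FA aligner performs CTC forced alignment: given per-frame log-probabilities (emissions) and a tokenized transcript, it finds the most likely frame-level path that spells the transcript. One plausible route for an AV model would therefore be to give it a CTC head and feed its emissions into the same procedure. The sketch below illustrates only that alignment step, in pure Python, under these assumptions: the model emits per-frame log-probabilities over a character vocabulary, and the CTC blank is index 0. `ctc_forced_align` and `merge_spans` are hypothetical helper names, not torchaudio APIs. This is not torchaudio's implementation and not an AV-HuBERT integration.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]], targets: Sequence[int], blank: int = 0
) -> List[int]:
    """Viterbi CTC alignment: best frame-level label path that spells `targets`.

    Assumption: log_probs[t][c] is the log-probability of class c at frame t,
    as a (hypothetical) CTC-fine-tuned audio-visual model would emit.
    """
    num_frames = len(log_probs)
    # Extended label sequence: blank, t1, blank, t2, ..., tL, blank
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    num_states = len(ext)

    # score[t][s]: best log-prob of any valid path ending in state s at frame t
    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Predecessors: stay in s, advance from s-1, or skip a blank from s-2
            # (skipping is only allowed between two different non-blank labels)
            candidates = [s, s - 1]
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(s - 2)
            best_prev, best_score = s, NEG_INF
            for p in candidates:
                if p >= 0 and score[t - 1][p] > best_score:
                    best_prev, best_score = p, score[t - 1][p]
            if best_score > NEG_INF:
                score[t][s] = best_score + log_probs[t][ext[s]]
                back[t][s] = best_prev

    # A valid path ends on the last label or on the trailing blank
    last = num_states - 1
    end = last if num_states == 1 or score[-1][last] >= score[-1][last - 1] else last - 1
    if score[-1][end] == NEG_INF:
        raise ValueError("transcript is too long for the number of frames")

    # Backtrack from the final state to recover one label per frame
    path = [0] * num_frames
    state = end
    for t in range(num_frames - 1, -1, -1):
        path[t] = ext[state]
        state = back[t][state]
    return path


def merge_spans(path: Sequence[int], blank: int = 0) -> List[Tuple[int, int, int]]:
    """Collapse a frame path into (token, start_frame, end_frame_exclusive) spans."""
    spans = []
    t = 0
    while t < len(path):
        if path[t] == blank:
            t += 1
            continue
        start = t
        while t < len(path) and path[t] == path[start]:
            t += 1
        spans.append((path[start], start, t))
    return spans


# Toy usage: hypothetical vocabulary {"-": 0 (blank), "a": 1, "b": 2}, 5 frames
probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1],
         [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
emissions = [[math.log(p) for p in row] for row in probs]
path = ctc_forced_align(emissions, [1, 2])  # → [1, 1, 0, 2, 2]
spans = merge_spans(path)                   # → [(1, 0, 2), (2, 3, 5)]
```

Multiplying frame indices by the model's frame stride would turn the spans into timestamps. For a multimodal model, the frame rate of the fused audio-visual features (rather than the audio sample rate alone) would presumably determine that stride.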

Alternatives

No response

Additional context

No response