How would someone go about configuring AV-HuBERT to work with torchaudio.pipelines.Wav2Vec2FABundle? It currently only supports MMS_FA
🚀 The feature
How would someone go about configuring AV-HuBERT to work with torchaudio.pipelines.Wav2Vec2FABundle? It currently only supports MMS_FA.
Motivation, pitch
Currently, the torchaudio.pipelines.Wav2Vec2FABundle forced aligner only supports MMS_FA. This is a request to add support for an AV-ASR model, namely AV-HuBERT. Alternatively, the feature could be a tutorial on how to extend the list of supported models to include multimodal (speech+video) models.
Alternatives
No response
Additional context
No response