Your paper describes the wav2vec2 model you use, but I still can't tell exactly which checkpoint it is. Is it "Wav2Vec 2.0 Large, 960 hours. Librispeech" from https://github.com/facebookresearch/fairseq/tree/main/examples/wav2vec ? And what does "fine-tuned on pseudo-labels" mean? Thank you!