Closed: tikikun closed this 1 week ago
Hi @bachvudinh, per your concern, please download this data and use it to further expand the training set:
https://huggingface.co/datasets/facebook/multilingual_librispeech
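For reference, a minimal sketch of pulling one language in streaming mode with the Hugging Face `datasets` library. The helper name `stream_mls` is made up for illustration, and the config names in `MLS_LANGUAGES` are what I recall from the dataset card, so please verify them there before relying on this.

```python
# Language configs as recalled from the dataset card (assumption: verify on the card).
MLS_LANGUAGES = (
    "german", "dutch", "french", "spanish",
    "italian", "portuguese", "polish",
)


def stream_mls(language: str, split: str = "train"):
    """Return a streaming dataset for one language config (hypothetical helper)."""
    if language not in MLS_LANGUAGES:
        raise ValueError(f"unknown language config: {language!r}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over the data without downloading the full split first.
    return load_dataset(
        "facebook/multilingual_librispeech",
        language,
        split=split,
        streaming=True,
    )


if __name__ == "__main__":
    # Needs network access and `pip install datasets`.
    sample = next(iter(stream_mls("german")))
    print(sample.keys())
```

Streaming is only a convenient way to inspect samples; for the actual training-set expansion a full download is probably what you want.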
The dataset is uploaded to HF here: https://huggingface.co/datasets/homebrewltd/raw-speech-whispervq-v2 and to internal S3: myminio/data/FB_multilingual_librispeech/ cc @tikikun
It's done.
We will change the tokenizer for the next version of the updated model.