Closed StephennFernandes closed 2 years ago
@Ajyy, I see on your Hugging Face account that you have uploaded SpeechT5 — is it functional yet?
@Ajyy by any chance would you be releasing SpeechT5 on Hugging Face?
@Ajyy
Hi, can you show me any resources or links for extracting HuBERT labels on my custom dataset and preparing the speaker embeddings? FYI, I have a multilingual dataset.
Hey @Ajyy, thanks a ton for replying back! Means a lot.
@Ajyy
I have noticed that there is no multilingual HuBERT model available for HuBERT label extraction. Should I use the existing English HuBERT model? Would this affect my multilingual SpeechT5 model's performance during pretraining and finetuning?
Hi, I think you can try to use mHuBERT for multilingual SpeechT5. The English HuBERT model may affect the performance.
@Ajyy thanks a ton, using mHuBERT now for extracting HuBERT features. By the way, which layer is it ideal to extract those features from: 6 or 11, or something else?
@Ajyy after obtaining the wav2vec2 manifest, MFCC features, and HuBERT features, I am only left with obtaining x-vectors on the pretraining data to move forward with SpeechT5 training. But how can I obtain x-vectors on my training data? I tried #16 but that's not working.
> @Ajyy thanks a ton, using mHuBERT now for extracting HuBERT features. By the way, which layer is it ideal to extract those features from: 6 or 11, or something else?
It should be 6 for HuBERT, and 11 for mHuBERT. Please refer to the original papers.
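Once the frame-level features are dumped from the chosen layer, the labels themselves come from k-means clustering over those features (in the real pipeline this is done by fairseq's k-means scripts over dumped features). A minimal pure-numpy sketch of that clustering step, using random arrays as a stand-in for the dumped mHuBERT features:

```python
import numpy as np

def kmeans_labels(features, n_clusters=8, n_iters=20, seed=0):
    """Toy k-means: assign each frame-level feature vector a cluster id.
    The real pipeline fits k-means on features dumped from the chosen
    transformer layer and writes one label per frame."""
    rng = np.random.default_rng(seed)
    # initialize centroids from randomly chosen frames
    centroids = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(n_iters):
        # distance of every frame to every centroid
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned frames
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = features[labels == k].mean(axis=0)
    return labels

# stand-in for layer-11 mHuBERT features: 200 frames of 768-dim vectors
feats = np.random.default_rng(1).normal(size=(200, 768)).astype(np.float32)
labels = kmeans_labels(feats, n_clusters=8)
print(labels.shape)  # one integer cluster id per frame
```

In practice you would fit k-means once on a large sample of frames, then assign labels to every utterance with the fitted centroids.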
> @Ajyy after obtaining the wav2vec2 manifest, MFCC features, and HuBERT features, I am only left with obtaining x-vectors on the pretraining data to move forward with SpeechT5 training. But how can I obtain x-vectors on my training data? I tried #16 but that's not working.
Please try to read and understand the scripts provided by @mechanicalsea. You need to change them a little bit for your own dataset.
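I can't speak for the exact script in #16, but the general shape of x-vector extraction is: run each utterance through a pretrained speaker encoder (a TDNN trained on speaker verification data), pool the frame-level activations into one fixed-size vector, and save one embedding per utterance. A minimal numpy sketch of the pooling step — `fake_speaker_encoder` is a hypothetical stand-in for the real pretrained model:

```python
import numpy as np

def fake_speaker_encoder(waveform, dim=512):
    """Hypothetical stand-in for a pretrained speaker encoder; the real
    pipeline would run a pretrained TDNN over the audio and return
    frame-level activations of shape (frames, dim)."""
    rng = np.random.default_rng(len(waveform))
    n_frames = max(1, len(waveform) // 160)  # ~10 ms hop at 16 kHz
    return rng.normal(size=(n_frames, dim)).astype(np.float32)

def xvector(waveform):
    """Statistics pooling: concatenate mean and std over time, then
    L2-normalize, giving one fixed-size embedding per utterance."""
    frames = fake_speaker_encoder(waveform)
    stats = np.concatenate([frames.mean(axis=0), frames.std(axis=0)])
    return stats / np.linalg.norm(stats)

utt = np.zeros(16000, dtype=np.float32)  # 1 s of dummy 16 kHz audio
emb = xvector(utt)
print(emb.shape)  # one embedding per utterance, saved e.g. as utt_id.npy
```

For a multilingual dataset the encoder does not need to match the training languages — speaker encoders are largely language-agnostic — so the main change to the provided scripts is pointing them at your own manifest.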
@Ajyy okay got it! Thanks a ton.
@Ajyy when extracting HuBERT labels, what's the ideal n_cluster value to set?
@Ajyy I'm also slightly confused about the fairseq-preprocess script; could you show me how to work with it? I have large train.txt and valid.txt files, which I have encoded with spm.model into train_encoded.txt and valid_encoded.txt. But I am confused about what the dictionary creation and .bin files mean, and how to feed them into fairseq-preprocess.
> @Ajyy I'm also slightly confused about the fairseq-preprocess script; could you show me how to work with it? I have large train.txt and valid.txt files, which I have encoded with spm.model into train_encoded.txt and valid_encoded.txt. But I am confused about what the dictionary creation and .bin files mean, and how to feed them into fairseq-preprocess.
You can check the preprocessing for language models in fairseq.
After preprocessing with fairseq-preprocess, you will get a .bin and .idx file for your dataset, which can be read much faster than a txt file.
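For a language-model-style (single-sided) dataset like this, the usual invocation is along these lines — paths are placeholders for your own files; `--only-source` says there is no target side, and when `--srcdict` is omitted fairseq builds the dictionary (dict.txt) from the training data itself:

```
fairseq-preprocess \
    --only-source \
    --trainpref train_encoded.txt \
    --validpref valid_encoded.txt \
    --destdir data-bin/my_text \
    --workers 8
```

This writes train.bin/train.idx, valid.bin/valid.idx, and dict.txt into `--destdir`; if you need to reuse an existing dictionary (e.g. one matching a pretrained checkpoint), pass it with `--srcdict`.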
Hey there, I am looking forward to pre-training SpeechT5 on a custom dataset, preferably multilingual datasets. Could I please get some references, documentation, etc. as a starting point? Thanks.