LEECHOONGHO closed this issue 1 year ago
There isn't currently an example for XVector training in Transformers! Would you like to contribute this? You can begin simply by opening a PR with the python script that you're using. We can then iterate on it to verify correctness and hopefully get a successfully trained XVector system!
Probably also worth asking the same question on the forum to boost visibility: https://discuss.huggingface.co
Also cc @anton-l who has a speaker verification (SV) checkpoint on the Hub (https://huggingface.co/anton-l/wav2vec2-base-superb-sv), wondering if you had a local script for XVector fine-tuning?
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Hi, sorry for missing this! To answer @sanchit-gandhi's question: my SV checkpoint is a ported version of W2V2+XVector from S3PRL: https://github.com/s3prl/s3prl/tree/master/s3prl/downstream/sv_voxceleb1. So no fine-tuning scripts yet, just inference.
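For anyone landing here looking for the inference side, speaker verification with `Wav2Vec2ForXVector` boils down to extracting an embedding per utterance and thresholding their cosine similarity. This is a minimal sketch, not the S3PRL recipe: it uses a tiny randomly-initialized config so it runs anywhere, and the `0.85` threshold is a placeholder. For real results you would instead load the checkpoint above with `Wav2Vec2ForXVector.from_pretrained("anton-l/wav2vec2-base-superb-sv")` and its matching feature extractor.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForXVector

# Tiny random config for illustration only; swap in
# Wav2Vec2ForXVector.from_pretrained("anton-l/wav2vec2-base-superb-sv")
# (plus its feature extractor) for real verification results.
config = Wav2Vec2Config(
    hidden_size=32, num_hidden_layers=1, num_attention_heads=2, intermediate_size=64
)
model = Wav2Vec2ForXVector(config).eval()

# Two 16 kHz utterances to compare (random noise stands in for real audio)
wav1, wav2 = torch.randn(16000), torch.randn(16000)

with torch.no_grad():
    emb1 = model(input_values=wav1[None]).embeddings[0]  # (xvector_output_dim,)
    emb2 = model(input_values=wav2[None]).embeddings[0]

# Same/different-speaker decision via cosine similarity against a threshold
similarity = torch.nn.functional.cosine_similarity(emb1, emb2, dim=0)
is_same_speaker = similarity > 0.85  # placeholder; tune the threshold on a dev set
```

The embedding dimension comes from `xvector_output_dim` (512 by default), so the sketch works unchanged with the ported checkpoint.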
Hey @LEECHOONGHO! If you want to work together to get a working XVector training script, feel free to open a PR with the script that you've got and tag me. We can iterate on it, ensuring correctness and building up to a full Transformers examples script! I think this would be of benefit to others in the community 🤗
Hello, I'm trying to train a Wav2Vec2ForXVector model with the settings below, but the training loss plateaus between 2.3 and 2.7. Is there any example of Wav2Vec2ForXVector training? Or has anyone experienced something like this?
- pretrained model: Korean wav2vec2
- num. of audio samples: 2300k
- num. of speakers: 11223
- num. of used encoder layers: 1
- output_xvector_dim: 512
- learning rate: 2e-5
- batch size: 512
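For reference, here is a minimal sketch of the kind of training step meant above. It is illustrative only: a tiny randomly-initialized config (with a placeholder speaker count) stands in for the Korean wav2vec2 checkpoint, which in practice you would load with `from_pretrained(..., num_labels=11223)`, and the dummy batch replaces real audio/label tensors.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2ForXVector

# Tiny config for illustration; in practice load the pretrained checkpoint, e.g.
# Wav2Vec2ForXVector.from_pretrained("<korean-wav2vec2>", num_labels=11223)
config = Wav2Vec2Config(
    hidden_size=32,
    num_hidden_layers=1,     # "num. of used encoder layers: 1"
    num_attention_heads=2,
    intermediate_size=64,
    xvector_output_dim=512,  # speaker embedding size, as in the setup above
    num_labels=16,           # number of speakers (11223 in the real setup)
)
model = Wav2Vec2ForXVector(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Dummy batch: two one-second 16 kHz waveforms with integer speaker labels
input_values = torch.randn(2, 16000)
labels = torch.tensor([0, 3])

# One training step: the model computes an AM-Softmax loss over speaker classes
outputs = model(input_values=input_values, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

print(outputs.embeddings.shape)  # torch.Size([2, 512]) speaker embeddings
```

With a real dataset, the same forward/backward step would run inside a dataloader loop (or via `Trainer`), with padded batches produced by a feature extractor.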