Closed haha010508 closed 1 year ago
It seems `USER_DIR` was not given. We should set `USER_DIR` to the speecht5 code directory so that fairseq adds the 'speecht5' task to its task list.
This is my code:

```shell
CHECKPOINT_PATH=/project/SpeechT5/SpeechT5/pretrained_models/speecht5_sid.pt
DATA_ROOT=/project/SpeechT5/SpeechT5/manifest
SUBSET=test
USER_DIR=/project/SpeechT5/SpeechT5/speecht5
RESULTS_PATH=/project/SpeechT5/SpeechT5/experimental/s2c/results

mkdir -p ${RESULTS_PATH}

python scripts/generate_class.py ${DATA_ROOT} \
  --gen-subset ${SUBSET} \
  --user-dir ${USER_DIR} \
  --log-format json \
  --task speecht5 \
  --t5-task s2c \
  --path ${CHECKPOINT_PATH} \
  --results-path ${RESULTS_PATH} \
  --batch-size 1 \
  --max-speech-positions 8000 \
  --sample-rate 16000 | tee -a ${RESULTS_PATH}/generate-class.txt
```
And if I debug the code with `python -m ipdb scripts/generate_class.py ...`, I get this error:

```
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
```
But if I run the code directly, I get a different error:

```
soundfile.LibsndfileError: <exception str() failed>
```
Does this mean no wav file was found? If so, how do I specify the file path? I have already downloaded VoxCeleb1.
I don't know why I cannot debug the code, or why debugging and running produce different errors. Thanks very much!
The import error looks like the known fairseq problem where multi-GPU training doesn't work when `--user-dir` is specified. Move or link the `USER_DIR` directory into `fairseq/examples` and use that path as `USER_DIR`. The issue is tracked at https://github.com/facebookresearch/fairseq/issues/4875.
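For illustration, the move-or-link step might look like the sketch below. The `/tmp/demo/...` paths are placeholders for this example only; substitute your actual speecht5 code directory and your fairseq checkout:

```shell
# illustrative paths; replace with your real speecht5 and fairseq locations
SPEECHT5_CODE=/tmp/demo/SpeechT5/speecht5
FAIRSEQ_EXAMPLES=/tmp/demo/fairseq/examples
mkdir -p "${SPEECHT5_CODE}" "${FAIRSEQ_EXAMPLES}"

# link the user code into fairseq/examples and point USER_DIR at the link
ln -sfn "${SPEECHT5_CODE}" "${FAIRSEQ_EXAMPLES}/speecht5"
USER_DIR=${FAIRSEQ_EXAMPLES}/speecht5
echo "${USER_DIR}"
```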
Thanks for your reply. I tried it, but got the same error.
I think this is a bug. The line

```python
from fairseq import metrics, search, tokenizer, utils
```

raises

```
ImportError: cannot import name 'metrics' from 'fairseq' (unknown location)
```

but the `metrics` file is actually located in `fairseq/logging`.
It seems to be an issue caused by the torch version. The same issue occurred when I reimplemented SpeechT5 in a new environment. Could you provide some details of your environment? By the way, I usually run SpeechT5 with torch 1.10.x.
The issue is caused by fairseq: if you move `metrics.py` and `meters.py` from `fairseq/logging` to the `fairseq` folder, the error disappears. My torch version is 2.0.0, but I did not install it myself; it was installed as a dependency by fairseq or espnet.
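For illustration, here is a self-contained sketch of why moving the files fixes the import, using a dummy package (`demo_pkg`) in place of fairseq; the real files live inside your installed fairseq directory:

```shell
# dummy package standing in for fairseq: metrics.py lives in a logging/ subpackage
mkdir -p demo_pkg/logging
touch demo_pkg/__init__.py demo_pkg/logging/__init__.py
echo "VALUE = 42" > demo_pkg/logging/metrics.py

# before the move, the top-level import fails, as with fairseq
python3 -c "from demo_pkg import metrics" 2>/dev/null || echo "import fails"

# copy the module one level up, as suggested for metrics.py and meters.py;
# now `from demo_pkg import metrics` resolves
cp demo_pkg/logging/metrics.py demo_pkg/metrics.py
python3 -c "from demo_pkg import metrics; print(metrics.VALUE)"
```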
By the way, can VoxCeleb1-O be used to evaluate the speaker classification performance, and what would the EER score be? Usually we enroll some speakers from the dataset; at test time we extract an embedding, compute cosine similarity against the enrolled speakers' embeddings, and decide whether they are the same speaker or not. Speaker verification does not use the speaker classification method. So, compared with ECAPA-TDNN, how does the speecht5_sid model perform?
For SID, the fine-tuned SpeechT5 achieves 96.46% accuracy. The ECAPA-TDNN paper did not report VoxCeleb1 SID results, making a direct comparison with SpeechT5 difficult. For ASV (which reports an EER score), the fine-tuned SpeechT5 was not evaluated on that task. If we wanted to compare SpeechT5 with ECAPA-TDNN, we would first extract speaker embeddings from SpeechT5. Generally speaking, we can treat the hidden state before the input of the decoder's classifier as the speaker embedding, which makes a comparison with ECAPA-TDNN possible. Alternatively, we could build a speaker model like Transformer variant (a) to obtain speaker embeddings.
So we can get the speaker embedding from this line: https://github.com/microsoft/SpeechT5/blob/7134e960999bc20d1d80650f7361f35d5fd8d38a/SpeechT5/speecht5/models/speecht5.py#L1183 right? Is it a 768-dimensional vector?
Yes.
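To illustrate, the verification protocol described earlier (enroll speaker embeddings, then score each trial by cosine similarity against them) might look like the sketch below. The random 768-dimensional vectors stand in for real SpeechT5 speaker embeddings, and the 0.7 threshold is an arbitrary placeholder, not a tuned value:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(test_emb, enrolled_embs, threshold=0.7):
    """Accept the trial if the best cosine score against the
    enrolled embeddings exceeds the threshold."""
    scores = [cosine_similarity(test_emb, e) for e in enrolled_embs]
    best = max(scores)
    return best >= threshold, best

rng = np.random.default_rng(0)
# stand-ins for 768-dim SpeechT5 speaker embeddings of three enrolled speakers
enrolled = [rng.standard_normal(768) for _ in range(3)]
# a trial utterance embedding close to the first enrolled speaker
trial = enrolled[0] + 0.05 * rng.standard_normal(768)
accepted, score = verify(trial, enrolled)
print(accepted, round(score, 3))
```

With real embeddings, the threshold (or an EER operating point) would be calibrated on a held-out trial list rather than fixed by hand.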
I want to run the SID pretrained model, but I get an error like this:

```
generate_class.py: error: argument --task: invalid choice: 'speecht5' (choose from 'masked_lm', 'cross_lingual_lm', 'translation', 'hubert_pretraining', 'online_backtranslation', 'denoising', 'multilingual_denoising', 'translation_multi_simple_epoch', 'legacy_masked_lm', 'translation_from_pretrained_bart', 'language_modeling', 'multilingual_translation', 'sentence_prediction', 'sentence_ranking', 'translation_lev', 'audio_pretraining', 'translation_from_pretrained_xlm', 'multilingual_masked_lm', 'speech_to_text', 'simul_speech_to_text', 'simul_text_to_text', 'semisupervised_translation', 'dummy_lm', 'dummy_masked_lm', 'dummy_mt')
```
If I must fine-tune SID first, I run the SID fine-tuning and get this error:

```
fairseq-train: error: argument --task: invalid choice: 'speecht5' (choose from 'masked_lm', 'cross_lingual_lm', 'translation', 'hubert_pretraining', 'online_backtranslation', 'denoising', 'multilingual_denoising', 'translation_multi_simple_epoch', 'legacy_masked_lm', 'translation_from_pretrained_bart', 'language_modeling', 'multilingual_translation', 'sentence_prediction', 'sentence_ranking', 'translation_lev', 'audio_pretraining', 'translation_from_pretrained_xlm', 'multilingual_masked_lm', 'speech_to_text', 'simul_speech_to_text', 'simul_text_to_text', 'semisupervised_translation', 'dummy_lm', 'dummy_masked_lm', 'dummy_mt')
SID finetuning finished
```
So, how do I run the model correctly? Thanks!