modelscope / 3D-Speaker

A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Apache License 2.0

Missing transcripts? #54

Closed — chenht2021 closed this issue 9 months ago

chenht2021 commented 9 months ago

I read the FAQ on the page, but some transcripts still appear to be missing. For example, speaker 3D_SPK_00001 does not exist in transcription/train_transcription or transcription/test_transcription. Am I missing something, or are transcripts only provided for part of the dataset?

GeekOrangeLuYao commented 9 months ago

Currently, our text annotations are only available for audio clips recorded with DIRECTIONAL devices. This is because we focus on annotating clear and distinct audio rather than less clear data, such as far-field recordings or dialect speech. Our dataset is primarily aimed at speaker-related tasks. If further text annotations are released, we will update the information on our website.

chenht2021 commented 9 months ago

Thanks for your explanation. This may be off topic; if it's not appropriate, please close it. I read the LauraGPT paper, which says the TTS training data is LibriTTS and 3D-Speaker, copied 2 times, giving 5.0M samples. The LibriTTS train set is about 206K utterances, and the full 3D-Speaker train set is about 643K; counting only the annotated clips, it would be even less. So is the reported number of TTS training samples wrong? Should it be 500K?
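For reference, the back-of-envelope arithmetic behind this question, using only the approximate counts quoted above (the exact corpus sizes are assumptions from the comment, not official figures):

```python
# Rough sample counts quoted in the comment above (approximate, unverified)
libritts_train = 206_000    # LibriTTS train set, ~206K utterances
speaker3d_train = 643_000   # full 3D-Speaker train set, ~643K clips

combined = libritts_train + speaker3d_train   # both corpora together
duplicated = combined * 2                     # "copied it 2 times"

print(f"combined:   {combined:,}")    # 849,000
print(f"duplicated: {duplicated:,}")  # 1,698,000 — well short of the 5.0M reported
```

Even doubling the full combined corpora yields roughly 1.7M samples, so the gap to the reported 5.0M is what the question is asking about.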

GeekOrangeLuYao commented 9 months ago

In the LauraGPT experiment, data from the highest-quality device in the 3D-Speaker dataset was used, and some data augmentation was applied. For specific details, please refer to the original paper.

GeekOrangeLuYao commented 9 months ago

After double-checking with the authors, the LibriTTS figure you cited does indeed appear smaller than expected. Additionally, we also used data from AISHELL-1, -2, and -3 in the TTS task, which was inadvertently omitted from the current preprint version of our paper. We will rectify this detail in a subsequent revision.