hitachi-speech / EEND

End-to-End Neural Diarization
MIT License

How do I prepare my own data? #2

Closed liuyue94 closed 2 years ago

liuyue94 commented 4 years ago

I know about files such as spk2utt, utt2spk, and wav.scp. Do I also need to create rttm and segments files? Is the segments file created by SAD?

yubouf commented 4 years ago

You need segments, reco2dur, wav.scp, utt2spk and spk2utt. The segments can be generated by SAD when you have two-speaker conversations in separate channels.
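For readers following along, here is a minimal sketch of writing those five files from hand-made annotations. It assumes the standard Kaldi data-directory formats (one space-separated entry per line, sorted by the first field); the function name `write_kaldi_dir`, the utterance ID scheme, and the example IDs and paths are hypothetical and not taken from the EEND repository.

```python
import os
from collections import defaultdict


def write_kaldi_dir(out_dir, wav_paths, durations, turns):
    """Write wav.scp, reco2dur, segments, utt2spk and spk2utt.

    wav_paths: {recording_id: path to wav file}
    durations: {recording_id: duration in seconds}
    turns:     list of (recording_id, speaker_id, start_sec, end_sec)
    Speaker IDs are assumed to be unique across recordings.
    """
    os.makedirs(out_dir, exist_ok=True)
    segments, utt2spk = [], []
    spk2utt = defaultdict(list)
    for rec, spk, start, end in turns:
        # Hypothetical ID scheme: speaker prefix, then recording and
        # start/end times in centiseconds.
        utt = "{}_{}_{:07d}_{:07d}".format(
            spk, rec, int(round(start * 100)), int(round(end * 100)))
        segments.append(f"{utt} {rec} {start:.2f} {end:.2f}")
        utt2spk.append(f"{utt} {spk}")
        spk2utt[spk].append(utt)
    files = {
        "wav.scp": [f"{rec} {path}" for rec, path in wav_paths.items()],
        "reco2dur": [f"{rec} {dur:.2f}" for rec, dur in durations.items()],
        "segments": segments,
        "utt2spk": utt2spk,
        "spk2utt": [f"{spk} {' '.join(sorted(utts))}"
                    for spk, utts in spk2utt.items()],
    }
    for name, lines in files.items():
        with open(os.path.join(out_dir, name), "w") as f:
            # Kaldi tools expect each file sorted by its first field.
            f.write("".join(line + "\n" for line in sorted(lines)))
```

For example, `write_kaldi_dir("data/mine", {"rec1": "/data/speech.wav"}, {"rec1": 4.0}, [("rec1", "spkA", 0.0, 2.5), ("rec1", "spkB", 2.1, 4.0)])` would describe one recording with two overlapping speaker turns.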

Our training data, generated by running run_prepare_shared.sh, are “simulated” mixtures, so their segments come from the random simulation itself.

rttm is required for scoring, so you should prepare it for your test data.
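As an illustration of the test-data step, the sketch below derives RTTM lines from a Kaldi-style segments file plus utt2spk. It assumes the usual ten-field RTTM `SPEAKER` record (file, channel, onset, duration, speaker name, with `<NA>` for unused fields) and a fixed channel of 1; the helper name `segments_to_rttm` is hypothetical and is not a script from this repository.

```python
def segments_to_rttm(segments_lines, utt2spk_lines):
    """Convert Kaldi segments + utt2spk lines into RTTM SPEAKER lines."""
    # utt2spk line: "<utterance-id> <speaker-id>"
    utt2spk = dict(line.split() for line in utt2spk_lines if line.strip())
    rttm = []
    for line in segments_lines:
        if not line.strip():
            continue
        # segments line: "<utterance-id> <recording-id> <start> <end>"
        utt, rec, start, end = line.split()
        start, end = float(start), float(end)
        # RTTM stores onset and duration rather than start and end.
        rttm.append(
            f"SPEAKER {rec} 1 {start:.2f} {end - start:.2f} "
            f"<NA> <NA> {utt2spk[utt]} <NA> <NA>"
        )
    return rttm
```

The returned lines can be written to a file one per line and passed to whichever scoring tool you use.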

liuyue94 commented 4 years ago

Sorry, there are still some parts I don't understand. I only have some recordings of two-speaker conversations; for example, a file named speech.wav that contains two speakers. Should the segments be created directly from speech.wav using SAD, or do I need to manually create segments that each contain only one speaker?

yubouf commented 4 years ago

If you want to prepare your own "training" data, you need to create segments as reference labels. They have to be created manually when you only have two-speaker mixtures in monaural recordings.

If you instead want to perform "inference" with a pre-trained model, all you need is "wav.scp". However, we have not yet released such a pre-trained model, and we have no easy-to-use tool for inference only.
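For the inference-only case, a wav.scp can be listed from a folder of recordings. This is a small sketch assuming the standard Kaldi format of `<recording-id> <path>` per line, with the file name (without extension) used as the recording ID; `make_wav_scp` is a hypothetical helper, not part of EEND.

```python
from pathlib import Path


def make_wav_scp(wav_dir):
    """Return wav.scp lines for every .wav file directly inside wav_dir."""
    lines = []
    for path in Path(wav_dir).glob("*.wav"):
        # Use the file stem as recording ID and an absolute path as the entry.
        lines.append(f"{path.stem} {path.resolve()}")
    # Kaldi expects the file sorted by recording ID.
    return sorted(lines)
```

Writing the result with `"\n".join(lines) + "\n"` to `wav.scp` gives one entry per recording.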