You need segments, reco2dur, wav.scp, utt2spk and spk2utt. The segments can be generated by SAD when you have two-speaker conversations in separate channels.
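In case it helps, here is a minimal sketch of what those files can look like for a single two-speaker recording. All IDs, paths, and times below (rec1, /path/to/rec1.wav, the speaker labels) are made-up assumptions for illustration, not output of this repo:

```python
import os

# Hypothetical example: Kaldi-style data files for one two-speaker
# recording "rec1" (all IDs, paths, and times here are made up).
os.makedirs("data/train", exist_ok=True)

files = {
    # wav.scp: <recording-id> <wav-path-or-pipe>
    "wav.scp": ["rec1 /path/to/rec1.wav"],
    # reco2dur: <recording-id> <duration-in-seconds>
    "reco2dur": ["rec1 600.0"],
    # segments: <utterance-id> <recording-id> <start-sec> <end-sec>
    "segments": [
        "rec1_spk1_0000 rec1 0.00 3.21",
        "rec1_spk2_0000 rec1 2.85 7.10",
    ],
    # utt2spk: <utterance-id> <speaker-id>
    "utt2spk": [
        "rec1_spk1_0000 spk1",
        "rec1_spk2_0000 spk2",
    ],
    # spk2utt: <speaker-id> <utterance-id> [<utterance-id> ...]
    "spk2utt": [
        "spk1 rec1_spk1_0000",
        "spk2 rec1_spk2_0000",
    ],
}

for name, lines in files.items():
    with open(os.path.join("data/train", name), "w") as f:
        f.write("\n".join(lines) + "\n")
```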
Our training data, generated by running run_prepare_shared.sh, are "simulated" mixtures, whose segments are generated according to the random simulation.
An rttm file is required for scoring, so you should prepare it for the test data.
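If you already have reference speaker turns (speaker, start, end in seconds) for the test data, a small script like this sketch can write the rttm; the file ID rec1 and the turn list are hypothetical:

```python
# Hypothetical reference turns for a test recording "rec1":
# (speaker-id, start-sec, end-sec)
turns = [
    ("spk1", 0.00, 3.21),
    ("spk2", 2.85, 7.10),
]

# Standard RTTM "SPEAKER" lines: fields 4 and 5 are start time and duration.
with open("rttm", "w") as f:
    for spk, start, end in turns:
        f.write(
            "SPEAKER rec1 1 {:.2f} {:.2f} <NA> <NA> {} <NA> <NA>\n".format(
                start, end - start, spk
            )
        )
```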
Sorry, there are still some things I don't understand. I only have some two-speaker conversation recordings; for example, a file named speech.wav that contains two speakers. I want to know: is the segments file created directly from speech.wav using SAD, or do I need to manually create segments that each contain only one speaker?
If you want to prepare your own "training" data, you need to create a segments file as reference labels. It should be created manually when you only have two-speaker mixtures in monaural recordings.
Or, if you want to perform "inference" with a pre-trained model, all you need is wav.scp. However, we have not yet released such a pre-trained model, and we have no easy-to-use tools to perform inference only.
I know about files such as spk2utt, utt2spk, wav.scp, and so on. What I want to know is: do I need to create the rttm and segments files myself? Is the segments file created by SAD?