hitachi-speech / EEND

End-to-End Neural Diarization

Trained on custom data with the mini_librispeech recipe, but inference gives just 1 speaker for the whole audio file. #33

Open saumyaborwankar opened 3 years ago

saumyaborwankar commented 3 years ago
SPEAKER aaak 1   11.40    0.10 <NA> <NA> aaak_4 <NA>
SPEAKER aaak 1   14.00    0.10 <NA> <NA> aaak_4 <NA>

This is the hyp_0.3_1.rttm I got after scoring. For the entire aaak.wav file, only speaker aaak_4 is detected.
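For anyone debugging the same thing, here is a minimal sketch that counts how many distinct speaker labels the hypothesis RTTM actually contains per recording (the filename is the one mentioned above; adjust the path as needed):

```python
# Sketch: count distinct speaker labels per recording in an RTTM file,
# to confirm whether the hypothesis really collapses to a single speaker.
from collections import defaultdict

speakers = defaultdict(set)
with open("hyp_0.3_1.rttm") as f:
    for line in f:
        fields = line.split()
        if fields and fields[0] == "SPEAKER":
            # RTTM fields: type, recording, channel, onset, duration, ..., speaker name at index 7
            speakers[fields[1]].add(fields[7])

for recording, spk in sorted(speakers.items()):
    print(recording, len(spk), sorted(spk))
```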

"main/DER": 0.4484034770634306,
"validation/main/DER": 0.5290581162324649,

This is the DER after 200 epochs. Can someone help me understand why inference detects just one speaker?
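It may also be worth checking whether the second speaker's posterior ever crosses the 0.3 threshold at all. A minimal sketch, assuming inference wrote per-recording HDF5 files whose `T_hat` dataset holds frame-wise posteriors of shape (frames, speakers), which is the format `make_rttm.py` in this recipe reads; the path below is hypothetical:

```python
# Sketch: inspect frame-wise speaker posteriors from inference output.
# Assumption: an HDF5 file with a 'T_hat' dataset of shape (num_frames, num_speakers);
# "infer_out/aaak.h5" is a hypothetical path; point it at your own output file.
import h5py
import numpy as np

with h5py.File("infer_out/aaak.h5", "r") as f:
    post = f["T_hat"][:]

for spk in range(post.shape[1]):
    frac = np.mean(post[:, spk] > 0.3)  # 0.3 = threshold used for hyp_0.3_1.rttm
    print(f"speaker {spk}: max posterior {post[:, spk].max():.3f}, "
          f"fraction of frames above 0.3: {frac:.1%}")
```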

aaaa wav_8/aaaa.wav
aaab wav_8/aaab.wav

This is wav.scp (first 2 lines)

aaab-000521-000625 Khanna
aaab-000829-000923 Khanna

This is the utt2spk file

aaab-000521-000625 aaab 5.21 6.25
aaab-000829-000923 aaab 8.29 9.23

This is the segments file
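As a sanity check on the data preparation, here is a small sketch (assuming a standard Kaldi-style data directory containing wav.scp, utt2spk and segments; `data/train` is a hypothetical path) that cross-checks the three files and reports how many distinct speakers each recording has according to utt2spk. If every utterance of a recording maps to the same speaker label, the reference for that recording only has one speaker:

```python
# Sketch: cross-check wav.scp, utt2spk and segments in a Kaldi-style data dir
# and report how many distinct speakers each recording actually has.
import os

data_dir = "data/train"  # hypothetical path; point it at your data directory

def read_rows(name):
    with open(os.path.join(data_dir, name)) as f:
        return [line.split() for line in f if line.strip()]

wavs = {rec for rec, *_ in read_rows("wav.scp")}
utt2spk = {utt: spk for utt, spk in read_rows("utt2spk")}

rec_speakers = {}
for utt, rec, *_ in read_rows("segments"):
    if rec not in wavs:
        print(f"segments: recording {rec} missing from wav.scp")
    if utt not in utt2spk:
        print(f"segments: utterance {utt} missing from utt2spk")
        continue
    rec_speakers.setdefault(rec, set()).add(utt2spk[utt])

for rec, spks in sorted(rec_speakers.items()):
    print(rec, len(spks), sorted(spks))
```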

kli017 commented 2 years ago

Hello, I met the same problem while training on the mini_librispeech recipe. I made a 2-speaker dataset with no overlap, and as the epochs increase the model detects only 1 speaker. Did you find the reason?
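In case it helps, a sketch (with `ref.rttm` as a hypothetical path) to measure per-speaker speech time and overlap in the reference RTTM, to confirm the training data really contains two active speakers per recording:

```python
# Sketch: per-speaker speech time and overlapped time per recording in a reference RTTM.
# "ref.rttm" is a hypothetical path; point it at your training/validation reference.
from collections import defaultdict

segments = defaultdict(list)  # recording -> list of (start, end, speaker)
with open("ref.rttm") as f:
    for line in f:
        t = line.split()
        if t and t[0] == "SPEAKER":
            start, dur = float(t[3]), float(t[4])
            segments[t[1]].append((start, start + dur, t[7]))

for rec, segs in sorted(segments.items()):
    total = defaultdict(float)
    grid = defaultdict(int)  # 10 ms frames -> number of active speakers
    for start, end, spk in segs:
        total[spk] += end - start
        for frame in range(int(start * 100), int(end * 100)):
            grid[frame] += 1
    overlap = sum(1 for c in grid.values() if c > 1) / 100.0
    print(rec, {s: round(d, 2) for s, d in sorted(total.items())}, f"overlap={overlap:.2f}s")
```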