hitachi-speech / EEND

End-to-End Neural Diarization
MIT License

How does scoring work in run_eda.sh? #40

Open anuragkumar95 opened 2 years ago

anuragkumar95 commented 2 years ago

I have trained a 2-speaker model on a custom dataset, with an overall DER of 0.063. I would like to run inference and scoring on another custom dataset that also contains 2 speakers.

The part of run_eda.sh that does the scoring is:

if [ $stage -le 8 ]; then
    echo "scoring at $scoring_dir"
    if [ -d $scoring_dir ]; then
        echo "$scoring_dir already exists. "
        echo " if you want to retry, please remove it."
        exit 1
    fi
    for dset in callhome2_spkall; do
        work=$scoring_dir/$dset/.work
        mkdir -p $work
        find $infer_dir/$dset -iname "*.h5" > $work/file_list_$dset
        for med in 1 11; do
            for th in 0.3 0.4 0.5 0.6 0.7; do
                make_rttm.py --median=$med --threshold=$th \
                    --frame_shift=$infer_frame_shift --subsampling=$infer_subsampling --sampling_rate=$inf$
                    $work/file_list_$dset $scoring_dir/$dset/hyp_${th}_$med.rttm
                md-eval.pl -c 0.25 \
                    -r data/eval/$dset/rttm \
                    -s $scoring_dir/$dset/hyp_${th}_$med.rttm > $scoring_dir/$dset/result_th${th}_med${med}_collar0.25 2>/dev/null || exit
            done
        done
    done
fi
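To score a different dataset with this loop, `callhome2_spkall` has to be replaced by the new dataset name, and md-eval.pl needs a reference RTTM at `data/eval/$dset/rttm`. If the custom dataset only has Kaldi-style `segments` and `utt2spk` files, the following is a minimal sketch for producing that RTTM. It assumes every segment is speaker-homogeneous, that `segments` lines read `<utt> <recording> <start> <end>` in seconds, and that `utt2spk` maps each utterance to its true speaker. The function name `segments_to_rttm` is hypothetical and not part of EEND.

```shell
# Hypothetical helper (not part of EEND): convert Kaldi-style `segments`
# ("<utt> <reco> <start> <end>", seconds) plus `utt2spk` ("<utt> <spk>")
# into NIST RTTM SPEAKER lines, one line per segment, written to stdout.
segments_to_rttm() {
    local segments=$1 utt2spk=$2
    awk '
        # First file (utt2spk): remember the speaker of every utterance.
        FILENAME == ARGV[1] { spk[$1] = $2; next }
        # Second file (segments): emit one RTTM line per segment.
        {
            if (!($1 in spk)) {
                print "segments_to_rttm: no speaker for " $1 > "/dev/stderr"
                exit 1
            }
            # Fields: type, file, channel, onset, duration, ortho, stype, name, conf, slat
            printf "SPEAKER %s 1 %.2f %.2f <NA> <NA> %s <NA> <NA>\n", $2, $3, $4 - $3, spk[$1]
        }' "$utt2spk" "$segments"
}
```

For a hypothetical dataset directory `data/eval/mydata`, usage would be `segments_to_rttm data/eval/mydata/segments data/eval/mydata/utt2spk > data/eval/mydata/rttm`. Matching utterances to speakers with `FILENAME == ARGV[1]` rather than `NR == FNR` keeps the function from silently misreading `segments` as a speaker map when `utt2spk` is empty.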

I understand that make_rttm.py creates the hypotheses, and I was able to run that step. However, I cannot figure out how the scoring works. In run_eda.sh, scoring is done using:

md-eval.pl -c 0.25 -r data/eval/$dset/rttm -s $scoring_dir/$dset/hyp_${th}_$med.rttm > $scoring_dir/$dset/result_th${th}_med${med}_collar0.25 2>/dev/null || exit
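This md-eval.pl call compares the hypothesis RTTM passed with `-s` against the reference RTTM passed with `-r`, and `-c 0.25` excludes a 0.25 s no-score collar around each reference segment boundary. The number it reports is the diarization error rate (as a percentage); the standard NIST definition, written out here for reference, is:

$$
\mathrm{DER} = \frac{T_{\mathrm{miss}} + T_{\mathrm{FA}} + T_{\mathrm{conf}}}{T_{\mathrm{ref}}}
$$

where $T_{\mathrm{miss}}$ is reference speaker time the hypothesis misses, $T_{\mathrm{FA}}$ is hypothesized speaker time with no matching reference speaker, $T_{\mathrm{conf}}$ is time assigned to the wrong speaker under the best one-to-one mapping of hypothesis to reference speakers, and $T_{\mathrm{ref}}$ is the total scored reference speaker time, in which overlapping speakers each count.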

I cannot find md-eval.pl, which seems to do the scoring here. Can anybody point me to this file?

desh2608 commented 2 years ago

It is a NIST scoring tool. You can find it here: https://github.com/foundintranslation/Kaldi/blob/master/tools/sctk-2.4.0/src/md-eval/md-eval.pl
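The stage-8 loop writes one md-eval.pl report per (threshold, median) pair to `$scoring_dir/$dset/result_th${th}_med${med}_collar0.25`, so choosing an operating point means comparing those reports. The following is a minimal sketch that prints the lowest-DER report in a directory. It assumes each report contains a summary line starting with `OVERALL SPEAKER DIARIZATION ERROR = <value>`; the helper name `best_der` is hypothetical.

```shell
# Hypothetical helper: print the lowest DER among md-eval.pl reports named
# result_th<th>_med<med>_collar0.25 in one directory, followed by the report name.
# Assumption: each report has a line "OVERALL SPEAKER DIARIZATION ERROR = <value> ...".
best_der() {
    local result_dir=$1 f der
    for f in "$result_dir"/result_th*_med*_collar0.25; do
        [ -e "$f" ] || continue    # the glob matched nothing
        # Extract the numeric DER (percent) from the summary line.
        der=$(sed -n 's/.*OVERALL SPEAKER DIARIZATION ERROR = \([0-9.]*\).*/\1/p' "$f" | head -n 1)
        if [ -n "$der" ]; then
            printf '%s %s\n' "$der" "${f##*/}"
        fi
    done | LC_ALL=C sort -k1,1n | head -n 1
}
```

For example, `best_der "$scoring_dir/callhome2_spkall"` would print one line of the form `<DER> result_th<th>_med<med>_collar0.25`. Sorting under `LC_ALL=C` keeps `.` as the decimal separator regardless of the user's locale.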