Closed drawfish closed 2 years ago
The reason is that you have overlapping segments. See the explanation in the docs (https://lhotse.readthedocs.io/en/latest/api.html#lhotse.cut.CutSet.trim_to_supervisions). You want to pass keep_overlapping=False to trim_to_supervisions.
BTW, the negative time indicates that a segment started before the start of the cut. This is useful if you're explicitly trying to model overlapped speech and do something about it.
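To make the point about negative times concrete, here is a small pure-Python sketch of the idea. It is not lhotse's implementation: the Segment class and the trim_to_segments function are hypothetical stand-ins, and the two segments are made-up values. It only illustrates why a supervision that overlaps a cut but begins earlier ends up with a negative start once times are expressed relative to the cut, and why dropping overlapping supervisions removes those entries.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for a supervision segment (not a lhotse class)."""
    id: str
    start: float      # seconds
    duration: float   # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


def trim_to_segments(
    segments: List[Segment], keep_overlapping: bool = True
) -> List[Tuple[str, List[Segment]]]:
    """Build one 'cut' per segment and re-express starts relative to that cut.

    Illustrative only: mimics the idea behind keep_overlapping, not lhotse's code.
    """
    cuts = []
    for target in segments:
        kept = []
        for seg in segments:
            overlaps = seg.start < target.end and seg.end > target.start
            if seg is target or (keep_overlapping and overlaps):
                # The start becomes relative to the beginning of the cut.
                relative_start = round(seg.start - target.start, 3)
                kept.append(Segment(seg.id, relative_start, seg.duration))
        cuts.append((target.id, kept))
    return cuts


# Two made-up segments that overlap between 13 s and 15 s.
segments = [Segment("utt1", 10.0, 5.0), Segment("utt2", 13.0, 4.0)]

with_overlap = trim_to_segments(segments, keep_overlapping=True)
without_overlap = trim_to_segments(segments, keep_overlapping=False)

for cut_id, sups in with_overlap:
    print(cut_id, [(s.id, s.start) for s in sups])
```

In the cut built around utt2, utt1 starts 3 seconds before the cut does, so its relative start is negative. With keep_overlapping=False each cut keeps only its own segment, which starts at 0.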
The problem has been fixed. Thanks~
After generating recordings.jsonl.gz and supervisions.jsonl.gz from a Kaldi data directory with the command:
lhotse kaldi import <kaldi-data-dir> <samplerate> <lhotse-data-dir>
I create a dataset from these two manifests with:
and then I construct the data sampler and dataloader with:
However, when iterating through train_dl, an AssertionError is raised:
Then I exported the CutSet to a jsonl.gz file and filtered for the cut containing the offending supervision ID:
From the supervision with ID "Y0000008176_KtzfOHuuzd8_S00691", we can see that its start time is negative, which triggered the exception. The segments file of the Kaldi data directory:
The line in recordings.jsonl.gz in the lhotse data directory:
The line in supervisions.jsonl.gz in the lhotse data directory:
My question is: how was such a negative start time created, and how should I configure the function "trim_to_supervisions" to correct it?
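For anyone debugging the same symptom, a rough sketch of how the exported cuts file could be scanned for supervisions with a negative start, using only the standard library. It assumes each line of the jsonl.gz file is one JSON object with an "id" field and a "supervisions" list whose entries carry "id" and "start"; the function name find_negative_starts and the file path are hypothetical.

```python
import gzip
import json
from typing import Iterator, Tuple


def find_negative_starts(cuts_path: str) -> Iterator[Tuple[str, str, float]]:
    """Yield (cut_id, supervision_id, start) for supervisions starting before their cut.

    Assumes one JSON object per line, each with a "supervisions" list.
    """
    with gzip.open(cuts_path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            cut = json.loads(line)
            for sup in cut.get("supervisions", []):
                if sup.get("start", 0.0) < 0:
                    yield cut["id"], sup["id"], sup["start"]


# Usage (the path is a placeholder for wherever the cuts were exported):
# for cut_id, sup_id, start in find_negative_starts("cuts.jsonl.gz"):
#     print(cut_id, sup_id, start)
```

Any hit from this scan is a supervision that overlaps its cut but begins before it, which is the case keep_overlapping=False is meant to drop.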