Closed teowenshen closed 9 months ago
`AlignmentItem` start times are relative to the start of the recording. When you create a cut set, they are automatically converted to times relative to the start of the cut. You can check out the AMI or LibriSpeech recipes, both of which support word alignments this way.
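To make the two time references concrete, here is a minimal sketch of the arithmetic only. It uses a hypothetical `WordAlignment` tuple and a hypothetical `shift_alignment` helper, not Lhotse's own classes or API, and the word times and cut start are made-up values for illustration. In Lhotse this conversion is done for you when cuts are created.

```python
from typing import List, NamedTuple


class WordAlignment(NamedTuple):
    """Hypothetical stand-in for a per-word alignment item (not a Lhotse class)."""
    symbol: str
    start: float     # seconds
    duration: float  # seconds


def shift_alignment(items: List[WordAlignment], offset: float) -> List[WordAlignment]:
    """Return copies of the items with `offset` seconds added to each start time."""
    return [item._replace(start=round(item.start + offset, 3)) for item in items]


# Word times measured from the start of the whole recording (illustrative values).
recording_relative = [
    WordAlignment("hello", start=12.50, duration=0.40),
    WordAlignment("world", start=13.00, duration=0.45),
]

# Assume a cut that begins 12.0 s into the recording.
cut_start = 12.0

# Re-express the same words relative to the start of the cut.
cut_relative = shift_alignment(recording_relative, -cut_start)

for item in cut_relative:
    print(item.symbol, item.start, item.duration)
```

Durations are unchanged by the shift; only the start times move by the cut's offset into the recording.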
I have a long recording split into shorter supervision segments, and I have obtained alignments.
When attaching alignments to supervision segments as `AlignmentItem`, is it recommended to use a start time relative to the start of the supervision segment or to the start of the entire recording?
Also, I have been studying `TemporalArray`, but since `TemporalArray` is per-frame and my alignments are per-word, I am not sure how to use `TemporalArray` for alignments.
If there is a recipe that uses the Lhotse-recommended way of handling alignments, from data preparation to dataset objects, please let me know and I will start from there.