lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Question about Recommended Method for Using Alignments #1230

Closed: teowenshen closed this issue 9 months ago

teowenshen commented 10 months ago

I have a long recording split into shorter supervision segments, and I have obtained alignments.

When attaching alignments to supervision segments as AlignmentItem objects, should the start times be relative to the start of the supervision segment or to the start of the entire recording?
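For context, here is a minimal plain-Python sketch of the setup described above: distributing recording-level word alignments among the supervision segments that contain them. It keeps every word time relative to the start of the recording, which matches the maintainer's reply in this thread. `Word`, `Segment` and `assign_words` are hypothetical stand-ins for illustration, not Lhotse classes or functions. The midpoint rule for deciding which segment a word belongs to is also an assumption.

```python
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    # Hypothetical stand-in for a word-level alignment entry (not Lhotse's class).
    # Times are in seconds, measured from the start of the *recording*.
    symbol: str
    start: float
    duration: float


class Segment(NamedTuple):
    # Hypothetical stand-in for a supervision segment inside the recording.
    id: str
    start: float
    duration: float

    @property
    def end(self) -> float:
        return self.start + self.duration


def assign_words(segments: List[Segment], words: List[Word]) -> Dict[str, List[Word]]:
    """Group words by the segment that contains each word's midpoint.

    Word times are not modified, so they stay relative to the recording start.
    Words whose midpoint falls outside every segment are dropped.
    """
    grouped: Dict[str, List[Word]] = {seg.id: [] for seg in segments}
    for word in words:
        midpoint = word.start + word.duration / 2
        for seg in segments:
            if seg.start <= midpoint < seg.end:
                grouped[seg.id].append(word)
                break
    return grouped
```

For example, with segments `seg-0` (0.0 to 2.0 s) and `seg-1` (2.0 to 5.0 s), a word starting at 2.4 s is assigned to `seg-1` and keeps its start time of 2.4 s.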

I have also been studying TemporalArray, but since TemporalArray is per-frame and my alignments are per-word, I am not sure how to use TemporalArray for alignments.
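If a per-frame representation is needed anyway (for example, to store it as a per-frame array), one common approach is to expand each word's time span into frame labels at a fixed frame shift. The sketch below does that in plain Python. It is not a Lhotse API and not necessarily the approach Lhotse recommends. `Word`, `words_to_frame_labels`, the 10 ms default frame shift and the `<sil>` filler label are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    # Hypothetical word-level alignment entry; times in seconds.
    symbol: str
    start: float
    duration: float


def words_to_frame_labels(
    words: List[Word],
    total_duration: float,
    frame_shift: float = 0.01,  # assumed 10 ms frames
    blank: str = "<sil>",       # assumed label for frames not covered by any word
) -> List[str]:
    """Expand word-level alignments into one label per frame."""
    # round() avoids off-by-one errors from float division (e.g. 0.3 / 0.01).
    num_frames = int(round(total_duration / frame_shift))
    labels = [blank] * num_frames
    for word in words:
        first = int(round(word.start / frame_shift))
        last = int(round((word.start + word.duration) / frame_shift))
        # Clip to the valid frame range, then label every frame the word covers.
        for i in range(max(first, 0), min(last, num_frames)):
            labels[i] = word.symbol
    return labels
```

The resulting labels would still need to be mapped to integer IDs before being stored as a numeric array. The word-level form is more compact, which is one reason to keep it as AlignmentItem entries instead.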

If there is a recipe that follows the recommended Lhotse approach to alignments, from data preparation through to dataset objects, please let me know and I will start from there.

pzelasko commented 10 months ago

AlignmentItem start times are relative to the start of the recording. When you create a cut set, they are automatically converted to times relative to the start of the cut. The AMI and LibriSpeech recipes both support word alignments this way, so you can use either as a reference.
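To make the conversion described in the reply concrete, here is a plain-Python sketch of the arithmetic: a cut-relative start time is the recording-relative start minus the cut's offset within the recording, and durations are unchanged. Lhotse performs this step itself when cuts are created, so you would not normally write it. `Word` and `to_cut_relative` are hypothetical names, and this is not Lhotse's implementation.

```python
from typing import List, NamedTuple


class Word(NamedTuple):
    # Hypothetical word-level alignment entry; times in seconds.
    symbol: str
    start: float
    duration: float


def to_cut_relative(words: List[Word], cut_start: float) -> List[Word]:
    """Shift recording-relative word times so they are relative to a cut."""
    # Only the start moves; rounding to microseconds hides float noise
    # such as 2.4 - 2.0 == 0.3999999999999999.
    return [w._replace(start=round(w.start - cut_start, 6)) for w in words]
```

For a cut that begins 2.0 s into the recording, a word that starts at 2.4 s in the recording starts at 0.4 s in the cut.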