lumpidu opened 4 months ago
Hi, thanks for your work. I'd be interested to know whether the model also provides phoneme-level timing information at inference.
Yes. During inference, the attention matrix can be used to get the number of frames aligned to each phoneme, and those frame numbers can then be converted to time in seconds.
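A minimal sketch of that conversion, under several assumptions not stated in the thread: the attention matrix is laid out as `attn[frame][phoneme]`, the alignment is monotonic (phonemes are spoken in order), and each decoder frame corresponds to one spectrogram hop. The function name `phoneme_timings` and the defaults `hop_length=256` and `sample_rate=22050` are hypothetical placeholders; substitute the values from the model's own audio config.

```python
from typing import List, Tuple


def phoneme_timings(
    attn: List[List[float]],
    hop_length: int = 256,     # hypothetical: samples per spectrogram frame
    sample_rate: int = 22050,  # hypothetical: audio sample rate in Hz
) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for each phoneme from an attention matrix.

    Assumes attn[t][p] is the attention weight of output frame t on
    input phoneme p, and that the alignment is monotonic.
    """
    if not attn:
        return []
    num_phonemes = len(attn[0])

    # Hard alignment: assign each frame to its most-attended phoneme
    # and count how many frames each phoneme receives.
    counts = [0] * num_phonemes
    for row in attn:
        best = max(range(num_phonemes), key=lambda p: row[p])
        counts[best] += 1

    # Convert cumulative frame counts to seconds.
    seconds_per_frame = hop_length / sample_rate
    timings = []
    start = 0
    for n in counts:
        end = start + n
        timings.append((start * seconds_per_frame, end * seconds_per_frame))
        start = end
    return timings


if __name__ == "__main__":
    # Toy 4-frame x 3-phoneme matrix, made up for illustration.
    attn = [
        [0.9, 0.1, 0.0],
        [0.8, 0.2, 0.0],
        [0.1, 0.7, 0.2],
        [0.0, 0.1, 0.9],
    ]
    print(phoneme_timings(attn, hop_length=250, sample_rate=1000))
    # prints [(0.0, 0.5), (0.5, 0.75), (0.75, 1.0)]
```

Taking the argmax per frame and laying the counts out cumulatively is the usual way to turn soft attention into durations. If the model's alignment is not strictly monotonic, it may be safer to take the first and last frame assigned to each phoneme instead. If the decoder emits several frames per step (a reduction factor), the frame counts would need to be multiplied accordingly.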