p0p4k / pflowtts_pytorch

Unofficial implementation of the NVIDIA P-Flow TTS paper
https://neurips.cc/virtual/2023/poster/69899
MIT License

Does the model output phoneme-level timing info? #32

Open lumpidu opened 4 months ago

lumpidu commented 4 months ago

Hi, thanks for your work. I'd be interested to know whether the model also provides phoneme-level timing information at inference.

p0p4k commented 4 months ago

Yes. During inference, the attention (alignment) matrix can be used to get the frame numbers assigned to each phoneme, and those frame numbers can then be converted to time in seconds.
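A minimal sketch of that conversion, under a few assumptions that are not confirmed by this thread: the alignment is a hard (0/1) matrix laid out as `[phoneme][frame]`; `hop_length=256` and `sample_rate=22050` are placeholder mel-spectrogram settings that should be replaced with the values from your own config; and `phoneme_times` is a hypothetical helper, not part of the repo. If the model returns `attn` as a PyTorch tensor of shape `(batch, 1, phonemes, frames)` (again an assumption), take one item and call `.tolist()` on it first. Each mel frame advances by `hop_length` samples, so frame `j` starts at `j * hop_length / sample_rate` seconds.

```python
from typing import List, Optional, Sequence, Tuple


def phoneme_times(
    attn: Sequence[Sequence[float]],
    hop_length: int = 256,      # assumption: mel hop size in samples
    sample_rate: int = 22050,   # assumption: audio sample rate in Hz
    threshold: float = 0.5,     # values above this count as "aligned"
) -> List[Optional[Tuple[float, float]]]:
    """Return (start_s, end_s) for each phoneme row of an alignment matrix.

    attn is assumed to be [num_phonemes][num_frames]. The end time is
    exclusive (it is the start of the frame after the last aligned one).
    Phonemes that received no frames map to None.
    """
    seconds_per_frame = hop_length / sample_rate
    spans: List[Optional[Tuple[float, float]]] = []
    for row in attn:
        # Frame indices aligned to this phoneme.
        frames = [j for j, v in enumerate(row) if v > threshold]
        if not frames:
            spans.append(None)
            continue
        start, end = frames[0], frames[-1] + 1
        spans.append((start * seconds_per_frame, end * seconds_per_frame))
    return spans


if __name__ == "__main__":
    # Toy alignment: phoneme 0 covers frames 0-1, phoneme 1 covers frames 2-4.
    toy_attn = [
        [1, 1, 0, 0, 0],
        [0, 0, 1, 1, 1],
    ]
    for i, span in enumerate(phoneme_times(toy_attn)):
        print(i, span)
```

Scanning each row for its first and last aligned frame, rather than summing durations, keeps the sketch correct even if a row is empty; for a strictly monotonic alignment both approaches give the same spans.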