rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License
6.27k stars 461 forks

Alignment data should be exposed as one of the outputs #70

Open shaunren opened 1 year ago

shaunren commented 1 year ago

Exposing alignment data would be useful for determining, e.g., the word boundaries in the output waveform.

orgarten commented 10 months ago

I am currently working on this and found the following things:

My current implementation would output alignment data for sentences in CSV:

timestamp, word, start_index
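
To make the proposed format concrete, here is a minimal sketch of how a consumer might read such a CSV and derive word boundaries from it. This is not Piper's API, and several things are assumptions on my part: that the line above is a header row, that `timestamp` is the word onset in seconds, that `start_index` is the character offset of the word in the input text, and that each word ends where the next one starts. The names `WordAlignment`, `parse_alignment_csv`, and `word_spans` are hypothetical, and the sample rows are invented purely for illustration.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordAlignment:
    timestamp: float  # assumed: word onset in seconds from the start of the audio
    word: str
    start_index: int  # assumed: character offset of the word in the input text


def parse_alignment_csv(text: str) -> List[WordAlignment]:
    """Parse CSV text with a 'timestamp, word, start_index' header row."""
    # skipinitialspace drops the blank that follows each comma in the header and rows.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [
        WordAlignment(float(r["timestamp"]), r["word"], int(r["start_index"]))
        for r in reader
    ]


def word_spans(
    rows: List[WordAlignment], audio_duration: float
) -> List[Tuple[str, float, float]]:
    """Return (word, start, end) triples.

    Assumes each word ends where the next begins; the last word ends at
    audio_duration. Silence between words is therefore folded into the
    preceding word.
    """
    spans = []
    for i, row in enumerate(rows):
        end = rows[i + 1].timestamp if i + 1 < len(rows) else audio_duration
        spans.append((row.word, row.timestamp, end))
    return spans


# Invented sample data, not real Piper output.
SAMPLE = "timestamp, word, start_index\n0.00, Hello, 0\n0.42, world, 6\n"

if __name__ == "__main__":
    for word, start, end in word_spans(parse_alignment_csv(SAMPLE), 0.9):
        print(f"{word}: {start:.2f}s to {end:.2f}s")
```

If the format also carried an end timestamp per word, the consumer would not need to guess where each word stops, which might be worth considering for the final output.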