Alignment data should be exposed as one of the outputs

I am currently working on this and found the following:

- The word boundaries cannot be obtained, because each sentence is synthesized as a whole.
- Synthesizing individual words and accumulating their lengths (including the silent bytes) to get alignment data for each word is possible, but it takes much longer and is far from accurate.
- Synthesized sentences/words have a different length on each run.
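The word-by-word workaround described above can be sketched roughly as follows. This is a minimal Python sketch, not piper's code: it assumes you already know how many audio samples each separately synthesized word produced (for 16-bit mono PCM, samples = bytes / 2) and that a fixed amount of silence follows each word. `accumulate_alignment` and `WordSpan` are hypothetical names. Since lengths vary between runs, the resulting times are only approximate.

```python
from dataclasses import dataclass


@dataclass
class WordSpan:
    word: str
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds (trailing silence excluded)


def accumulate_alignment(words, word_samples, silence_samples, sample_rate):
    """Derive word timings by summing the lengths of separately synthesized words.

    Hypothetical helper. Assumptions:
    - word_samples[i] is the number of audio samples produced for words[i]
    - silence_samples is a constant amount of silence appended after each word
    - sample_rate is the voice's sample rate in Hz
    """
    if len(words) != len(word_samples):
        raise ValueError("words and word_samples must have the same length")
    spans = []
    cursor = 0  # running position in the concatenated audio, in samples
    for word, n in zip(words, word_samples):
        start = cursor
        end = cursor + n
        spans.append(WordSpan(word, start / sample_rate, end / sample_rate))
        cursor = end + silence_samples  # skip the silence before the next word
    return spans
```

For example, `accumulate_alignment(["hello", "world"], [22050, 11025], 2205, 22050)` gives "hello" from 0.0 s to 1.0 s and "world" from 1.1 s to 1.6 s.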

My current implementation would output alignment data for sentences in CSV format, with these columns:

timestamp, word, start_index
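A possible way to emit rows in that shape, sketched in Python with the standard `csv` module. The column meanings are an assumption, not confirmed by the implementation: `timestamp` is taken to be the start time in seconds and `start_index` the character offset of the word in the input text. `alignment_to_csv` is a hypothetical helper, not part of piper.

```python
import csv
import io


def alignment_to_csv(text, words, start_times):
    """Write one CSV row per word: timestamp, word, start_index.

    Hypothetical helper. Assumptions: timestamp = start time in seconds,
    start_index = character offset of the word within `text`.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timestamp", "word", "start_index"])
    search_from = 0  # search after the previous word so repeated words map correctly
    for word, t in zip(words, start_times):
        idx = text.find(word, search_from)
        if idx < 0:
            raise ValueError(f"word {word!r} not found in text")
        writer.writerow([f"{t:.3f}", word, idx])
        search_from = idx + len(word)
    return buf.getvalue()
```

For example, `alignment_to_csv("Hello world", ["Hello", "world"], [0.0, 1.1])` returns `"timestamp,word,start_index\n0.000,Hello,0\n1.100,world,6\n"`.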

rhasspy / piper