rhasspy / larynx

End to end text to speech system using gruut and onnx
MIT License

word by word timestamp or "boundary" event #33

Open nicolehe opened 2 years ago

nicolehe commented 2 years ago

Hi!

It would be great to have the ability to do something like print a word as it's being spoken, either with a word-by-word timestamp feature or an "onboundary" type of event like in the Web Speech API: https://developer.mozilla.org/en-US/docs/Web/API/SpeechSynthesisUtterance/onboundary

Thanks!

synesthesiam commented 2 years ago

This has been added in Larynx 1.0 via the <mark> SSML tag! It currently only works between sentences, however.

There are two ways to make use of it:

  1. Use --mark-file on the command line to have the name of each mark printed as it's encountered:
larynx -v en --ssml --mark-file /dev/stderr '<mark name="start" />This is a test.<mark name="end" />'

This will print "start" to standard error, say the sentence, then print "end".

  2. Programmatically, from the results of the larynx.text_to_speech API. The TextToSpeechResult object (yielded for each sentence) contains marks_before and marks_after lists with the names of the marks that were encountered around that sentence.
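
For the programmatic route, here is a minimal sketch of how the per-sentence marks could be turned into an ordered event stream. It does not call Larynx itself, since the arguments of larynx.text_to_speech are not spelled out in this thread. SentenceResult is a hypothetical stand-in that mirrors only the marks_before and marks_after fields described above; its text field and the iter_events helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional, Tuple


@dataclass
class SentenceResult:
    """Hypothetical stand-in for TextToSpeechResult (not the real class).

    Only marks_before / marks_after come from the thread above;
    the text field is an assumption added for this example.
    """

    text: str
    marks_before: Optional[List[str]] = field(default_factory=list)
    marks_after: Optional[List[str]] = field(default_factory=list)


def iter_events(results: Iterable[SentenceResult]) -> Iterator[Tuple[str, str]]:
    """Yield ("mark", name) and ("sentence", text) events in spoken order."""
    for result in results:
        # Guard with "or []" in case a result carries None instead of a list.
        for name in result.marks_before or []:
            yield ("mark", name)
        yield ("sentence", result.text)
        for name in result.marks_after or []:
            yield ("mark", name)


# Mirrors the command-line example: <mark name="start" />This is a test.<mark name="end" />
results = [SentenceResult("This is a test.", ["start"], ["end"])]
events = list(iter_events(results))
for kind, value in events:
    print(f"{kind}: {value}")
```

With real results, the same loop would run over whatever larynx.text_to_speech yields, and the "sentence" step is where the audio for that sentence would be played. Because marks currently only fire between sentences, a rough word-by-word effect would need each word to end up in its own sentence, which will likely change how the speech sounds.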