rhasspy / larynx

End to end text to speech system using gruut and onnx
MIT License

Improve performance with caching #69

Closed: JeroenvdV closed this issue 1 year ago

JeroenvdV commented 1 year ago

I would like to understand how feasible and useful it would be to cache certain (intermediate) outputs. If large parts of phrases are reused often, couldn't they be cached (perhaps on multiple levels) to improve response time? If so, the cache could be pre-populated with expected outputs by speaking them all once. For example, a program that reads the time of day could cache 'The time is' as well as the numbers up to 59. The expected reduction in response time would depend on which parts of the process take the most time, which I'm not sure about.
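
To make the idea concrete, here is a rough sketch of what such a cache could look like at the outermost level (final audio per phrase). It is not based on Larynx's actual API: `synthesize` is a hypothetical callable that takes text and a voice name and returns WAV bytes, and `PhraseCache`, `speak`, and `prewarm` are names made up for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable


class PhraseCache:
    """Disk cache mapping (voice, text) to synthesized audio bytes."""

    def __init__(self, synthesize: Callable[[str, str], bytes], cache_dir: Path):
        # Assumption: synthesize(text, voice) -> WAV bytes. This stands in for
        # whatever TTS call is really used; it is not a Larynx function.
        self.synthesize = synthesize
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.hits = 0
        self.misses = 0

    def _path(self, text: str, voice: str) -> Path:
        # Collapse runs of whitespace so trivially different inputs share a key.
        # Case is preserved because it may affect pronunciation.
        normalized = " ".join(text.split())
        key = hashlib.sha256(f"{voice}\n{normalized}".encode("utf-8")).hexdigest()
        return self.cache_dir / f"{key}.wav"

    def speak(self, text: str, voice: str) -> bytes:
        """Return cached audio if present, otherwise synthesize and store it."""
        path = self._path(text, voice)
        if path.exists():
            self.hits += 1
            return path.read_bytes()
        self.misses += 1
        audio = self.synthesize(text, voice)
        path.write_bytes(audio)
        return audio

    def prewarm(self, phrases: Iterable[str], voice: str) -> None:
        """Pre-populate the cache by speaking every expected phrase once."""
        for phrase in phrases:
            self.speak(phrase, voice)
```

Comparing `hits` and `misses` while timing `speak` would give a first indication of how much response time a cache like this actually saves.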

synesthesiam commented 1 year ago

Larynx (and its successor Piper) operate at the level of sentences, since this is what they're trained on. You could try stitching together pieces of audio, but the intonation will likely sound off.
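
For anyone who wants to try the stitching approach anyway, a minimal sketch using Python's standard `wave` module is below. It assumes each piece is a complete WAV file held in memory and that all pieces share the same channel count, sample width, and sample rate; `concat_wavs` is a made-up helper name, not part of Larynx or Piper. It only joins the audio back to back and does nothing about intonation.

```python
import io
import wave


def concat_wavs(chunks):
    """Join WAV byte strings that share channels, sample width, and rate."""
    chunks = list(chunks)
    if not chunks:
        raise ValueError("need at least one WAV chunk")

    out = io.BytesIO()
    fmt = None
    with wave.open(out, "wb") as writer:
        for chunk in chunks:
            with wave.open(io.BytesIO(chunk), "rb") as reader:
                chunk_fmt = (
                    reader.getnchannels(),
                    reader.getsampwidth(),
                    reader.getframerate(),
                )
                if fmt is None:
                    # The first chunk decides the output format.
                    fmt = chunk_fmt
                    writer.setnchannels(fmt[0])
                    writer.setsampwidth(fmt[1])
                    writer.setframerate(fmt[2])
                elif chunk_fmt != fmt:
                    raise ValueError("WAV formats differ between chunks")
                # Append the raw audio frames of this chunk.
                writer.writeframes(reader.readframes(reader.getnframes()))
    return out.getvalue()
```

Since the models work on whole sentences, caching complete sentences and skipping the stitching step is likely the safer use of a cache.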