mmp / vice

Virtual air traffic control simulator
https://pharr.org/vice
GNU General Public License v3.0

Add Text To Speech (TTS) #221

Open svalencia014 opened 1 month ago

svalencia014 commented 1 month ago

I'm opening this issue to keep track of research and to discuss possible solutions. It would also help lead to adding voice recognition in the future.

mmp commented 1 month ago

I have a branch with a WIP attempt at this. The general challenge is that the best TTS solutions these days are all based on neural nets, and it's hard to package them as part of vice, since it has to support lots of different GPUs, operating systems, etc.

So, I've been trying an approach that uses pre-generated pilot speech stored as MP3 files that ship with vice. There is a tradeoff: the TTS models give better results when, say, "descend and maintain 3000, Jet Blue twelve thirty four" is synthesized as a single utterance rather than as separate pieces like "descend and maintain 3000", "Jet Blue", "twelve", "thirty-four". It just flows together better if it's all done at once, but on the other hand it's impractical to pre-synthesize every possible readback ahead of time.
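A minimal sketch of the clip-splicing side of that tradeoff, in Go: a common instruction is kept as one pre-synthesized phrase, while the flight number is split into digit pairs so "1234" plays as the "twelve" and "thirty-four" clips. The function names, the `clips/<voice>/<phrase>.mp3` layout, and the digit-pairing rule are all illustrative assumptions, not what the WIP branch actually does.

```go
package main

import (
	"fmt"
	"strings"
)

// Number words for 0-19 and for the tens, used to spell out flight numbers.
var onesWords = []string{"zero", "one", "two", "three", "four", "five", "six",
	"seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
	"fifteen", "sixteen", "seventeen", "eighteen", "nineteen"}

var tensWords = []string{"", "", "twenty", "thirty", "forty", "fifty",
	"sixty", "seventy", "eighty", "ninety"}

// twoDigitWord spells out 0-99, e.g. 34 -> "thirty-four".
func twoDigitWord(n int) string {
	if n < 20 {
		return onesWords[n]
	}
	if n%10 == 0 {
		return tensWords[n/10]
	}
	return tensWords[n/10] + "-" + onesWords[n%10]
}

// flightNumberClips groups a numeric flight number into digit pairs from the
// right, so "1234" becomes ["twelve", "thirty-four"]. A pair with a leading
// zero is read digit by digit ("1205" -> twelve, zero, five). Assumes the
// input contains only ASCII digits.
func flightNumberClips(flight string) []string {
	var groups []string
	if len(flight)%2 == 1 { // odd length: the leading digit is spoken alone
		groups = append(groups, flight[:1])
		flight = flight[1:]
	}
	for i := 0; i+2 <= len(flight); i += 2 {
		groups = append(groups, flight[i:i+2])
	}

	var clips []string
	for _, g := range groups {
		if len(g) == 2 && g[0] == '0' {
			clips = append(clips, onesWords[0], onesWords[g[1]-'0'])
			continue
		}
		n := 0
		for _, c := range g {
			n = n*10 + int(c-'0')
		}
		clips = append(clips, twoDigitWord(n))
	}
	return clips
}

// readbackClips lists the clip files to play back to back for one readback.
// The instruction is treated as a single pre-synthesized phrase; the
// "clips/<voice>/<phrase>.mp3" layout is a made-up convention.
func readbackClips(voice, instruction, airline, flight string) []string {
	phrases := append([]string{instruction, airline}, flightNumberClips(flight)...)
	paths := make([]string, len(phrases))
	for i, p := range phrases {
		key := strings.ReplaceAll(strings.ToLower(p), " ", "_")
		paths[i] = fmt.Sprintf("clips/%s/%s.mp3", voice, key)
	}
	return paths
}

func main() {
	for _, p := range readbackClips("voice1", "descend and maintain 3000", "Jet Blue", "1234") {
		fmt.Println(p)
	}
}
```

Splitting only the variable parts (numbers, callsigns) keeps the clip count bounded, at the cost of the less natural joins between clips described above.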

I'm trying to strike a balance between readbacks that sound good and a variety of voices on one hand, and avoiding a massive download from all of the MP3s on the other. (And then there's also fixing all of the bugs, finishing it up, etc.)
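The download-size side of that balance scales roughly linearly, since each voice needs its own copy of every clip. A back-of-the-envelope sketch in Go, where the clip categories, counts, and average clip size are placeholders rather than figures from the branch:

```go
package main

import "fmt"

// clipSet is a hypothetical inventory of the pre-synthesized clips each
// voice needs.
type clipSet struct {
	Phrases   int // whole instruction phrases, e.g. "descend and maintain 3000"
	Callsigns int // airline telephony names, e.g. "Jet Blue"
	Numbers   int // spoken number clips, e.g. "zero" through "ninety-nine"
}

// downloadBytes estimates the total MP3 payload: every voice carries its own
// copy of every clip, so the size grows linearly with the number of voices.
func downloadBytes(s clipSet, voices int, avgClipBytes int64) int64 {
	perVoice := int64(s.Phrases + s.Callsigns + s.Numbers)
	return perVoice * int64(voices) * avgClipBytes
}

func main() {
	// All inputs below are made-up placeholders, not measurements.
	s := clipSet{Phrases: 200, Callsigns: 300, Numbers: 100}
	for _, v := range []int{1, 4, 16} {
		mb := float64(downloadBytes(s, v, 20_000)) / 1e6
		fmt.Printf("%2d voices: %.1f MB\n", v, mb)
	}
}
```

With these placeholder inputs it prints 12.0, 48.0, and 192.0 MB for 1, 4, and 16 voices. The specific numbers mean nothing; the point is that adding voices and adding whole-phrase clips multiply together.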

svalencia014 commented 1 month ago

Most of the TTS libraries I've found for Go unfortunately require an internet connection, since they call services like the Google Translate API or Google Cloud's TTS engine. I don't see an easy way around that other than shipping multiple MP3 files as you suggested, or finding an engine that keeps the binary small while still sounding good.

mmp commented 1 month ago

Yeah, the cloud services are another option. My concerns with them are that a) each query costs money, which isn't great in this case, and b) there's a delay between issuing the request and getting the synthesized speech back. I'm worried that the response could take too long for it to be a good experience.