Currently, the only offline TTS is `pyttsx3`, which sounds horribly robotic on Ubuntu. coqui-ai/TTS (available on PyPI) has multiple models that, once downloaded, work offline. After experimenting with a couple and playing them for a CLV staff member, `tts_models/en/ljspeech/fast_pitch` seems promising. It's not quite as good as `gtts`, but it is much better than `pyttsx3`, and it has sub-0.5s synthesis time.
This issue focuses on implementing that TTS engine, so that we actually have a reasonable-quality offline TTS engine we can switch to if circumstances require (e.g., internet dies right before a study).
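As a rough starting point, here is a minimal sketch of how that model could be driven via the `TTS` Python API from the coqui-ai/TTS package. The output file name and the example text are just placeholders, and the actual engine would need to be wrapped to match however our other TTS backends are wired up:

```python
# Minimal sketch (assumptions: coqui-ai/TTS installed from PyPI as the `TTS`
# package; file path and text below are illustrative only).
from TTS.api import TTS

# Load the fast_pitch LJSpeech model once at startup. The model files are
# downloaded on first use and cached locally, so subsequent runs work offline.
tts = TTS(model_name="tts_models/en/ljspeech/fast_pitch", progress_bar=False)

# Synthesize a phrase to a WAV file that can then be played back.
tts.tts_to_file(
    text="Hello, this is the offline voice.",
    file_path="output.wav",
)
```

Model loading is the slow part; keeping the `TTS` object alive for the lifetime of the process should be what gets us the sub-0.5s synthesis times mentioned above.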