rhasspy / larynx

End to end text to speech system using gruut and onnx
MIT License

MaryTTS API interface is not 100% compatible #35

Closed: fquirin closed this issue 2 years ago

fquirin commented 2 years ago

Hi Michael,

congratulations on your Larynx v1.0 release :partying_face: . Great work, as usual :slightly_smiling_face:.

I've been trying to use Larynx with the new SEPIA v0.24.0 client since it has an option now to use MaryTTS compatible TTS systems directly, but encountered some issues:

The last point is not really a MaryTTS compatibility issue, but it would be great to get each voice as 'low, medium, high' variation from the 'voices' endpoint, so the user could actually choose them from the list.

I believe the Larynx MaryTTS endpoints are mostly for Home-Assistant support and I'm not sure how HA is parsing the voices list (maybe it doesn't parse it at all or just uses the whole string), but it would be great to get the original format from the /voices endpoint. Would you be willing to make these changes? :innocent: :grin:

synesthesiam commented 2 years ago

Thanks! This sounds like a great idea -- I'll get it implemented tomorrow. I like the idea of having the quality come through too, so the voices will show up like "harvard;low", "harvard;high" since those are the values you can plug directly into /process
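A minimal sketch of how a client could split such an entry into voice name and quality. It assumes the classic MaryTTS layout where each line of the /voices response starts with the voice id followed by whitespace-separated fields such as the locale; the helper name `parse_voice_line` and the sample line are hypothetical, not taken from Larynx.

```python
from typing import NamedTuple, Optional


class Voice(NamedTuple):
    name: str
    quality: Optional[str]  # e.g. "low" or "high"; None if no ";quality" suffix
    locale: Optional[str]


def parse_voice_line(line: str) -> Voice:
    """Split one line of a /voices response (assumed layout: id first, then locale)."""
    fields = line.split()
    voice_id = fields[0]
    locale = fields[1] if len(fields) > 1 else None
    # "harvard;low" -> name "harvard", quality "low"
    name, _, quality = voice_id.partition(";")
    return Voice(name=name, quality=quality or None, locale=locale)


# Made-up sample line, for illustration only
voice = parse_voice_line("harvard;low en_US")
print(voice.name, voice.quality, voice.locale)
```

The full id (`harvard;low`) is what would be passed back to /process, so a client would keep the original string around and only use the split parts for display.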

Also, I could use a few tips in gruut on how to properly parse more stuff in German. You can see all the good stuff I have for English. I'm using the dateparser library in Python, but oddly it didn't parse "YYYY.MM.DD" as a German date...

Oh, and how is the STT server going? I'm planning to cycle back around to that soon and was curious where you were at.

fquirin commented 2 years ago

> Thanks! This sounds like a great idea -- I'll get it implemented tomorrow.

Awesome :star_struck:

> I'm using the dateparser library in Python, but oddly it didn't parse "YYYY.MM.DD" as a German date...

Ok that's really weird :-/. I'm not familiar with 'dateparser' but this seems like a trivial task :sweat_smile: , on the other hand nothing about time and dates is trivial when you want to support more than 1 language :laughing: . I don't see 'dateparser' in the code you've linked above, where is it applied to the text?
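For dotted numeric dates like the one mentioned here, an explicit fallback with the standard library is one option. This is only a sketch, not what gruut does: the function name `parse_dotted_date` is hypothetical, and it covers just the two patterns `YYYY.MM.DD` and `DD.MM.YYYY`.

```python
from datetime import date, datetime
from typing import Optional

# Patterns tried in order; %Y needs four digits, so the two cannot be confused
_DOTTED_FORMATS = ("%Y.%m.%d", "%d.%m.%Y")


def parse_dotted_date(text: str) -> Optional[date]:
    """Return a date for 'YYYY.MM.DD' or 'DD.MM.YYYY', or None if neither matches."""
    cleaned = text.strip()
    for fmt in _DOTTED_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None


print(parse_dotted_date("2021.11.05"))
print(parse_dotted_date("05.11.2021"))
```

This only recognizes the date; verbalizing it in German is a separate step.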

I could certainly write some parsing methods for German, but keep in mind that methods like en_verbalize_time(time) won't work for German because pronunciation depends on the surrounding words :see_no_evil: . It will require something like de_verbalize_time(full_text) :no_mouth: . I was working on German support for 'text2num' a few weeks ago (it's part of the new STT server :-)) and it was very complicated to implement properly because the old code structure didn't really take into account that German behaves fundamentally differently in some situations :-/. I remember we started to discuss the topic a while ago, maybe we can continue there.

> Oh, and how is the STT server going? I'm planning to cycle back around to that soon and was curious where you were at.

It has been released and I'm very happy with the results so far: SEPIA STT-Server, you can use one of the Docker containers for testing :slightly_smiling_face: . I've written a JavaScript client for the server and was planning to document the API next. Let me know if you need any info.

synesthesiam commented 2 years ago

Docker image has been updated with the fixed MaryTTS API :+1: Let me know if there are any more issues. I forgot to mention too that you can send SSML into the MaryTTS API; if the text begins with an angle bracket < it will be interpreted as SSML.
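A rough sketch of what a request to /process could look like from a client. The query parameter names follow the MaryTTS HTTP interface; which of them Larynx actually evaluates is not stated in this thread, and the base URL (`http://localhost:5002`), the default locale and the helper names are assumptions for illustration.

```python
from urllib.parse import urlencode


def looks_like_ssml(text: str) -> bool:
    """Mirror the rule described above: text starting with '<' is treated as SSML."""
    return text.startswith("<")


def build_process_url(text: str, voice: str,
                      base_url: str = "http://localhost:5002",  # assumed address
                      locale: str = "en_US") -> str:
    """Build a MaryTTS-style /process URL; the voice may carry a quality, e.g. 'harvard;low'."""
    params = {
        "INPUT_TEXT": text,
        "INPUT_TYPE": "TEXT",
        "OUTPUT_TYPE": "AUDIO",
        "AUDIO": "WAVE_FILE",
        "LOCALE": locale,
        "VOICE": voice,
    }
    # urlencode percent-escapes ';', '<' and '>' so they survive in the query string
    return f"{base_url}/process?{urlencode(params)}"


ssml = "<speak>Hello <break time='1s'/> world</speak>"
print(looks_like_ssml(ssml))
print(build_process_url(ssml, "harvard;low"))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return the audio data if the server is reachable.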

I'll continue the parsing discussion over on the gruut thread. It should be possible now with gruut to add models for predicting more features of German words (e.g., case), and use them when verbalizing dates, etc.

I'll check out the SEPIA STT-Server! I'd like to use it as a base for a project like OpenTTS for STT, where all of the available models for a given language are gathered together behind a web API.

fquirin commented 2 years ago

> Docker image has been updated with the fixed MaryTTS API :+1: Let me know if there are any more issues.

The interface works like a charm now :smiley:

There are 2 things though that confuse me. The first one is that the qualities low and medium are basically the same speed :thinking: . That is something I noticed before in an earlier version. I've been testing the aarch64 container on RPi4 and the results look like this (Voice: Mary-Ann en_us):

Load-times are measured inside the SEPIA client and Larynx test-page and are identical to Larynx console plus ~100ms network delay.

Second thing is that after almost exactly 30s of inactivity it takes another 2s to load the result every time, no matter how long the text is :-|. It looks like something is unloading or powering down, because these 2s are not showing up in the Larynx console. This is reproducible inside SEPIA client and the Larynx test-page and I'm wondering if it has something to do with Docker itself because loading the Larynx test-page will reset the timer as well :confused: . I need to double-check if the MaryTTS container (or any container) has the same effect :thinking:

> It should be possible now with gruut to add models for predicting more features of German words (e.g., case), and use them when verbalizing dates, etc.

If you show me the right place and methods I can try to fill them with life for German ;-)

> I'll check out the SEPIA STT-Server! I'd like to use it as a base for a project like OpenTTS for STT, where all of the available models for a given language are gathered together behind a web API.

I'd definitely support that. I was planning to add Coqui to the server but haven't had time yet. Everything is prepared to handle different "engines" though :-)

fquirin commented 2 years ago

> Second thing is that after almost exactly 30s of inactivity it takes another 2s to load the result every time, no matter how long the text is :-|.

I solved this one :sweat_smile: . It seems to be a problem with domain name resolution inside my network O_o. If I use the IP instead of the RPi hostname I never get the 2s timeout :-/
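For anyone who wants to check for the same effect, here is a small sketch that times name resolution by itself. The hostname `raspberrypi.local`, the port 5002 and the function name are placeholders, and the script only measures the resolver of the machine it runs on, which may behave differently from a browser.

```python
import socket
import time


def time_lookup(host: str, port: int = 5002) -> float:
    """Return the seconds spent resolving host (no connection is opened)."""
    start = time.perf_counter()
    socket.getaddrinfo(host, port)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder names: replace with the real hostname and IP of the server
    for target in ("raspberrypi.local", "127.0.0.1"):
        try:
            print(f"{target}: {time_lookup(target):.3f}s")
        except socket.gaierror as err:
            print(f"{target}: lookup failed ({err})")
```

If the hostname takes seconds after an idle period while the IP address resolves instantly, the delay sits in name resolution rather than in the TTS server.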

synesthesiam commented 2 years ago

Awesome, thanks for letting me know! I couldn't think of a reason it would do that :slightly_smiling_face: