Add Coqui TTS functionality

jhudsl / ari

:dancers: The Automated R Instructor

https://jhudatascience.org/ari/


Add Coqui TTS functionality #45

Closed howardbaek closed 1 year ago

howardbaek commented 1 year ago

Since ari doesn't play well with Coqui TTS output in mp3 format, I set the output_format of text2speech::tts() to wav instead of mp3.
Changed the Remotes field in DESCRIPTION to jhudsl/text2speech.

howardbaek commented 1 year ago

Responding to @cansavvy,

I was reading through the source code of ari_spin(), which creates Wave objects in these lines. It then calls ari_stitch(), which calls writeWave() to write WAV audio files from those Wave objects. If I create the Wave objects with Coqui when the output format is mp3 and then write WAV audio files, I get a broken, scratchy radio sound. But if I create the Wave objects with Coqui when the output format is WAV, I get normal-sounding audio.

I asked on Coqui's Discord about Coqui's output format. The response I got is that the native output option with Coqui is WAV, and if I choose to output MP3, I will get a file with an mp3 extension that is actually WAV-encoded. So they recommended that I generate the WAV file with Coqui.
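
A quick way to check whether a given file matches that description is to look at its leading bytes rather than its extension. The following is a minimal base R sketch, not code from ari or text2speech: `sniff_audio_container()` and `write_minimal_wav()` are hypothetical helper names, and the generated file only stands in for real Coqui output.

```r
# Hypothetical helper (not part of ari or text2speech): guess the audio
# container from the file's leading bytes instead of trusting the extension.
sniff_audio_container <- function(path) {
  hdr <- readBin(path, what = "raw", n = 12L)

  # WAV files start with "RIFF", a 4-byte size, then "WAVE".
  is_riff_wave <- length(hdr) >= 12L &&
    identical(hdr[1:4], charToRaw("RIFF")) &&
    identical(hdr[9:12], charToRaw("WAVE"))
  if (is_riff_wave) return("wav")

  # MP3 files start with an ID3 tag or an MPEG frame sync (11 set bits).
  has_id3_tag <- length(hdr) >= 3L && identical(hdr[1:3], charToRaw("ID3"))
  has_frame_sync <- length(hdr) >= 2L &&
    hdr[1] == as.raw(0xff) &&
    bitwAnd(as.integer(hdr[2]), 0xe0L) == 0xe0L
  if (has_id3_tag || has_frame_sync) return("mp3")

  "unknown"
}

# Hypothetical stand-in for Coqui output: write a tiny silent 16-bit mono
# PCM WAV file to whatever path is given, regardless of its extension.
write_minimal_wav <- function(path, n_samples = 8L, sample_rate = 16000L) {
  data_bytes <- n_samples * 2L
  con <- file(path, "wb")
  on.exit(close(con))
  writeBin(charToRaw("RIFF"), con)
  writeBin(36L + data_bytes, con, size = 4L, endian = "little")
  writeBin(charToRaw("WAVE"), con)
  writeBin(charToRaw("fmt "), con)
  writeBin(16L, con, size = 4L, endian = "little")               # fmt chunk size
  writeBin(1L, con, size = 2L, endian = "little")                # PCM
  writeBin(1L, con, size = 2L, endian = "little")                # mono
  writeBin(sample_rate, con, size = 4L, endian = "little")
  writeBin(sample_rate * 2L, con, size = 4L, endian = "little")  # byte rate
  writeBin(2L, con, size = 2L, endian = "little")                # block align
  writeBin(16L, con, size = 2L, endian = "little")               # bits per sample
  writeBin(charToRaw("data"), con)
  writeBin(data_bytes, con, size = 4L, endian = "little")
  writeBin(rep(0L, n_samples), con, size = 2L, endian = "little")
  invisible(path)
}

fake_mp3 <- tempfile(fileext = ".mp3")
write_minimal_wav(fake_mp3)
sniff_audio_container(fake_mp3)  # "wav", despite the .mp3 extension
```

If a file produced with the mp3 output format reports "wav" here, any reader that picks its decoder from the extension would be decoding WAV bytes as MP3, which would be consistent with the scratchy audio described above.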

howardbaek commented 1 year ago

@cansavvy I've just added a warning message to ari_spin() and run devtools::document(). This changed the .Rd files of other functions, but those changes are all stylistic (spacing, arrows as assignment operators, etc.).

cansavvy commented 1 year ago

> I was running through the source code of ari_spin(), which creates Wave objects in these lines. Then, it calls ari_stitch(), which calls writeWave which writes WAV audio files when you input Wave objects. If I create these Wave objects with coqui when output format is mp3, then write WAV audio files, I get a broken, scratchy radio sound. But, if I create these Wave objects with coqui when output format is WAV, I get normal sounding audio.

Very basic question: Does ari only work with wav objects for the other text2speech services as well?

howardbaek commented 1 year ago

> Very basic question: Does ari only work with wav objects for the other text2speech services as well?

I tested running ari with WAV as the output format, and everything ran smoothly for Amazon and Google.

I lost my free trial access to Microsoft Azure, so I haven't been able to test Microsoft.

cansavvy commented 1 year ago

We should look into that dev Ubuntu failure and try to fix it, but otherwise this seems good to me.