readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Improvements to cloud TTS wrappers #130

Open readbeyond opened 7 years ago

readbeyond commented 7 years ago

Several improvements can be made:

  1. The Nuance and AWS wrappers share some code; move it into a new base class for cloud TTS services.
  2. One might want to create a "permanent cache on disk" of all synthesized fragments, so that later invocations of a cloud TTS wrapper with an already-seen text fragment read the audio from that cache instead of synthesizing it again.
  3. Let the user decide whether the synthesized audio should be downloaded as PCM or in a compressed format (MP3/OGG). In the latter case, though, each downloaded file must be converted (e.g., decoded back to PCM) before it can be used. (Not sure this is a good idea, although it might save some network traffic.)
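Item 1 could look roughly like the sketch below. It is a minimal illustration, not the aeneas API: `BaseCloudTTSWrapper`, `_request_audio`, `max_attempts`, and `output_format` are hypothetical names, and the retry loop is just one example of logic the Nuance and AWS wrappers might share. `output_format` only records the user's choice from item 3; no decoding is shown.

```python
import abc


class BaseCloudTTSWrapper(abc.ABC):
    """Hypothetical base class for logic shared by cloud TTS wrappers."""

    SUPPORTED_FORMATS = ("pcm", "mp3", "ogg")

    def __init__(self, voice, max_attempts=3, output_format="pcm"):
        if output_format not in self.SUPPORTED_FORMATS:
            raise ValueError("unsupported output format: %r" % output_format)
        self.voice = voice
        self.max_attempts = max_attempts
        # Records the user's choice from item 3; decoding is not shown here.
        self.output_format = output_format

    @abc.abstractmethod
    def _request_audio(self, text):
        """Service-specific call (e.g. Nuance or AWS): return audio bytes."""

    def synthesize(self, text):
        """Shared retry loop around the service-specific request."""
        last_error = None
        for _ in range(self.max_attempts):
            try:
                return self._request_audio(text)
            except OSError as exc:  # ConnectionError is a subclass of OSError
                last_error = exc
        raise RuntimeError(
            "TTS request failed after %d attempts" % self.max_attempts
        ) from last_error


class FakeTTSWrapper(BaseCloudTTSWrapper):
    """Hypothetical stand-in service that fails once, then succeeds."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.calls = 0

    def _request_audio(self, text):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("simulated network glitch")
        return text.encode("utf-8")
```

Each concrete wrapper would then implement only `_request_audio`, while retries and option validation live in one place.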
pettarin commented 7 years ago

Note: as pointed out by one user, a "permanent cache" would also help with TTS service failures (e.g., due to network problems or latency), which currently invalidate the whole cache at once.
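To illustrate why a permanent cache helps with failures, here is a minimal sketch assuming each fragment is written to its own file as soon as it is received. The function names and the SHA-1 file-naming scheme are hypothetical; `request_audio` stands in for whatever call a real wrapper makes to the cloud service. If fragment k fails, fragments 0..k-1 are already on disk, so a rerun only requests what is missing.

```python
import hashlib
import os
import tempfile


def fragment_path(cache_dir, text):
    # Hypothetical naming scheme: SHA-1 of the UTF-8 text, so identical
    # fragments share a file and "Hello" and "hello" get different files.
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
    return os.path.join(cache_dir, digest + ".audio")


def synthesize_all(fragments, cache_dir, request_audio):
    """Synthesize each fragment, persisting it to disk immediately."""
    os.makedirs(cache_dir, exist_ok=True)
    paths = []
    for text in fragments:
        path = fragment_path(cache_dir, text)
        if not os.path.exists(path):
            data = request_audio(text)  # may raise on network errors
            tmp_path = path + ".tmp"
            with open(tmp_path, "wb") as f:
                f.write(data)
            os.replace(tmp_path, path)  # rename, so no half-written cache files
        paths.append(path)
    return paths


cache_dir = tempfile.mkdtemp()
calls = []


def flaky_request(text):
    """Stand-in for a cloud call that fails the first time it sees "third"."""
    calls.append(text)
    if text == "third" and calls.count("third") == 1:
        raise ConnectionError("simulated outage")
    return text.encode("utf-8")


fragments = ["first", "second", "third"]
try:
    synthesize_all(fragments, cache_dir, flaky_request)
except ConnectionError:
    pass  # "first" and "second" are already saved on disk
paths = synthesize_all(fragments, cache_dir, flaky_request)
# The second run only re-requests "third".
```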

readbeyond commented 7 years ago

Added label "bug" since, e.g., the lack of a cache on disk is a "borderline bug".

readbeyond commented 7 years ago

Note: we should store the cache on disk "per TTS". We also need to marshal (to/from disk) the dictionary of (key=text, value=filename) pairs, where "text" is the actual text fragment (case-sensitive).
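A per-TTS cache with a marshalled (text, filename) index might look like this. It is a sketch under assumptions: `PersistentTTSCache`, the `index.json` file name, JSON as the marshalling format, and random hex file names are all hypothetical choices, not anything aeneas defines. Using the exact fragment as the dictionary key makes lookups case-sensitive, as requested.

```python
import json
import os
import tempfile
import uuid


class PersistentTTSCache(object):
    """Hypothetical on-disk cache, one directory per TTS engine.

    The (key=text, value=filename) dictionary is marshalled to index.json.
    Keys are the exact text fragments, so lookups are case-sensitive.
    """

    INDEX_NAME = "index.json"

    def __init__(self, root_dir, tts_name):
        self.cache_dir = os.path.join(root_dir, tts_name)  # "per TTS"
        os.makedirs(self.cache_dir, exist_ok=True)
        self.index_path = os.path.join(self.cache_dir, self.INDEX_NAME)
        self.index = self._load_index()

    def _load_index(self):
        if not os.path.exists(self.index_path):
            return {}
        with open(self.index_path, "r", encoding="utf-8") as f:
            return json.load(f)

    def _save_index(self):
        # Write to a temporary file first, then rename over the old index.
        tmp_path = self.index_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(self.index, f, ensure_ascii=False)
        os.replace(tmp_path, self.index_path)

    def get(self, text):
        """Return the cached file path for text, or None if absent."""
        filename = self.index.get(text)
        if filename is None:
            return None
        path = os.path.join(self.cache_dir, filename)
        return path if os.path.exists(path) else None

    def add(self, text, audio_bytes):
        """Store audio for text and persist the updated index."""
        filename = uuid.uuid4().hex + ".audio"
        path = os.path.join(self.cache_dir, filename)
        with open(path, "wb") as f:
            f.write(audio_bytes)
        self.index[text] = filename
        self._save_index()
        return path


root = tempfile.mkdtemp()
aws_cache = PersistentTTSCache(root, "aws")
if aws_cache.get("Hello world") is None:
    aws_cache.add("Hello world", b"fake audio bytes")  # normally: synthesized audio

reopened = PersistentTTSCache(root, "aws")      # index reloaded from disk
print(reopened.get("Hello world") is not None)  # True
print(reopened.get("hello world"))              # None: keys are case-sensitive
print(PersistentTTSCache(root, "nuance").get("Hello world"))  # None: separate per-TTS cache
```

JSON keeps the index human-readable and handles non-ASCII fragments; `pickle` would also work but is unsafe to load from untrusted files.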