Open xiaoyifang opened 3 weeks ago
The existing forvo/lingualibre are essentially the same.
We could merge them into "Online TTS".
Related art: this popular Anki add-on provides TTS from many services (Azure included): https://ankiweb.net/shared/info/1436550454 (It has cringe ads but works OK. I used it a few times in the past.)
Maybe we can copy its UI.
The left side can select a service and add various related parameters.
The existing forvo/lingualibre are essentially the same.
Not the same.
forvo/lingualibre are used for single words, and each is displayed as a separate dictionary.
Azure TTS can be used for articles or sentences, which means it can work across different dictionaries.
Maybe we can copy its UI.
We can keep the configuration to a minimum. Speed and pitch can even be left out.
We can keep the configuration to a minimum. Speed and pitch can even be left out.
What I really mean is that we don't limit this feature to one specific service provider.
The implementation should make adding new service providers easy 😅
Adding new parameters shouldn't be much harder because it is pretty much combining new query URLs.
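To make the "combining new query URLs" idea concrete, here is a minimal sketch in Python (for brevity only; the real implementation would be C++/Qt). Everything in it is an assumption for illustration: the `Provider` class, the `register` helper, the `example-tts` service and its `tts.example.invalid` URL do not correspond to any real provider.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class Provider:
    """Hypothetical description of one online TTS service."""
    name: str
    base_url: str
    # Default query parameters; supporting a new parameter means adding a key.
    defaults: dict = field(default_factory=dict)

    def build_url(self, text: str, **overrides) -> str:
        # Later keys win: defaults < per-call overrides < the text itself.
        params = {**self.defaults, **overrides, "q": text}
        return f"{self.base_url}?{urlencode(params)}"


# Registry of known providers, keyed by name.
PROVIDERS: dict = {}


def register(provider: Provider) -> None:
    PROVIDERS[provider.name] = provider


# Adding a provider is a single registration call (placeholder service).
register(Provider("example-tts", "https://tts.example.invalid/speak", {"lang": "en"}))
```

With this shape, `PROVIDERS["example-tts"].build_url("hello world", speed="1.2")` produces one query URL, and a new provider or parameter needs no change to the calling code.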
Though a bit ambitious at first, I have no objection to this. :-)
After some investigation, I find this feature should not be implemented with the current dictionary.hh facilities.
Websites/Programs/TTS/Transliteration are inherently different from the other, local storage-based dictionaries.
It was a mistake to merge them into one. All implementations of those “dictionary but actually not” types are messy AF. Websites/Programs/TTS/Transliteration were an afterthought in the design of dictionary.hh.
1. Have one single dedicated object that inherits nothing to handle this feature.
2. Plug it into the current "dictionary.hh" monstrosity.
I find that doing 1 (aka writing from scratch) is 10x simpler than 2.
Leaky abstraction in action:
For example, how do you extend the properties of a dictionary with dictionary.hh? Instead of being put into the dictionary class, they all live in config.hh. Websites/Programs/TTS/Transliteration need extra properties, so we have these lines below.
dictionary.hh is abstract enough to have a "toHTML" method but also concrete enough to have "dictionary files" that Websites/Programs/TTS/Transliteration don't have (so they all have to return empty).
After some investigation, I find this feature should not be implemented with the current dictionary.hh facilities.
I agree with this. Azure TTS can be used across dictionaries and act on its own. It can be exposed as a single function (for example, in the right-click context menu).
Not sure about the experience. Azure TTS's endpoint depends on the region, so a user needs to copy both the endpoint and the API key into a super condensed interface :sweat_smile:
Using this hurl file (https://hurl.dev/):
POST {{endpoint}}/cognitiveservices/v1
Ocp-Apim-Subscription-Key: ${Your key here}
X-Microsoft-OutputFormat: ogg-48khz-16bit-mono-opus
Content-Type: application/ssml+xml
User-Agent: WhatEver
<speak version='1.0' xml:lang='en-US'>
<voice name='en-US-LunaNeural'>
{{sentence}}
</voice>
</speak>
with
hurl ./voice.hurl --variable endpoint="https://eastus.api.cognitive.microsoft.com" --variable sentence="This is nice!" --output nice.ogg
will yield an audio file.
The {{endpoint}} is obtained from the screenshot.
The voice name can be looked up from {{endpoint}}/cognitiveservices/voices/list
It seems all cloud TTS services support the same "SSML" markup:
https://cloud.google.com/text-to-speech/docs/ssml https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup https://docs.aws.amazon.com/polly/latest/dg/ssml.html
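The hurl request above can be sketched as code roughly like this. It is a Python illustration only, not the proposed C++ implementation; the function names `build_ssml`, `build_request` and `synthesize` are made up here, and the endpoint, headers and voice are copied from the hurl file as-is.

```python
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape


def build_ssml(sentence: str, voice: str = "en-US-LunaNeural", lang: str = "en-US") -> str:
    # Escape &, < and > so arbitrary article text cannot break the SSML document.
    return (
        f"<speak version='1.0' xml:lang='{lang}'>"
        f"<voice name='{voice}'>{escape(sentence)}</voice>"
        "</speak>"
    )


def build_request(endpoint: str, key: str, sentence: str) -> Request:
    # Same URL, headers and body as the hurl file; nothing is sent yet.
    url = endpoint.rstrip("/") + "/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "X-Microsoft-OutputFormat": "ogg-48khz-16bit-mono-opus",
        "Content-Type": "application/ssml+xml",
        "User-Agent": "WhatEver",
    }
    body = build_ssml(sentence).encode("utf-8")
    return Request(url, data=body, headers=headers, method="POST")


def synthesize(endpoint: str, key: str, sentence: str, out_path: str) -> None:
    # Performs the network call and writes the returned audio bytes to a file.
    with urlopen(build_request(endpoint, key, sentence)) as response:
        with open(out_path, "wb") as out:
            out.write(response.read())
```

Calling `synthesize("https://eastus.api.cognitive.microsoft.com", key, "This is nice!", "nice.ogg")` would correspond to the hurl command, assuming a valid key for that region.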
A little UI improvement:
POST {{endpoint}}/cognitiveservices/v1
can be
POST https://{{region}}.api.cognitive.microsoft.com/cognitiveservices/v1
Users can then pick the region from a dropdown list whose values are fixed in advance.
The voices can also be provided as fixed values in advance.
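A rough sketch of the region dropdown idea, again in Python for illustration. The `REGIONS` tuple is a small hand-picked subset used as an assumption here (only `eastus` appears in this thread), and `endpoint_for` is a hypothetical helper name.

```python
# Illustrative subset of regions; the real list would be filled in ahead of time.
REGIONS = ("eastus", "westeurope", "southeastasia")

# URL pattern taken from the comment above.
ENDPOINT_TEMPLATE = "https://{region}.api.cognitive.microsoft.com/cognitiveservices/v1"


def endpoint_for(region: str) -> str:
    # Only accept values that the dropdown could have offered.
    if region not in REGIONS:
        raise ValueError(f"unknown region: {region!r}")
    return ENDPOINT_TEMPLATE.format(region=region)
```

The user then only pastes the API key; the endpoint is derived from the selected region.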
I think we can add this one directly under the “Edit” menu instead of “Edit -> Dictionaries”.
Most things on the right side are only "somewhat a dictionary". It is a mistake for morphology and transliteration to be there; they cannot even be shown as an article.
Furthermore, as a separate component, the config can also be separated into a new file beside config -> config_cloud_tts.xml.
The timing of saving/reading the config of different components is not entirely the same.
For example, MainWindowGeometry needs to be read/written at program shutdown, while the dictionaries don't, so there is no point in putting them in a single config. The “ominous” commitdata is overused. Opening/closing the editdictionaries dialog needs to initialize/mutate excessive state (for example, a crash of Qt TTS will bring down the entire dialog).
Putting it somewhere separate also makes adding/removing the feature entirely easy. There is no need to add #if feature_x macros, and no need to carefully think about how to plug new feature code into the existing code. Right now it is not clear how to add a config option without jumping around and reading everything written in the past.
Everything related to one component in one place
vs
orgy of features
TTS dialog -> read/write config
The TTS engine -> read config
(Side effect: this makes building this component as a separate program easy.)
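As an illustration of the "dialog reads/writes, engine only reads" split with a separate config file, here is a small Python sketch. The `CloudTtsConfig` fields and the XML element names are assumptions made up for this example; only the file name config_cloud_tts.xml comes from the comment above.

```python
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass


@dataclass
class CloudTtsConfig:
    # Hypothetical fields; the real set would follow whatever the dialog exposes.
    region: str = "eastus"
    api_key: str = ""
    voice: str = "en-US-LunaNeural"


def save_config(path: str, cfg: CloudTtsConfig) -> None:
    """Used only by the TTS dialog: write one child element per field."""
    root = ET.Element("cloudTts")
    for name, value in asdict(cfg).items():
        ET.SubElement(root, name).text = value
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_config(path: str) -> CloudTtsConfig:
    """Used by both the dialog and the engine: read the file back."""
    root = ET.parse(path).getroot()
    # An empty element parses with text None, so normalize it to "".
    values = {child.tag: child.text or "" for child in root}
    return CloudTtsConfig(**values)
```

The engine would only ever call `load_config("config_cloud_tts.xml")`, so nothing here touches the main config or its save timing.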
Most things on the right side are only "somewhat a dictionary". It is a mistake for morphology and transliteration to be there; they cannot even be shown as an article.
Move them to Edit -> Preferences?
Furthermore, as a separate component, the config can also be separated into a new file beside config -> config_cloud_tts.xml.
This can be considered. Azure TTS can have its own config file.
It does not have to implement the dictionary.hh interface.
It is not really difficult to replicate AwesomeTTS for an audio preview pane 😅
Progress for today, a little app https://github.com/SourceReviver/temp_ctts_impl
Do you have time to implement this feature?
I think https://github.com/SourceReviver/temp_ctts_impl is complete for the initial version of this feature.
However, I need to prepare for an exam on Friday, so I will prepare a PR this weekend 😅
Exam first. The PR can wait.
https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-text-to-speech?tabs=windows%2Cterminal&pivots=programming-language-cli#prerequisites
Microsoft TTS offers very high quality audio, so it may be worth implementing as a feature.
Users can select some text and use the right-click menu to pronounce it with the above engine.
The implementation can wrap the CLI command or use the provided C++ SDK.