xiaoyifang / goldendict-ng

The Next Generation GoldenDict
https://xiaoyifang.github.io/goldendict-ng/

[Feature] use the Azure TTS API? #1553

Open xiaoyifang opened 3 weeks ago

xiaoyifang commented 3 weeks ago

https://learn.microsoft.com/en-us/azure/ai-services/speech-service/get-started-text-to-speech?tabs=windows%2Cterminal&pivots=programming-language-cli#prerequisites

Microsoft's TTS offers very high-quality audio, so it may be worth implementing as a feature.

Users could select text and use the right-click menu to pronounce it with the engine above.

The implementation could wrap the CLI command or use the provided C++ SDK.

shenlebantongying commented 2 weeks ago

The existing forvo/lingualibre are essentially the same.

We could merge them into "Online TTS".


Related art: this popular Anki add-on provides TTS from many services (Azure included): https://ankiweb.net/shared/info/1436550454 (It has a cringey ad but works OK. I used it a few times in the past.)

Maybe we can copy its UI.

On the left side, you select a service and set its related parameters.

xiaoyifang commented 2 weeks ago

The existing forvo/lingualibre are essentially the same.

Not the same. Forvo/LinguaLibre are used for single words and are displayed as separate dictionaries.

Azure TTS can be used for whole articles or sentences, which means it can work across different dictionaries.

Maybe we can copy its UI.

We can keep the configuration to a minimum. Speed and pitch can even be left out.

xiaoyifang commented 2 weeks ago

https://github.com/Vocab-Apps/anki-awesome-tts?tab=readme-ov-file
https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/ai-services/speech-service/rest-text-to-speech.md

shenlebantongying commented 2 weeks ago

We can keep the configuration to a minimum. Speed and pitch can even be left out.

What I really mean is that we shouldn't limit this feature to one specific service provider.

The implementation should make adding new service providers easy 😅

Adding new parameters shouldn't be much harder, because it mostly comes down to composing new query URLs.
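A minimal sketch of what "adding a provider is easy" could look like, in plain C++ without Qt or networking. Each service only turns the SSML text into a request description, while sending the request and playing the audio stay in shared code. HttpRequest, TtsProvider, AzureTts and makeProviders are hypothetical names, and the Azure headers simply mirror the Hurl example later in this thread.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Provider-agnostic description of one HTTP call (hypothetical type).
struct HttpRequest {
  std::string method;
  std::string url;
  std::map<std::string, std::string> headers;
  std::string body;
};

// Each online TTS service only describes how to build its request;
// sending it and playing the returned audio stays in shared code.
class TtsProvider {
public:
  virtual ~TtsProvider() = default;
  virtual std::string name() const = 0;
  virtual HttpRequest buildRequest(const std::string& ssml) const = 0;
};

// Azure, mirroring the headers of the Hurl example in this thread.
class AzureTts : public TtsProvider {
public:
  AzureTts(std::string endpointUrl, std::string key)
      : url_(std::move(endpointUrl)), key_(std::move(key)) {}

  std::string name() const override { return "Azure"; }

  HttpRequest buildRequest(const std::string& ssml) const override {
    HttpRequest r;
    r.method = "POST";
    r.url = url_;
    r.headers["Ocp-Apim-Subscription-Key"] = key_;
    r.headers["X-Microsoft-OutputFormat"] = "ogg-48khz-16bit-mono-opus";
    r.headers["Content-Type"] = "application/ssml+xml";
    r.body = ssml;
    return r;
  }

private:
  std::string url_;
  std::string key_;
};

// Supporting another service means adding one class and one push_back here.
std::vector<std::unique_ptr<TtsProvider>> makeProviders(const std::string& azureUrl,
                                                        const std::string& azureKey) {
  std::vector<std::unique_ptr<TtsProvider>> list;
  list.push_back(std::make_unique<AzureTts>(azureUrl, azureKey));
  return list;
}
```

The UI on the left could then just iterate over this list, and per-provider parameters stay inside each class instead of spreading into shared config.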

xiaoyifang commented 2 weeks ago

It's a bit ambitious for a first step, but I have no objection to this. :-)

shenlebantongying commented 2 weeks ago

After some investigation, I find this feature should not be implemented with the current dictionary.hh facilities.

Websites/Programs/TTS/Transliteration are inherently different from other local storage-based dictionaries.

It was a mistake to merge them into one. All implementations of those “dictionary but actually not” things are messy AF. Websites/Programs/TTS/Transliteration were an afterthought in the design of dictionary.hh.

Two options:

  1. A single dedicated object, inheriting from nothing, that handles this feature.
  2. Plug it into the current "dictionary.hh" monstrosity.

I find option 1 (i.e., writing it from scratch) 10x simpler than option 2.


Leaky abstraction in action:

For example, how do you extend the properties of a dictionary with dictionary.hh? Instead of being part of the dictionary class, the properties all live in config.hh. Websites/Programs/TTS/Transliteration need extra properties, hence the lines linked below.

dictionary.hh is abstract enough to have a "toHTML" method but also concrete enough to have "dictionary files", which Websites/Programs/TTS/Transliteration don't have (so they all have to return an empty list).

https://github.com/xiaoyifang/goldendict-ng/blob/6a91c6bde34c5d37f2c110a9a6c57b72cce3bb51/src/config.hh#L448-L824

xiaoyifang commented 2 weeks ago

After some investigation, I find this feature should not be implemented with the current dictionary.hh facilities.

I agree with this. Azure TTS can be used across dictionaries and act on its own. It can be exposed as a standalone function (for example, in the right-click context menu).

shenlebantongying commented 2 weeks ago

Not sure about the user experience. The Azure TTS endpoint depends on the region, so a user needs to copy both the endpoint and the API key into a very condensed interface 😅


Using this Hurl file (https://hurl.dev/):

POST {{endpoint}}/cognitiveservices/v1

Ocp-Apim-Subscription-Key: {{key}}
X-Microsoft-OutputFormat: ogg-48khz-16bit-mono-opus
Content-Type: application/ssml+xml
User-Agent: WhatEver
<speak version='1.0' xml:lang='en-US'>
    <voice name='en-US-LunaNeural'>
        {{sentence}}
    </voice>
</speak>

with

hurl ./voice.hurl --variable endpoint="https://eastus.api.cognitive.microsoft.com" --variable key="YOUR_KEY" --variable sentence="This is nice!" --output nice.ogg

yields an audio file.

The {{endpoint}} value comes from the Azure portal, as in the screenshot. The voice name has to be one of those returned by {{endpoint}}/cognitiveservices/voices/list.
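Endpoints pasted from the portal may end in a slash (e.g. https://eastus.api.cognitive.microsoft.com/) or carry stray whitespace. Plain concatenation with /cognitiveservices/v1 would then produce a double slash. Here is a small sketch, with hypothetical function names, that normalizes the pasted endpoint before deriving the synthesis URL and the voice-list URL mentioned above.

```cpp
#include <string>

// Drop surrounding whitespace and any trailing '/' from a pasted endpoint.
std::string normalizeEndpoint(std::string endpoint) {
  const auto first = endpoint.find_first_not_of(" \t\r\n");
  if (first == std::string::npos) return "";
  const auto last = endpoint.find_last_not_of(" \t\r\n");
  endpoint = endpoint.substr(first, last - first + 1);
  while (!endpoint.empty() && endpoint.back() == '/') endpoint.pop_back();
  return endpoint;
}

// URL used for the POST carrying the SSML body.
std::string synthesisUrl(const std::string& endpoint) {
  return normalizeEndpoint(endpoint) + "/cognitiveservices/v1";
}

// URL used to fetch the available voice names.
std::string voicesListUrl(const std::string& endpoint) {
  return normalizeEndpoint(endpoint) + "/cognitiveservices/voices/list";
}
```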


It seems all cloud TTS services support the same SSML (Speech Synthesis Markup Language) format:

https://cloud.google.com/text-to-speech/docs/ssml
https://learn.microsoft.com/azure/ai-services/speech-service/speech-synthesis-markup
https://docs.aws.amazon.com/polly/latest/dg/ssml.html
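Since the selected text is substituted into XML, characters such as &, < or ' have to be escaped. Without escaping, a sentence like "Tom & Jerry" produces invalid SSML. A minimal sketch, with hypothetical function names, builds the same <speak> document as the Hurl example. The escaping applies to any SSML-based service, though each provider has its own voice names and request format.

```cpp
#include <string>

// Escape the five XML special characters so arbitrary text is safe in SSML.
std::string xmlEscape(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (char c : text) {
    switch (c) {
      case '&': out += "&amp;"; break;
      case '<': out += "&lt;"; break;
      case '>': out += "&gt;"; break;
      case '"': out += "&quot;"; break;
      case '\'': out += "&apos;"; break;
      default: out += c;
    }
  }
  return out;
}

// Build a <speak> document like the Hurl example; every value is escaped.
std::string buildSsml(const std::string& sentence,
                      const std::string& voice = "en-US-LunaNeural",
                      const std::string& lang = "en-US") {
  return "<speak version='1.0' xml:lang='" + xmlEscape(lang) + "'>"
         "<voice name='" + xmlEscape(voice) + "'>" + xmlEscape(sentence) +
         "</voice></speak>";
}
```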

xiaoyifang commented 2 weeks ago

A little UI improvement: POST {{endpoint}}/cognitiveservices/v1 can become POST https://{{region}}.api.cognitive.microsoft.com/cognitiveservices/v1

Users can then pick the region from a dropdown list of predefined values.

Voices can likewise be offered as a predefined list.
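A sketch of the fixed lists behind the two dropdowns and of the URL built from the chosen region. The region and voice values are only examples, not an exhaustive or official list. A real list would come from Microsoft's documentation or the voices/list endpoint, and the names in the code are hypothetical. The URL follows the form proposed above. Microsoft's REST text-to-speech reference (linked earlier in this thread) documents the host form https://<region>.tts.speech.microsoft.com/cognitiveservices/v1, so it is worth checking which form the final code should use.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Example values for the two dropdowns (illustrative, not exhaustive).
constexpr std::array<std::string_view, 4> kRegions = {
    "eastus", "westus", "westeurope", "southeastasia"};
constexpr std::array<std::string_view, 4> kVoices = {
    "en-US-LunaNeural", "en-US-JennyNeural", "en-GB-SoniaNeural",
    "zh-CN-XiaoxiaoNeural"};

// True if the value appears in one of the predefined lists.
template <std::size_t N>
bool inList(const std::array<std::string_view, N>& list, std::string_view value) {
  return std::find(list.begin(), list.end(), value) != list.end();
}

// Synthesis URL for a dropdown region, following the form proposed above;
// returns std::nullopt for anything that is not in kRegions.
std::optional<std::string> synthesisUrlForRegion(std::string_view region) {
  if (!inList(kRegions, region)) return std::nullopt;
  return "https://" + std::string(region) +
         ".api.cognitive.microsoft.com/cognitiveservices/v1";
}
```

With fixed lists, the user only has to paste the API key, which addresses the concern about the condensed interface.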

shenlebantongying commented 1 week ago

I think we can add this directly under the “Edit” menu instead of under “Edit -> Dictionaries”.

Most things on the right side are only "somewhat a dictionary". Putting morphology and transliteration there was a mistake; they cannot even be shown as an article.


Furthermore, as a separate component, its config can also live in a new file next to config: config_cloud_tts.xml.

Different components do not save/read their config at the same times.

For example, MainWindowGeometry has to be saved at program shutdown, while the dictionaries don't, so there is no point in putting them in a single config. The “ominous” commitdata is overused. Opening/closing the EditDictionaries dialog requires initializing/mutating excessive state (for example, a crash in Qt TTS brings down the entire dialog).

Putting it somewhere separate also makes adding/removing the feature entirely easy: there is no need to add #if feature_x macros, and no need to think carefully about how to plug new features into existing ones. Right now, it is not clear how to add a config option without jumping around and reading everything that came before.

Everything related to one component in one place

vs

orgy of features


TTS dialog -> read/write config
TTS engine -> read config
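A sketch of that split. The names (CloudTtsConfig, CloudTtsDialog, CloudTtsEngine) are hypothetical, and the layout of config_cloud_tts.xml is invented for illustration. Only the dialog reads and writes the file; the engine just receives a read-only copy of the settings. The tag lookup is a toy rather than a real XML parser (no escaping, no attributes), and the API key is stored in plain text here.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <utility>

// All settings of the cloud TTS component (hypothetical fields).
struct CloudTtsConfig {
  std::string region = "eastus";
  std::string voice = "en-US-LunaNeural";
  std::string apiKey;
};

// Text between <tag> and </tag>, or "" if absent (toy lookup, not real XML).
static std::string tagValue(const std::string& xml, const std::string& tag) {
  const std::string open = "<" + tag + ">", close = "</" + tag + ">";
  auto b = xml.find(open);
  if (b == std::string::npos) return "";
  b += open.size();
  const auto e = xml.find(close, b);
  return e == std::string::npos ? "" : xml.substr(b, e - b);
}

bool saveConfig(const CloudTtsConfig& c, const std::string& path) {
  std::ofstream out(path);
  out << "<cloud_tts><region>" << c.region << "</region><voice>" << c.voice
      << "</voice><key>" << c.apiKey << "</key></cloud_tts>\n";
  return static_cast<bool>(out);
}

// Missing file or missing tags fall back to the defaults above.
CloudTtsConfig loadConfig(const std::string& path) {
  std::ifstream in(path);
  std::stringstream ss;
  ss << in.rdbuf();
  const std::string xml = ss.str();
  CloudTtsConfig c;
  if (auto v = tagValue(xml, "region"); !v.empty()) c.region = v;
  if (auto v = tagValue(xml, "voice"); !v.empty()) c.voice = v;
  c.apiKey = tagValue(xml, "key");
  return c;
}

// The engine only reads: it holds its own const snapshot.
class CloudTtsEngine {
public:
  explicit CloudTtsEngine(CloudTtsConfig cfg) : cfg_(std::move(cfg)) {}
  const std::string& region() const { return cfg_.region; }
  const std::string& voice() const { return cfg_.voice; }

private:
  const CloudTtsConfig cfg_;
};

// The dialog reads and writes; nothing else touches the file.
class CloudTtsDialog {
public:
  explicit CloudTtsDialog(std::string path)
      : path_(std::move(path)), cfg_(loadConfig(path_)) {}
  void setVoice(std::string v) { cfg_.voice = std::move(v); }
  bool save() const { return saveConfig(cfg_, path_); }

private:
  std::string path_;  // declared before cfg_, so it is initialized first
  CloudTtsConfig cfg_;
};
```

Because the engine never writes, it can simply be rebuilt from the file after the dialog saves, without going through dictionary.hh or config.hh.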

(Side effect: this makes building this component as a separate program easy.)

xiaoyifang commented 1 week ago

Most things on the right side are only "somewhat a dictionary". Putting morphology and transliteration there was a mistake; they cannot even be shown as an article.

Move them to Edit -> Preferences?

xiaoyifang commented 1 week ago

Furthermore, as a separate component, its config can also live in a new file next to config: config_cloud_tts.xml.

This can be considered. Azure TTS can have its own config file.
It does not have to implement dictionary.hh.

shenlebantongying commented 1 week ago

It is not really difficult to replicate AwesomeTTS for an audio preview pane 😅

Progress for today, a little app https://github.com/SourceReviver/temp_ctts_impl

xiaoyifang commented 1 week ago

Do you have time to implement this feature?

shenlebantongying commented 1 week ago

I think https://github.com/SourceReviver/temp_ctts_impl is complete for the initial version of this feature.

However, I need to prepare for an exam on Friday, so I will prepare a PR this weekend 😅

xiaoyifang commented 1 week ago

Exam first. The PR can wait.