ripose-jp / Memento

An mpv-based video player for studying Japanese
https://ripose-jp.github.io/Memento/
GNU General Public License v2.0
forvo audio support #75

Closed eyalmazuz closed 2 years ago

eyalmazuz commented 2 years ago

Forvo hosts high-quality audio from many native speakers, but there is no single URL pattern you can use to fetch a clip directly.

https://forvo.com/search/何で/ja

takes you to the page where you can listen to pronunciations of the word 何で.

If you inspect the blue triangle play button image on that page,

you'll see the following JS onclick command onclick="Play(1903282,'OTA2MTQyOC83Ni85MDYxNDI4Xzc2XzIwOTg5MjNfMS5tcDM=','OTA2MTQyOC83Ni85MDYxNDI4Xzc2XzIwOTg5MjNfMS5vZ2c=',false,'ei9qL3pqXzkwNjE0MjhfNzZfMjA5ODkyM18xLm1wMw==','ei9qL3pqXzkwNjE0MjhfNzZfMjA5ODkyM18xLm9nZw==','h');return false;"

The second and third arguments, 'OTA2MTQyOC83Ni85MDYxNDI4Xzc2XzIwOTg5MjNfMS5tcDM=' and 'OTA2MTQyOC83Ni85MDYxNDI4Xzc2XzIwOTg5MjNfMS5vZ2c=',

are the base64-encoded paths of the audio file (MP3 and Ogg). Decoding the first one gives: 9061428/76/9061428_76_2098923_1.mp3

You can then access the audio file directly, much like JapanesePod101, at the following link: https://audio00.forvo.com/mp3/9061428/76/9061428_76_2098923_1.mp3
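As a rough illustration of the decoding step described above, here is a minimal Python sketch. The function name `forvo_mp3_url` is made up for this example, and the `https://audio00.forvo.com/mp3/` prefix is simply taken from the link above; whether Forvo always serves files from that host is an assumption.

```python
import base64

# Prefix taken from the example link in this comment; assumed, not guaranteed.
FORVO_MP3_BASE = "https://audio00.forvo.com/mp3/"


def forvo_mp3_url(encoded_path: str) -> str:
    """Decode a base64 argument from the Play(...) call into a direct MP3 URL."""
    path = base64.b64decode(encoded_path).decode("utf-8")
    return FORVO_MP3_BASE + path


# Second argument of the Play(...) call shown above.
print(forvo_mp3_url("OTA2MTQyOC83Ni85MDYxNDI4Xzc2XzIwOTg5MjNfMS5tcDM="))
# https://audio00.forvo.com/mp3/9061428/76/9061428_76_2098923_1.mp3
```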

Another option is to implement a JSON audio source like the one in Yomichan, which would let Memento work with the Yomichan Forvo server add-on: https://ankiweb.net/shared/info/580654285

Is there a way to implement either of these options in Memento? I guess the second one is preferable.

ripose-jp commented 2 years ago

It's not trivial to add, since to determine which audio options are available I'll have to hit the Forvo addon for each definition that's shown. There's also currently no way to change the audio source that's added to Anki from inside the popup dictionary, which would be required if I can't be sure what audio sources are available until I hit the server. Basically, it's going to take a while for me to implement all this.

In the meantime, here's a modification to the plugin that adds index and name query string options: __init__.py. The downloaded file has a .txt extension only because GitHub doesn't allow uploading .py files.

So when you add the audio source to Memento, you can put http://localhost:8770/?expression={expression}&reading={reading}&index=0 and set the skip hash to 93b885adfe0da089cdf634904fd59f71. That will serve the first result the Forvo server brings up, or the byte sequence that hashes to 93b885adfe0da089cdf634904fd59f71 if that index doesn't exist.

The name option works similarly: http://localhost:8770/?expression={expression}&reading={reading}&name=jphonetics will serve the result from jphonetics if it exists, or nothing if it doesn't. You can add the index option as well if you want to fall back to an index when nothing is found using the name option: http://localhost:8770/?expression={expression}&reading={reading}&name=jphonetics&index=0

If the name and index options are both absent the addon will work as normal, so there shouldn't be any problems continuing to use it with Yomichan.
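The selection rules for the name and index options can be sketched roughly as follows. This is not the code from the modified __init__.py; `AudioResult` and `select_audio` are hypothetical names, and returning `None` stands in for "serve nothing" (or, for an index-only request, the placeholder bytes matching the skip hash).

```python
from typing import List, Optional, TypedDict


class AudioResult(TypedDict):
    name: str  # name attached to the Forvo result, e.g. "jphonetics"
    url: str   # direct link to the audio file


def select_audio(
    results: List[AudioResult],
    name: Optional[str] = None,
    index: Optional[int] = None,
) -> Optional[AudioResult]:
    """Pick one result: try the name first, then fall back to the index."""
    if name is not None:
        for result in results:
            if result["name"] == name:
                return result
    if index is not None and 0 <= index < len(results):
        return results[index]
    # Nothing matched: the caller decides what "nothing" means.
    return None
```

With both options absent the function returns `None`, which corresponds to the case where the addon would just behave as normal and return its full list.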

Technically since https://github.com/jamesnicolas/yomichan-forvo-server has no license, this is copyright infringement, so if @jamesnicolas has an issue, he can let me know and I'll take this down.

eyalmazuz commented 2 years ago

I might have misunderstood, but don't you only need to hit the server when you add a card to Anki/press the audio button?

I think Yomichan does exactly that: only when injectAnkiNote is called in its backend.js code does it run a chain that leads to the downloadTermAudio method, which then starts the whole process of getting the links from the JSON schema. You can see the process in this file (and I think it's okay to share it cause yomichan is GPL3): https://github.com/FooSoft/yomichan/blob/master/ext/js/media/audio-downloader.js

Thanks for the workaround, I really appreciate it. I'll use it for now.

Calvin-Xu commented 2 years ago

I don't speak for the author, but I think there has been consideration about avoiding using stuff from Yomichan to avoid upgrading this project's license to GPLv3.

eyalmazuz commented 2 years ago

I don't speak for the author, but I think there has been consideration about avoiding using stuff from Yomichan to avoid upgrading this project's license to GPLv3.

Maybe I worded that poorly, but my previous message was more of a "Yomichan has a free to use/modify license, so here's their implementation of the JSON URL audio; have a look, maybe it's possible to do something similar where you only access the JSON when adding the card" and not "here's a solution, just copy it".

jamesnicolas commented 2 years ago

Technically since https://github.com/jamesnicolas/yomichan-forvo-server has no license, this is copyright infringement, so if @jamesnicolas has an issue, he can let me know and I'll take this down.

I don't mind!

ripose-jp commented 2 years ago

I might have misunderstood, but don't you only need to hit the server when you add a card to Anki/press the audio button?

This is the case under the current system, where one audio source can only link to one possible file. The Forvo addon requires scraping Forvo to find out how many audio clips are available. This isn't super fast in my testing and can take almost a second in the average case. Having a one-second delay between the user requesting what audio sources are available and actually showing them isn't super practical, so the best thing to do is pre-fetch everything and hope the user doesn't notice.

I haven't started implementing anything yet, so I'm just thinking about all the potential implementation challenges I'm going to have to deal with. Things can and often do change between brainstorming and actually working on the implementation.

(and I think it's okay to share it cause yomichan is GPL3)

I don't look at Yomichan code because Memento is under an incompatible license. I don't want any of the legal or ethical issues that might come from any potential code laundering.

Anyway, this stuff isn't anything you need to worry about beyond knowing this feature won't be implemented for a little while due to its complexity.

eyalmazuz commented 2 years ago

This is the case under the current system where one audio source can only link to one possible file. The Forvo addon requires scraping Forvo to find out how many audio clips are available. This isn't super fast from my testing and can take almost a second in an average case. Having a second delay between the user requesting what audio sources are available and actually showing them isn't super practical, so the best thing to do is pre-fetch everything and hope the user doesn't notice.

I understand now, that makes sense. Thanks for the explanation. Wouldn't it be better to just get the first audio source available or, alternatively, write your own implementation to access Forvo and skip the overhead of API calls to the addon? I assume pre-fetching everything every time the popup dictionary is loaded can have higher overhead depending on the number of terms it shows, which depends on the number of dictionaries a user has.

I don't look at Yomichan code because Memento is under an incompatible license. I don't want any of the legal or ethical issues that might come from any potential code laundering.

I see. I won't mention Yomichan anymore, to avoid future problems.

Anyway, this stuff isn't anything you need to worry about beyond knowing this feature won't be implemented for a little while due to its complexity.

That's fine, take your time. I wish I could help with the implementation, but I'm not very knowledgeable in either C++ or Qt.

ripose-jp commented 2 years ago

Wouldn't it be better to just get the first audio source available or, alternatively, write your own implementation to access Forvo and skip the overhead of API calls to the addon?

It would be better if I wanted to bake in a custom Forvo scraper, but I think there's more benefit in supporting JSON audio sources like Yomichan does, since that creates greater cross-compatibility between Memento and Yomichan.
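To make the idea of a JSON audio source concrete, here is a small Python sketch of reading such a response. The field names (`type`, `audioSources`, `name`, `url`) are an assumption based on the custom JSON audio format documented for Yomichan, not something confirmed in this thread, and `parse_audio_source_list` is a made-up helper name.

```python
import json
from typing import List, Tuple


def parse_audio_source_list(payload: str) -> List[Tuple[str, str]]:
    """Return (name, url) pairs from a JSON audio source response.

    Assumed shape (not confirmed by this thread):
    {"type": "audioSourceList", "audioSources": [{"name": ..., "url": ...}]}
    """
    data = json.loads(payload)
    if not isinstance(data, dict) or data.get("type") != "audioSourceList":
        return []
    pairs = []
    for source in data.get("audioSources", []):
        # Skip malformed entries that have no URL to fetch.
        if isinstance(source, dict) and "url" in source:
            pairs.append((source.get("name", ""), source["url"]))
    return pairs
```

A single JSON source can therefore expand into several playable entries, which is why the number of available clips is not known until the server has been queried.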

cause I assume pre-fetching everything every time the popup dictionary is loaded can have higher overhead depending on the number of terms

The number of terms shown is capped at 10 by default for performance reasons (#1), so it's not really a function of how many dictionaries a user has unless they uncap the Result Limit in Search settings. At that point though, you're on your own.

Calvin-Xu commented 2 years ago

I think supporting additional (JSON) audio sources is a good measure. Anki runs when Memento is running anyways, so it's not like there's additional hassle in running the addon.

Scraping Forvo is probably rate limited, although I have downloaded thousands of clips at a time without issue by spacing queries and downloads 1 second apart. To be honest, fetching only the first one might be good enough 90% of the time if 10 turns out to be excessive. The add-on's vanilla speed in Yomichan feels good enough too, but I guess it has more visual feedback when loading.

ripose-jp commented 2 years ago

This feature was just added on master along with the ability to select the specific audio that will be added to an Anki card via right clicking the + button. Because of that last feature, I removed the ability to set the default audio source that would be added to cards from inside Anki Integration settings. Now the audio that will be added to cards by default is whatever has the highest priority among all the audio sources.

I shared Calvin's concerns about rate limiting, so I decided against pre-fetching audio sources. Instead, when the play sound or add note buttons are right clicked, a context menu that says "Loading..." is displayed while the audio sources are being fetched.
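The on-demand approach can be illustrated with a short Python sketch. This is not Memento's implementation (which is C++/Qt); `LazyAudioMenu` and its `open` method are hypothetical names that only model "fetch on first right click, show Loading... until the result arrives".

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional


class LazyAudioMenu:
    """Fetch audio sources only when the menu is first opened."""

    def __init__(self, fetch: Callable[[], List[str]], pool: ThreadPoolExecutor):
        self._fetch = fetch
        self._pool = pool
        self._pending: Optional[Future] = None

    def open(self) -> List[str]:
        """Return the menu entries to display right now."""
        if self._pending is None:
            # First right click: start the fetch in the background.
            self._pending = self._pool.submit(self._fetch)
        if not self._pending.done():
            return ["Loading..."]
        return self._pending.result()
```

Because nothing is requested until the user opens the menu, the server is hit once per lookup the user actually cares about instead of once per term shown.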

Calvin-Xu commented 2 years ago

Unless I missed something, it seems that selecting the audio source that will be added to Anki is not working for me. For example, looking up "呉呉も" yields some JapanesePod101 and Forvo sources.

JapanesePod101 actually does not have the audio, and indeed just clicking on the audio button gives an error.

It seems that clicking on any of the Forvo entries plays it. But when I add the term to Anki afterwards, there is no audio. Changing Forvo audio to the top audio source is a workaround.

ripose-jp commented 2 years ago

JapanesePod101 actually does not have the audio, and indeed just clicking on the audio button gives an error.

This is the intended behavior for file audio sources. To know if they exist requires fetching the audio and that's too expensive. If the user really cares, they can verify a file exists by trying to play the audio and seeing what happens.

It seems that clicking on any of the Forvo entries plays it. But when I add the term to Anki afterwards, there is no audio. Changing Forvo audio to the top audio source is a workaround.

I can't reproduce this. When I right click the add button and click a Forvo source, it gets added just fine. If your first priority audio source is JapanesePod101 (or any other file audio source) and the audio doesn't exist or matches the skip hash, AnkiConnect will just ignore it.

Calvin-Xu commented 2 years ago

I see. I missed that the idea is to right click the add button.

This is the intended behavior for file audio sources.

I see the consideration, but I feel like there could be some fallback behavior done afterwards. Suppose JapanesePod101 is the top audio source and does not have the term. It is my understanding that Memento knows that JapanesePod101 does not have the term by matching the hash for the "the audio for this clip is not available" fallback file. In this case, the audio has already been fetched and Memento knows that it is not available. Instead of silently adding no audio to Anki, perhaps it would be cool if Memento could try the next audio source?

I suppose this is just a general idea of fallbacks down the user's list of audio sources. The cost of this should be the same as the user checking and doing this themselves.
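A rough Python sketch of this fallback idea, to show what I mean. It assumes the skip hash is an MD5 digest of the placeholder file (the 32-hex-digit hashes earlier in the thread are consistent with that, but it is an assumption), and `first_available_audio` is a made-up name, not anything in Memento.

```python
import hashlib
from typing import Callable, Iterable, Optional, Tuple

# Each source is (fetch function, optional skip hash). Both are hypothetical
# stand-ins for an entry in the user's prioritized audio source list.
Source = Tuple[Callable[[], Optional[bytes]], Optional[str]]


def first_available_audio(sources: Iterable[Source]) -> Optional[bytes]:
    """Walk the sources in priority order and return the first usable clip."""
    for fetch, skip_hash in sources:
        data = fetch()
        if data is None:
            continue  # source had nothing for this term
        # Assumption: the skip hash is the MD5 of the "not available" file.
        if skip_hash and hashlib.md5(data).hexdigest() == skip_hash.lower():
            continue  # placeholder clip, try the next source
        return data
    return None
```

Each source is fetched at most once and only until a usable clip is found, so the cost matches what the user would pay by checking the sources manually.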

Calvin-Xu commented 2 years ago

I also wonder if the interface could be streamlined a little bit. Currently, right clicking on the add button selects which audio source is added, but the audio does not play either so there is no way of knowing whether the audio is suitable (kunyomi/onyomi differences and Forvo speakers can have different pitch), or whether JapanesePod101 has the audio, as an example. To know that, if I understand correctly, the user needs to right click on the audio button, listen to the sources play there, find a source, and then go to the add button to right click and add from the same source.

I feel like it could be a much simpler process if, right clicking on the audio button, a list of audio sources is displayed with a (single-select) radio button next to each. The top (default) source is selected by default. When the user clicks on an entry, the audio plays, their current selection changes (and perhaps the menu does not dismiss itself). Finally, when the user adds the term to Anki, the audio source used will be the currently selected one.

Since both audio playback and selection state are handled by the audio button's right click menu, the add button does not need its own menu, which I feel might be more intuitive.

ripose-jp commented 2 years ago

I'll look into how difficult it would be to redesign the system. If I do decide to revamp it, I'll probably just rip off Yomichan like always. It seems like it's more or less what you're asking for.