mcthulhu / jorkens

epub reader based on epub.js for foreign language learners

Few questions about usage #3

Open anatoly314 opened 4 years ago

anatoly314 commented 4 years ago

Hi, nice project. I had started on something similar when I came across yours, and now I'm unsure whether to continue on my own or join forces with you. Could you explain how I'm supposed to translate words? Isn't it bound to right-click? Sorry, I haven't looked at the code yet :/

Btw, for lemmatization you can use this library: https://stanfordnlp.github.io/CoreNLP/ That way it can be bundled into a single app.

mcthulhu commented 4 years ago

Thanks; if you'd like to contribute you're welcome to. The things I'm working on at the moment are importing Facebook MUSE dictionaries into the database, and keyword extraction (rake-js apparently works, but is painfully slow).

All operations work against selected text right now. Selecting a word will automatically trigger a lookup in the glossary database; if nothing is found, then a concordance search will be done next. The selection will be converted to a lemma first if one exists. Searching an online dictionary under the Dictionaries menu will also use the last selected text. I haven't set up any right-click menu yet (other than a default one that appears if you click "outside" the book viewer iframe - didn't realize that until just now). I did start to work on getting the word over which the mouse is hovering, as an alternative to selecting text with the mouse, but haven't gotten that to work right yet, other than getting the mouse coordinates.
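The lookup order described above (lemma first, then glossary, then concordance as a fallback) could be sketched roughly like this. It is only an illustration: the in-memory `lemmas`, `glossary`, and `sentences` tables and the `lookupSelection` function are hypothetical stand-ins, not the project's actual database code, and searching the concordance for the lemma rather than the raw selection is an assumption.

```javascript
// Hypothetical in-memory stand-ins for the lemma table, the glossary
// database, and the book text. The real project stores these elsewhere.
const lemmas = new Map([["chats", "chat"]]);
const glossary = new Map([["chat", "cat"]]);
const sentences = [
  "Le chien dort.",
  "Les chiens aboient.",
  "Les chats dorment.",
];

// Look up the selected text: lemma first, then glossary, then concordance.
function lookupSelection(selection) {
  const term = selection.trim().toLowerCase();
  if (term === "") {
    return { type: "none", term, results: [] };
  }

  // Convert the selection to its lemma if one exists.
  const lemma = lemmas.has(term) ? lemmas.get(term) : term;

  // First try the glossary.
  if (glossary.has(lemma)) {
    return { type: "glossary", term: lemma, results: [glossary.get(lemma)] };
  }

  // Nothing found: fall back to a simple substring concordance search
  // (assumption: the search uses the lemma, not the raw selection).
  const hits = sentences.filter((s) => s.toLowerCase().includes(lemma));
  return { type: "concordance", term: lemma, results: hits };
}
```

With this sample data, selecting "chats" resolves through the lemma "chat" to a glossary hit, while selecting "chien" finds no glossary entry and falls through to the concordance search.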

If by translation you mean machine translation, Amazon Translate is the only option working right now.

I'm aware of Stanford CoreNLP but haven't started to work with it yet. It seems to support a lot fewer languages than TreeTagger does; it does cover Arabic, which I don't think TreeTagger does. In the past I've also used Helsinki FST packages for lemmatization, just for European languages, as well as a node.js French lemmatizer. What natural languages are you working with?

anatoly314 commented 4 years ago

I don't work with natural languages at all; my specialization right now is full stack. But in the past I did a project that takes a list of English words and turns them into Anki cards using this plugin: https://ankiweb.net/shared/info/2055492159 It takes the translations from DSL dictionaries; this format is supported by StarDict and other open-source apps, and there are many freely available dictionaries on the internet. You can see this repo https://github.com/anatoly314/data2anki and I wrote a short article in Russian (but Google-translatable) here https://habr.com/ru/post/454236/ The issue with this approach is that it's cumbersome, and I stopped using it myself, so now I'm working on a more user-friendly solution.

You can use this link for translation. It's Google Translate; a lot of tools seem to use it unofficially, and as long as it's fair use Google won't ban you:

https://translate.google.pn/translate_a/t?client=dict-chrome-ex&sl=auto&tl=en&q=Trust%20me.%20No%20collar%20bones%2C%20remember%3F&tbb=1&ie=UTF-8&oe=UTF-8
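For reference, a link like the one above could be assembled in JavaScript roughly as follows. This is a sketch, not a documented API: the endpoint is unofficial and may change or stop working, the `buildTranslateUrl` helper is hypothetical, and reading `sl`, `tl`, and `q` as source language, target language, and query text is an assumption. The `client`, `tbb`, `ie`, and `oe` values are copied verbatim from the link. Only the URL is built here, since the response format isn't described in this thread.

```javascript
// Build a request URL with the same parameters as the example link.
// Assumption: sl = source language, tl = target language, q = text.
function buildTranslateUrl(text, { source = "auto", target = "en" } = {}) {
  const params = [
    ["client", "dict-chrome-ex"], // copied verbatim from the example link
    ["sl", source],
    ["tl", target],
    ["q", text],
    ["tbb", "1"], // copied verbatim, meaning not stated in the thread
    ["ie", "UTF-8"],
    ["oe", "UTF-8"],
  ];
  // encodeURIComponent escapes spaces as %20, matching the example link.
  const query = params
    .map(([key, value]) => `${key}=${encodeURIComponent(value)}`)
    .join("&");
  return `https://translate.google.pn/translate_a/t?${query}`;
}
```

Calling `buildTranslateUrl("Trust me. No collar bones, remember?")` reproduces the link above; passing `{ target: "de" }` as the second argument would change only the `tl` parameter.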

Google itself stays silent about it: https://groups.google.com/g/google-translate-api/c/JiB_3-gEthw/m/rS03VI0iBgAJ?pli=1

May I suggest adding a tool like ESLint? It will help keep the code cleaner and more consistent.
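As a starting point, a minimal `.eslintrc.js` might look something like the sketch below. The environments and rules chosen here are assumptions about what would suit a browser/node.js app, not settings taken from the project. Note that this is the older eslintrc format; ESLint 9 and later default to a flat `eslint.config.js` instead.

```javascript
// .eslintrc.js (legacy eslintrc format): a minimal, illustrative setup.
module.exports = {
  root: true,
  // Assumed environments for an app with both browser and node.js code.
  env: {
    browser: true,
    node: true,
    es2021: true,
  },
  // Start from ESLint's built-in recommended rule set.
  extends: ["eslint:recommended"],
  // A few example rules; adjust to taste.
  rules: {
    "no-unused-vars": "warn",
    eqeqeq: ["error", "always"],
    semi: ["error", "always"],
  },
};
```

After installing ESLint as a dev dependency, running `npx eslint .` in the project root would report violations against these rules.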

mcthulhu commented 4 years ago

Thanks, I'll try those suggestions.