melink14 / rikaikun

rikaikun is a Chrome extension that helps you to read Japanese web pages by showing the reading and English definition of Japanese words when you hover over them.
https://chrome.google.com/webstore/detail/rikaikun/jipdnfibhldikgcjhfnomkfpcebammhp
GNU General Public License v3.0
411 stars, 80 forks

Create modern, flexible system for handling arbitrary dictionaries #185

Open melink14 opened 3 years ago

melink14 commented 3 years ago

More than just updating the old dat and idx files, I think it would be good to have a system with the following properties:

  1. Fast to load and query. IndexedDB or another JS database might fit this.
  2. Able to process and load dictionaries from inside the app. This would allow:
     A. Easily supporting non-English languages.
     B. Supporting niche dictionaries such as computer terms or J-J.
     C. Letting people update dictionaries even when I am slow to do so.

I wouldn't want the system to require users to update dictionaries themselves, though, so I would probably want an auto-updating feature for common sources.
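As a rough sketch of what the auto-update decision could look like: the snippet below compares installed dictionaries against a remote manifest and returns the ones that should be refreshed without user action. Everything here is hypothetical, including the `InstalledDictionary` and `RemoteManifestEntry` shapes, the `autoUpdate` flag, and the assumption that revisions are ISO dates.

```typescript
// Hypothetical metadata shapes; none of these names exist in rikaikun today.
interface InstalledDictionary {
  id: string;
  revision: string; // assumed to be an ISO date such as "2021-03-01"
  autoUpdate: boolean; // true for common sources, false for user-imported ones
}

interface RemoteManifestEntry {
  id: string;
  revision: string;
  url: string;
}

/** Returns the manifest entries that are newer than what is installed. */
function findPendingUpdates(
  installed: InstalledDictionary[],
  manifest: RemoteManifestEntry[]
): RemoteManifestEntry[] {
  const remoteById = new Map<string, RemoteManifestEntry>();
  for (const entry of manifest) {
    remoteById.set(entry.id, entry);
  }

  const pending: RemoteManifestEntry[] = [];
  for (const dict of installed) {
    // User-imported dictionaries are never touched automatically.
    if (!dict.autoUpdate) continue;
    const remote = remoteById.get(dict.id);
    // ISO dates sort correctly with a plain string comparison.
    if (remote !== undefined && remote.revision > dict.revision) {
      pending.push(remote);
    }
  }
  return pending;
}
```

Keeping the decision in a pure function like this would leave the actual download and import steps free to change independently.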

One thing to think about when defining the goals is what types of lookups we may want.
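To make the question concrete, here is a minimal sketch of two possible lookup types, exact and prefix, over a sorted in-memory index using binary search. The names `LookupKind`, `Entry`, and `SortedIndex` are made up for illustration, and a real system might need other kinds (for example lookups by reading or after deinflection) that are not shown.

```typescript
// Hypothetical lookup kinds; the real list is exactly what this issue needs to decide.
type LookupKind = 'exact' | 'prefix';

interface Entry {
  key: string;
  gloss: string;
}

class SortedIndex {
  private readonly entries: Entry[];

  constructor(entries: Entry[]) {
    // Sort by UTF-16 code unit order so it agrees with the < comparisons below.
    this.entries = [...entries].sort((a, b) =>
      a.key < b.key ? -1 : a.key > b.key ? 1 : 0
    );
  }

  /** Index of the first entry whose key is >= target. */
  private lowerBound(target: string): number {
    let lo = 0;
    let hi = this.entries.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (this.entries[mid].key < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  lookup(query: string, kind: LookupKind): Entry[] {
    const results: Entry[] = [];
    // All matching keys are contiguous in sorted order, starting at the lower bound.
    for (let i = this.lowerBound(query); i < this.entries.length; i++) {
      const key = this.entries[i].key;
      const matches = kind === 'exact' ? key === query : key.startsWith(query);
      if (!matches) break;
      results.push(this.entries[i]);
    }
    return results;
  }
}
```

A database-backed store could expose the same `lookup(query, kind)` shape, with an index range query taking the place of the binary search.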

melink14 commented 3 years ago

Various leads: an encoding converter which could convert dictionary data to UTF-8: https://github.com/ashtuchkin/iconv-lite
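For comparison, a conversion like this can also be sketched with the standard `TextDecoder` and `TextEncoder` classes instead of iconv-lite. This assumes the source data is EUC-JP, which the comment above does not state, and that the runtime's `TextDecoder` supports that label. `convertToUtf8` is a hypothetical helper name.

```typescript
/**
 * Re-encodes bytes from a legacy encoding as UTF-8.
 * The 'euc-jp' default is an assumption about the source dictionary files.
 */
function convertToUtf8(
  bytes: Uint8Array,
  sourceEncoding: string = 'euc-jp'
): Uint8Array {
  // fatal: true makes malformed input throw instead of silently
  // producing U+FFFD replacement characters.
  const decoder = new TextDecoder(sourceEncoding, { fatal: true });
  const text = decoder.decode(bytes);
  // TextEncoder always produces UTF-8.
  return new TextEncoder().encode(text);
}
```

iconv-lite would still be the better fit if an encoding outside the set supported by `TextDecoder` turned up.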

Chrome downloads API for serializing dictionaries. (Serializing outside of the extension might not be useful.) https://developer.chrome.com/extensions/downloads#method-download

Utility for import/export of IndexedDB in case we need to prepare it in advance. https://www.npmjs.com/package/indexeddb-export-import

Dexie for making it easier to work with IndexedDB https://github.com/dfahlander/Dexie.js

IndexedDB docs https://developers.google.com/web/ilt/pwa/working-with-indexeddb

Third-party JS database which hasn't been developed in years but might be better than IndexedDB https://github.com/louischatriot/nedb

Plain dictionary? https://stackoverflow.com/questions/10017808/best-data-structure-for-implementing-a-dictionary#:~:text=An%20alternative%20compact%20representation%20is,excellent%20data%20structure%20to%20consider.
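If a plain in-memory structure were used instead of a database, a trie is one candidate, since finding the longest dictionary word that starts at the hovered character is a natural trie walk. The sketch below is only an illustration of that idea; `DictionaryTrie` and `longestMatch` are hypothetical names, and memory use for a full dictionary is not addressed.

```typescript
class TrieNode {
  children = new Map<string, TrieNode>();
  // Non-empty only when the path from the root spells a dictionary word.
  glosses: string[] = [];
}

interface TrieMatch {
  word: string;
  glosses: string[];
}

class DictionaryTrie {
  private readonly root = new TrieNode();

  insert(word: string, gloss: string): void {
    let node = this.root;
    for (const ch of word) {
      let next = node.children.get(ch);
      if (next === undefined) {
        next = new TrieNode();
        node.children.set(ch, next);
      }
      node = next;
    }
    node.glosses.push(gloss);
  }

  /** Longest dictionary word that is a prefix of `text`, or null if none. */
  longestMatch(text: string): TrieMatch | null {
    let node = this.root;
    let consumed = '';
    let best: TrieMatch | null = null;
    for (const ch of text) {
      const next = node.children.get(ch);
      if (next === undefined) break;
      node = next;
      consumed += ch;
      // Remember the deepest node seen so far that ends a word.
      if (node.glosses.length > 0) {
        best = { word: consumed, glosses: node.glosses };
      }
    }
    return best;
  }
}
```

Deinflection is not handled here, so a conjugated form would only match up to its longest stored prefix.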

Dictionary information

http://nihongo.monash.edu/wwwjdicinf.html#dicfil_tag

http://www.edrdg.org/jmdict/edict_doc.html#IREF04

https://www.edrdg.org/jmdictdb/cgi-bin/entr.py?svc=jmdict&sid=&disp=ed&e=2058990

melink14 commented 2 years ago

Epwing format specifically is filed at #146

melink14 commented 2 years ago

Dexie-specific import/export: https://www.npmjs.com/package/dexie-export-import

melink14 commented 2 years ago

Yomichan stores its dictionaries in IndexedDB, so supporting that format directly might be the way to go.

melink14 commented 2 years ago

https://github.com/FooSoft/yomichan-import

melink14 commented 2 years ago

The Yomichan format is published as JSON schemas: https://github.com/FooSoft/yomichan/tree/master/ext/data/schemas I can use https://www.npmjs.com/package/json-schema-to-typescript to generate types for the JSON for easier use, though that might not be useful if I am just immediately saving the data to IndexedDB.

Though perhaps the schemas can also be used to generate types for type-safe IndexedDB access through Dexie.
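To illustrate what typed access to a term bank might look like, here is a hand-written sketch rather than output from json-schema-to-typescript. The eight-element tuple layout reflects my reading of the version 3 term bank schema and should be checked against the published schema; the glossary is typed loosely as `unknown[]`, and the manual type guard stands in for a real schema validator. `TermTuple`, `TermRecord`, and `parseTermBank` are hypothetical names.

```typescript
// Assumed layout of one term bank row (verify against the published schema):
// [expression, reading, definitionTags, rules, score, glossary, sequence, termTags]
type TermTuple = [
  string, string, string | null, string, number, unknown[], number, string
];

interface TermRecord {
  expression: string;
  reading: string;
  definitionTags: string[];
  rules: string[];
  score: number;
  glossary: unknown[];
  sequence: number;
  termTags: string[];
}

function isTermTuple(value: unknown): value is TermTuple {
  if (!Array.isArray(value) || value.length !== 8) return false;
  const [expression, reading, definitionTags, rules, score, glossary, sequence, termTags] = value;
  return (
    typeof expression === 'string' &&
    typeof reading === 'string' &&
    (typeof definitionTags === 'string' || definitionTags === null) &&
    typeof rules === 'string' &&
    typeof score === 'number' &&
    Array.isArray(glossary) &&
    Number.isInteger(sequence) &&
    typeof termTags === 'string'
  );
}

/** Splits a space-separated tag string; null or empty means no tags. */
function splitTags(tags: string | null): string[] {
  return tags === null ? [] : tags.split(' ').filter((t) => t.length > 0);
}

function toTermRecord(t: TermTuple): TermRecord {
  return {
    expression: t[0],
    reading: t[1],
    definitionTags: splitTags(t[2]),
    rules: splitTags(t[3]),
    score: t[4],
    glossary: t[5],
    sequence: t[6],
    termTags: splitTags(t[7]),
  };
}

/** Parses the text of one term bank file into named records. */
function parseTermBank(json: string): TermRecord[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error('term bank must be a JSON array');
  }
  return parsed.map((row: unknown, i: number) => {
    if (!isTermTuple(row)) throw new Error(`row ${i} is not a valid term tuple`);
    return toTermRecord(row);
  });
}
```

Records shaped like `TermRecord` could then be written to an object store with indexes on `expression` and `reading`.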

To generate dictionaries, I can use https://github.com/actions/setup-go to build the latest yomichan-import and then do something similar to https://github.com/FooSoft/yomichan-import/blob/master/scripts/build_dicts.sh.

JSON validator: https://ajv.js.org/guide/getting-started.html