johnfactotum / quick-lookup

Simple GTK dictionary application powered by Wiktionary
GNU General Public License v3.0

Use offline? #33

Open: cassidyjames opened this issue 2 years ago

cassidyjames commented 2 years ago

Quick Lookup currently requires an Internet connection to look up words on Wiktionary. Being usable offline would greatly expand its utility. For example, an offline dictionary app would be great on Endless OS for schools with limited or no Internet connectivity.

johnfactotum commented 2 years ago

As stated in the readme, I don't plan on adding support for other dictionaries (online or offline). The app is deliberately kept as simple as possible (currently a single script with ~600 LOC) for a very simple and narrow use case.

That said, I guess it would make sense to add some support for offline dictd and StarDict dictionaries in the same way Foliate does now. After all, Quick Lookup is basically a spin-off of Foliate's dictionary feature.
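For context, a dictd dictionary is a pair of files: a plain-text `.index` with one `headword<TAB>offset<TAB>length` line per entry, and a `.dict` file holding the article text. The offsets and lengths are written as base-64 numbers. The Python sketch below shows roughly what a lookup involves. It is not Foliate's code: the function names are hypothetical, and it assumes an uncompressed `.dict` (distributed dictionaries usually ship as dictzip-compressed `.dict.dz`) and exact, case-insensitive headword matching.

```python
"""Minimal sketch of a dictd-format lookup (uncompressed .dict assumed)."""
from __future__ import annotations

# dictd writes offsets and lengths as base-64 numbers using this digit alphabet.
B64_DIGITS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def decode_b64_number(text: str) -> int:
    """Convert a dictd base-64 number (most significant digit first) to an int."""
    value = 0
    for ch in text:
        value = value * 64 + B64_DIGITS.index(ch)
    return value


def encode_b64_number(value: int) -> str:
    """Inverse of decode_b64_number; handy for building test data."""
    if value == 0:
        return B64_DIGITS[0]
    digits = []
    while value:
        value, rem = divmod(value, 64)
        digits.append(B64_DIGITS[rem])
    return "".join(reversed(digits))


def parse_index(index_text: str) -> dict[str, tuple[int, int]]:
    """Map each lowercased headword to its (byte offset, byte length) in the .dict."""
    entries: dict[str, tuple[int, int]] = {}
    for line in index_text.splitlines():
        parts = line.split("\t")
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        word, offset, length = parts[0], parts[1], parts[2]
        # Keep the first occurrence if a headword appears more than once.
        entries.setdefault(word.lower(), (decode_b64_number(offset), decode_b64_number(length)))
    return entries


def lookup(word: str, index: dict[str, tuple[int, int]], dict_data: bytes) -> str | None:
    """Return the raw article text for `word`, or None if it is not indexed."""
    pos = index.get(word.lower())
    if pos is None:
        return None
    offset, length = pos
    return dict_data[offset:offset + length].decode("utf-8")
```

Since the `.index` file is sorted, a real implementation could binary-search it on disk instead of loading everything into a Python dict; the dict just keeps the sketch short.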

da2x commented 2 years ago

Wiktionary can be used offline. Let’s see what would be required.

  1. Select a language.
  2. Ask the user whether they prefer online or offline lookups. Warn that offline mode requires a large download (~1 GiB) and a large installation size (~7 GiB).
  3. Fetch, uncompress, and store https://dumps.wikimedia.org/${language_code}wiktionary/latest/${language_code}wiktionary-latest-pages-articles.xml.bz2
  4. The dictionary file is one giant XML file. For performance reasons, it would need to be preprocessed into something more useful; I believe importing it into an SQLite database is probably the easiest option. That process could be time-consuming, but it's a one-time operation, and at the end we'd have an indexed, searchable, and memory-efficient way to interact with the data. The user should probably be prompted to update their dictionary every 6 months, and could keep using the application in online mode while offline mode is prepared asynchronously.
  5. Query the SQLite database by title instead of the online search API.
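The steps above could look roughly like this in Python, using only the standard library. This is a sketch under several assumptions: the dump has already been downloaded, only main-namespace pages (`<ns>0</ns>`) are kept, and the table and function names (`entries`, `build_database`, `lookup`) are hypothetical. Decompressing with `bz2` while parsing with `iterparse` avoids ever writing the multi-gigabyte XML to disk. Note that the stored values are still raw wikitext.

```python
"""Minimal sketch: import a Wiktionary pages-articles dump into SQLite."""
from __future__ import annotations

import bz2
import sqlite3
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def dump_url(language_code: str) -> str:
    # URL pattern from step 3.
    return (f"https://dumps.wikimedia.org/{language_code}wiktionary/latest/"
            f"{language_code}wiktionary-latest-pages-articles.xml.bz2")


def _local(tag: str) -> str:
    # Dump tags are namespaced, e.g. "{http://www.mediawiki.org/xml/export-0.11/}page";
    # strip the namespace so the code does not depend on the schema version.
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream: IO[bytes]) -> Iterator[tuple[str, str]]:
    """Yield (title, wikitext) pairs without building the whole XML tree in memory."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first event is the start of the <mediawiki> root
    title = ns = text = None
    for event, elem in context:
        if event != "end":
            continue
        name = _local(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "ns":
            ns = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            # Keep only main-namespace entries (skip templates, categories, etc.).
            if ns == "0" and title is not None and text is not None:
                yield title, text
            title = ns = text = None
            root.clear()  # discard finished pages so memory use stays bounded


def import_pages(db: sqlite3.Connection, pages: Iterator[tuple[str, str]]) -> None:
    # The PRIMARY KEY gives us an index on title for fast lookups.
    db.execute("CREATE TABLE IF NOT EXISTS entries (title TEXT PRIMARY KEY, wikitext TEXT NOT NULL)")
    with db:  # a single transaction is far faster than committing every row
        db.executemany("INSERT OR REPLACE INTO entries VALUES (?, ?)", pages)


def build_database(dump_path: str, db_path: str) -> sqlite3.Connection:
    db = sqlite3.connect(db_path)
    with bz2.open(dump_path, "rb") as stream:  # decompress on the fly
        import_pages(db, iter_pages(stream))
    return db


def lookup(db: sqlite3.Connection, title: str) -> str | None:
    """Step 5: query by title instead of calling the online API."""
    row = db.execute("SELECT wikitext FROM entries WHERE title = ?", (title,)).fetchone()
    return row[0] if row else None
```

Usage would be along the lines of `db = build_database("enwiktionary-latest-pages-articles.xml.bz2", "wiktionary.db")` followed by `lookup(db, "apple")`.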

It’s quite a bit of work, but doable.

johnfactotum commented 1 year ago

Apart from the size, one major problem is that the data is in wikitext, which is very far from what the definition API returns. It would require a wikitext parser and converter. There also seem to be Enterprise HTML Dumps, but those would still require manually parsing the HTML, so they're not a direct replacement for the API either. Whether it's wikitext or HTML, just parsing the dump would take a long time.
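To illustrate why a wikitext converter is non-trivial, here is a deliberately naive Python sketch that pulls the numbered senses (`# ` lines) out of one language section and strips links, templates, and bold/italic quotes. The function names are hypothetical, and this is not how a real converter would work: Wiktionary entries lean heavily on templates, and simply deleting them throws content away.

```python
"""Naive sketch: extract definition lines from one language section of wikitext."""
from __future__ import annotations

import re

LINK = re.compile(r"\[\[(?:[^\[\]|]*\|)?([^\[\]|]*)\]\]")  # [[target|label]] or [[target]]
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                    # innermost {{...}} only


def strip_markup(line: str) -> str:
    line = LINK.sub(r"\1", line)  # keep the visible label of each link
    # Templates can nest, so keep removing innermost ones until nothing changes.
    while True:
        stripped = TEMPLATE.sub("", line)
        if stripped == line:
            break
        line = stripped
    return re.sub(r"'{2,}", "", line).strip()  # drop ''italic'' / '''bold''' markers


def definitions(wikitext: str, language: str = "English") -> list[str]:
    result: list[str] = []
    in_section = False
    for raw in wikitext.splitlines():
        # Level-2 headings (==English==) delimit language sections;
        # deeper headings (===Noun===) do not match this pattern.
        heading = re.fullmatch(r"==\s*([^=].*?)\s*==", raw.strip())
        if heading:
            in_section = heading.group(1) == language
            continue
        # "# " lines are senses; "#:" (examples) and "#*" (quotations) are skipped.
        if in_section and raw.startswith("# "):
            text = strip_markup(raw[2:])
            if text:
                result.append(text)
    return result
```

For instance, `{{lb|en|countable}}` is displayed on Wiktionary as a usage label such as "(countable)", but this sketch silently drops it. Recovering that text means expanding each template, which amounts to reimplementing a large part of MediaWiki's template machinery.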

I think the better approach would be to support StarDict and DICTD, like Foliate does. Then use something like https://github.com/BoboTiG/ebook-reader-dict to pre-generate selected mono- or bilingual dictionaries in those formats. Their en-en dictionary, for example, is only ~30 MB. That would be far better than downloading the whole dump.
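StarDict is similarly simple at its core: a `.ifo` metadata file, an `.idx` file of NUL-terminated UTF-8 headwords each followed by a big-endian offset and size, and a `.dict` file holding the articles. Below is a minimal Python sketch with hypothetical function names. It assumes 32-bit offsets (no `idxoffsetbits=64` in the `.ifo`), an uncompressed `.dict` rather than `.dict.dz`, a single text type in `sametypesequence` (for example `m`) so article data carries no per-field type markers, and exact headword matching.

```python
"""Minimal sketch of a StarDict lookup (32-bit offsets, uncompressed .dict)."""
from __future__ import annotations

import struct


def parse_ifo(text: str) -> dict[str, str]:
    """Read the key=value metadata that follows the .ifo magic line."""
    lines = text.splitlines()
    if not lines or lines[0] != "StarDict's dict ifo file":
        raise ValueError("not a StarDict .ifo file")
    return dict(line.split("=", 1) for line in lines[1:] if "=" in line)


def parse_idx(idx_data: bytes) -> dict[str, tuple[int, int]]:
    """Map headwords to (offset, size) pairs read from a StarDict .idx file."""
    entries: dict[str, tuple[int, int]] = {}
    pos = 0
    while pos < len(idx_data):
        end = idx_data.index(b"\0", pos)  # headword is NUL-terminated UTF-8
        word = idx_data[pos:end].decode("utf-8")
        # Two big-endian unsigned 32-bit integers follow the terminator.
        offset, size = struct.unpack_from(">II", idx_data, end + 1)
        entries.setdefault(word, (offset, size))
        pos = end + 1 + 8
    return entries


def lookup(word: str, idx: dict[str, tuple[int, int]], dict_data: bytes) -> str | None:
    """Return the article for `word`, assuming a plain-text sametypesequence."""
    pos = idx.get(word)
    if pos is None:
        return None
    offset, size = pos
    return dict_data[offset:offset + size].decode("utf-8")
```

A real reader would also check `wordcount` and `idxoffsetbits` from `parse_ifo` before trusting the `.idx` layout, and would handle dictzip-compressed `.dict.dz` files.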

BoboTiG commented 1 year ago

A note about https://github.com/BoboTiG/ebook-reader-dict: we have been publishing StarDict and DictFile versions for a few months now. They are generated every day alongside the Kobo DictHTML version. Have a look at the English dictionaries, for example: https://github.com/BoboTiG/ebook-reader-dict/blob/master/docs/en/README.md.

We also offer etymology-free versions, which are smaller.