Open Moonbase59 opened 2 years ago
Wikimedia's Enterprise HTML Dumps are generated monthly and contain rendered HTML (vs wikicode) of all pages on Wiktionary, separated by language. This would make parsing and filtering wikicode unnecessary; you would only need to preprocess the HTML to remove the things you don't want, and injecting a little CSS would probably be enough for that.
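To make the CSS idea concrete, here is a minimal sketch in Python. It assumes the rendered HTML of one entry is already available as a string; the selector list (`#toc`, `.mw-editsection`, `.audiotable`) and the helper names `build_hide_css` and `inject_css` are illustrative assumptions, not taken from the dumps or from the dictionary builder, so they would need checking against the real markup. It also assumes the viewer on the device honours `display: none`.

```python
import re

# Selectors to hide. These are assumptions for illustration only;
# verify them against the actual markup in the HTML dump.
HIDDEN_SELECTORS = ["#toc", ".mw-editsection", ".audiotable"]


def build_hide_css(selectors):
    """Return one CSS rule that hides every element matching the selectors."""
    return ", ".join(selectors) + " { display: none; }"


def inject_css(html, css):
    """Insert a <style> element before </head>, or prepend it if there is no head."""
    style = "<style>" + css + "</style>"
    # Find the first closing head tag, ignoring case.
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return style + html
    return html[:match.start()] + style + html[match.start():]


# Tiny hand-written stand-in for a rendered entry (not real dump content).
page = (
    "<html><head><title>character</title></head>"
    '<body><div id="toc">Contents</div><p>A symbol or letter.</p></body></html>'
)
result = inject_css(page, build_hide_css(HIDDEN_SELECTORS))
print(result)
```

This only hides the unwanted parts; they still take up space in the file. If size matters, the same selector list could drive a real removal step instead.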
> Wikimedia's Enterprise HTML Dumps
This looks interesting but it would/could probably take more space than wikicode...
From https://github.com/rdoeffinger/Dictionary/releases/tag/v0.3-oldformat, I downloaded the dictionaries EN.quickdic, DE.quickdic and EN-DE.quickdic and installed them on my Tolino Vision 5 (firmware 15.2.0).
As you can see from the attached screenshots, a lookup of the word "character" in EN.quickdic produces 14 pages, badly formatted with lots of Wiktionary code; an attempt to translate "character" into German using EN-DE.quickdic produces 31+ pages, partly more useful, but still containing lots of Wiktionary code.
This bloats the output so much that your otherwise really nice dictionaries become nearly unusable, which is a shame.
Can I kindly suggest that you revise the Wiktionary code building process a little, so that it filters out all this (unwanted/unneeded) extra information and arrives at a more usable output again?
Let me know if you need more information, or how I could possibly help – thanks!
EN.quickdic-character-screenshots.zip
EN-DE.quickdic-character-screenshots.zip
P.S.: I also believe that "Hyphenation" (page 3/14 of "character" lookup) should be a separate subsection (like "Pronunciation"), and the parts of the word should probably be displayed like "char·ac·ter".
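A rough sketch of how that display form could be produced, assuming the wikicode carries the hyphenation as a template of the form `{{hyphenation|en|char|ac|ter}}` (or the short name `{{hyph|...}}`). The function name `format_hyphenation` is made up for this example, and only the simple case of one language code followed by the syllables is handled.

```python
import re

# Matches {{hyphenation|...}} or {{hyph|...}}.
# The template names are an assumption about the English Wiktionary markup.
HYPH_RE = re.compile(r"\{\{(?:hyphenation|hyph)\|([^{}]*)\}\}")


def format_hyphenation(wikicode):
    """Return a 'char·ac·ter' style string for the first hyphenation template, or None."""
    match = HYPH_RE.search(wikicode)
    if match is None:
        return None
    args = match.group(1).split("|")
    # Drop named parameters (key=value); what remains is the language code
    # followed by the syllables.
    parts = [a.strip() for a in args if "=" not in a]
    syllables = [p for p in parts[1:] if p]
    # Join with a middle dot (U+00B7).
    return "\u00b7".join(syllables) if syllables else None


print(format_hyphenation("* {{hyphenation|en|char|ac|ter}}"))  # prints char·ac·ter
```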
P.P.S.: I think the output should look more like the Wiktionary page, leaving out the ToC and the non-English parts, probably even the audio player links for pronunciation. (Most e-readers either don’t have audio, might have no Internet connection, or simply won’t be able to switch to the browser and play audio, then return to the same page in the ebook.)