rdoeffinger / Dictionary

"QuickDic" offline Dictionary App for Android. Provided downloadable dictionaries are based on Wiktionaries but can also be created from other sources (see DictionaryPC). Remember to use --recursive when cloning! Fork of project that used to be hosted at code.google.com/p/quickdic-dictionary.
Apache License 2.0

Word pages look unformatted #129

Open dbogdanov opened 4 years ago

dbogdanov commented 4 years ago

The pages for each particular word look unformatted, with lots of metadata tags output as raw text. Is this intentional? I've just installed the app and am testing it.

An example (EN.quickdic) rendered in QuickDic compared to the same Wiktionary page in Firefox Android:

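[Two screenshots: the entry as rendered in QuickDic (left) and the same Wiktionary page in Firefox for Android (right)]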

App version: 5.5.6

rdoeffinger commented 4 years ago

"Intentional" is the wrong word. The Wiktionary data looks like what you see on the left, and there is no easy-to-use or easy-to-integrate code to convert it to what you see on the right. Support has been added for some specific, common tags. It would be possible to add support for some more, and others could simply be removed, since they increase dictionary size without much benefit (online links, for example, are of questionable use in an offline dictionary). It would take some work though, and would only improve things, not fix them completely.
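To illustrate the approach described above (convert a few common templates, drop the rest), here is a minimal sketch in Java. It is not QuickDic's actual code: the class name `WikitextSketch`, the `render` method, and the set of handled templates are assumptions chosen for illustration, and real Wiktionary markup has many more cases (tables, parser functions, links with pipes inside template arguments) that this does not cover.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of QuickDic: strips or converts a small
// subset of wikitext so that an entry reads as plain text.
final class WikitextSketch {
    // Innermost {{...}} template: no braces inside, so nested templates
    // are resolved from the inside out over repeated passes.
    private static final Pattern TEMPLATE =
            Pattern.compile("\\{\\{([^{}]*)\\}\\}");
    // [[target]] or [[target|label]]; group 1 is the visible text.
    private static final Pattern LINK =
            Pattern.compile("\\[\\[(?:[^\\]|]*\\|)?([^\\]|]*)\\]\\]");
    // Illustrative set of templates whose third field is the displayed term,
    // e.g. {{l|en|dog}}. This list is an assumption, not a complete one.
    private static final Set<String> TERM_TEMPLATES =
            Set.of("l", "m", "link", "mention");

    static String render(String wikitext) {
        String text = wikitext;
        String previous;
        do {
            // Repeat until no template is left to replace.
            previous = text;
            Matcher m = TEMPLATE.matcher(text);
            StringBuilder sb = new StringBuilder();
            while (m.find()) {
                String replacement = renderTemplate(m.group(1));
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            text = sb.toString();
        } while (!text.equals(previous));

        text = LINK.matcher(text).replaceAll("$1");         // keep link labels
        text = text.replace("'''", "").replace("''", "");   // bold / italic marks
        return text.replaceAll(" {2,}", " ").trim();
    }

    // Known templates yield their term; everything else is dropped.
    private static String renderTemplate(String body) {
        String[] parts = body.split("\\|", -1);
        String name = parts[0].trim();
        if (TERM_TEMPLATES.contains(name) && parts.length >= 3) {
            return parts[2].trim();
        }
        return "";
    }
}
```

With this sketch, `WikitextSketch.render("A {{l|en|dog}} is a [[domestic animal|pet]]. {{wikipedia|Dog}}")` yields `A dog is a pet.`: the link template keeps its term, the wiki link keeps its label, and the unrecognized template is removed. Dropping unknown templates silently is the simplest policy and also keeps the dictionary small, at the cost of occasionally losing useful content.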

Huy-Ngo commented 3 years ago

I suppose Wikimedia should have a parser for this markup. Maybe you can import it?

ilius commented 3 years ago

I have this problem as well in my Python tool: https://github.com/ilius/pyglossary/issues/48

I think using .zim files (from the Kiwix project) is the easiest way to use Wiktionary or Wikipedia offline. There is libzim.

shaked6540 commented 3 years ago

There actually is an easy way to extract the formatted data using https://github.com/tatuylonen/wiktextract

ilius commented 3 years ago

That tool simply downloads the rendered HTML from Wiktionary website one entry at a time. It does not render it. It's also in Python. This is a Java project.

shaked6540 commented 3 years ago

You use it to extract the information, which you can then convert to the same format this dictionary uses, making it human-readable. I'm using it in my app. There's no readme yet, but you can compile it and see for yourself how much cleaner and more readable it is.