kiwix / kiwix-js-pwa

Kiwix JS Offline Browser implemented as a Progressive Web App (PWA), and packaged as Electron, NWJS and UWP apps for Windows and Linux.
https://pwa.kiwix.org
GNU General Public License v3.0
168 stars, 26 forks

Wiktionary preview should show all text #600

Open: Jaifroid opened 2 months ago

Jaifroid commented 2 months ago

Wiktionary's scraped HTML doesn't include any metadata that identifies definitions, and we can't derive them heuristically because the content spans many different languages. It would therefore be better to gather all paragraphs agnostically, even short ones, up to a certain number of characters. This needs testing, because it could end up gathering a lot of irrelevant content.
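
A minimal sketch of what the agnostic gathering could look like, for discussion only. The function name `gatherPreviewText` and its parameters are hypothetical and are not taken from the existing codebase. It assumes the paragraphs are passed in document order, either as DOM elements or as plain strings, and that the character budget counts normalized text only.

```javascript
// Hypothetical helper, not part of the current kiwix-js-pwa code.
// Collects paragraph text in order, keeping short paragraphs, until
// a character budget (maxChars) is used up.
function gatherPreviewText(paragraphs, maxChars) {
    var collected = [];
    var total = 0;
    for (var i = 0; i < paragraphs.length; i++) {
        // Accept either DOM elements (via textContent) or plain strings
        var raw = typeof paragraphs[i] === 'string' ? paragraphs[i] : paragraphs[i].textContent;
        // Collapse runs of whitespace and trim the ends
        var text = (raw || '').replace(/\s+/g, ' ').trim();
        // Skip empty paragraphs only; short ones are deliberately kept
        if (!text) continue;
        var remaining = maxChars - total;
        if (remaining <= 0) break;
        if (text.length > remaining) {
            // Truncate the last paragraph so the budget is never exceeded
            collected.push(text.slice(0, remaining));
            break;
        }
        collected.push(text);
        total += text.length;
    }
    return collected;
}
```

In the app this might be called with something like `gatherPreviewText(doc.querySelectorAll('p'), 700)`, where `doc` is the parsed article and 700 is an arbitrary placeholder limit. Because nothing is filtered by content, the risk of picking up irrelevant paragraphs remains, so the limit would need tuning against real Wiktionary ZIMs in several languages.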