kiwix / overview

https://kiwix.org

Scraping of German Wiktionary not working correctly (missing photos & translations) #64

Closed ghost closed 2 years ago

ghost commented 2 years ago

The German Wiktionary is not being scraped correctly: the scraper fails to pick up many photos and translations.

For example, the .zim version from February 2022 shows the headword "schütten" without any photo:

[Screenshot: "schütten" entry in the .zim file, without a photo]

However, the online version shows a photo that was added to the Wiki on November 3, 2021:

[Screenshots of the online "schütten" entry]

Therefore, the February .zim did not include a photo that was already there. I tested with the Kiwix app on Android and with GoldenDict; the image is missing in both. This problem is fairly common in the German Wiktionary, and the same happens with the translations box.

It seems that the code to scrape the German Wiktionary needs to be updated/improved.

Thanks for your hard work! :D

kelson42 commented 2 years ago

@GPLv3-fan I can confirm this: http://library.kiwix.org/wiktionary_de_all_maxi/A/sch%C3%BCtten

ghost commented 2 years ago

@kelson42 I am not sure how to report this bug properly to Wikimedia Headquarters. I am unsure about the right category and the technical details (I am not a programmer). Could you please report the bug to Wikimedia with the technical information?

I collected several links to German Wiktionary entries whose images are missing from the .zim files:
https://de.wiktionary.org/wiki/Fensterbrett
https://de.wiktionary.org/wiki/schütten
https://de.wiktionary.org/wiki/riechen
https://de.wiktionary.org/wiki/parken
https://de.wiktionary.org/wiki/ernten

This entry has 3 photos on Wiktionary, 2 of which are missing: https://de.wiktionary.org/wiki/balancieren