Closed by audiodude 1 week ago
It seems clear this is the same issue as #2071. Perhaps close this one and generalize the title of that issue?
It's actually the opposite problem: the version scraped with 1.14 is half the size.
The first step in analyzing this would be an apples-to-apples comparison: scrape the wiki as it is now with both 1.13 and 1.14.
Here are the results of scraping the current wiki with 1.13 and 1.14:
14M output/wikipedia_bm_all_maxi_2024-08.113.zim
22M output/wikipedia_bm_all_maxi_2024-08.114.zim
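For anyone repeating this comparison, here is a minimal Python sketch that reports how two ZIM files compare in size. The function name `size_ratio` is hypothetical and not part of the scraper; it only reads file sizes from disk and does not inspect the ZIM contents.

```python
from pathlib import Path


def size_ratio(old_zim: Path, new_zim: Path) -> float:
    """Return the size of new_zim divided by the size of old_zim.

    A result below 1.0 means the newer file is smaller.
    """
    old_size = old_zim.stat().st_size  # size in bytes
    new_size = new_zim.stat().st_size
    if old_size == 0:
        raise ValueError(f"{old_zim} is empty, cannot compute a ratio")
    return new_size / old_size
```

It could be called as `size_ratio(Path("output/wikipedia_bm_all_maxi_2024-08.113.zim"), Path("output/wikipedia_bm_all_maxi_2024-08.114.zim"))`, assuming both scrapes were written to the same `output` directory.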
It is clear there were major structural changes between June and July that cause the most recent scrapes to be smaller.
In the end, it turns out this is in fact the same issue as #2071. Closing as duplicate.
The ZIM that was scraped in July 2024 by 1.14 for bm_all_maxi is about half the size of the one for June, scraped by 1.13:
We've started looking at the ZIMs and there is definitely a disparity in image resolution. Many of the images in the July ZIM have much smaller dimensions.
This could have been caused by clearing the image cache between runs. If 1.14 didn't find the image in the cache, it may have resorted to either: