Open benoit74 opened 4 days ago
This is blocked by the fact that `Sites.xml` is no longer provided in the data dumps.
I've opened an issue upstream: https://meta.stackexchange.com/questions/404002/sites-xml-is-not-present-anymore-in-stackexchange-data-dumps
I've pushed the current changes (far from complete, but already able to retrieve the most recent dump) to https://github.com/openzim/zimfarm/tree/new_dumps_url; waiting for an answer from SE on the upstream issue.
Due to changes in StackExchange processes, our watcher does not grab new dumps properly.
Originally, all dumps were pushed to https://archive.org/details/stackexchange
Latest dumps have dedicated URLs now:
Looks like we can search for these new identifiers with this URL: https://archive.org/services/search/beta/page_production/?user_query=subject:%22Stack%20Exchange%20Data%20Dump%22%20creator:%22Stack%20Exchange,%20Inc.%22&hits_per_page=1&page=1&sort=date:desc&aggregations=false&client_url=https://archive.org/search?query=subject%3A%22Stack+Exchange+Data+Dump%22+creator%3A%22Stack+Exchange%2C+Inc.%22
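
For reference, here is a minimal sketch of how the watcher could build that search URL from its parts instead of hardcoding the long string. The names `build_search_url` and `fetch_latest` are hypothetical, not existing zimfarm functions. The layout of the JSON returned by this beta endpoint is not documented in this issue, so the sketch returns the decoded payload as-is and makes no attempt to extract the item identifier from it. It also assumes the endpoint accepts standard form encoding (`+` for spaces, `%3A` for `:`), which differs slightly from the hand-written URL above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and query terms are copied from the URL quoted above.
SEARCH_ENDPOINT = "https://archive.org/services/search/beta/page_production/"
USER_QUERY = 'subject:"Stack Exchange Data Dump" creator:"Stack Exchange, Inc."'


def build_search_url(hits_per_page: int = 1, page: int = 1) -> str:
    """Build the search URL, most recent item first (hypothetical helper)."""
    # client_url mirrors the parameter present in the original URL;
    # whether the endpoint actually requires it is an assumption.
    client_url = "https://archive.org/search?" + urlencode({"query": USER_QUERY})
    params = {
        "user_query": USER_QUERY,
        "hits_per_page": hits_per_page,
        "page": page,
        "sort": "date:desc",
        "aggregations": "false",
        "client_url": client_url,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


def fetch_latest(timeout: float = 30.0) -> dict:
    """Fetch the first result page and return the decoded JSON untouched.

    The response structure is not assumed here; inspect it before
    writing any code that extracts the dump identifier.
    """
    with urlopen(build_search_url(), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_search_url())
```

With `hits_per_page=1` and `sort=date:desc`, the first hit should be the most recently published dump item, which is what the watcher needs to compare against the last one it processed.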