benoit74 closed 1 month ago
The problem is in the handling of redirections. The scraper only stores redirections from ZimPath to ZimPath. In some cases this means we store a redirection to self. For example, a redirection from http://www.kiwix.org to https://www.kiwix.org becomes a self-redirection, since both URLs map to the same ZimPath. These redirections should simply be ignored, because source and target are already considered equal in terms of ZimPath.
(and this was leading to an infinite loop)
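A minimal sketch of the fix described above, in Python (the language warc2zim is written in). The names `to_zim_path`, `collect_redirections` and `resolve` are hypothetical, not warc2zim's actual API. The normalization only drops the URL scheme, which is an assumption that is just enough to reproduce the http/https case. The `resolve` helper also shows why a self-redirection hangs: without a cycle check, following `www.kiwix.org -> www.kiwix.org` never terminates.

```python
from urllib.parse import urlsplit


def to_zim_path(url: str) -> str:
    """Hypothetical, simplified ZimPath normalization (assumption): drop the
    scheme so that http:// and https:// variants map to the same path."""
    parts = urlsplit(url)
    path = parts.netloc + parts.path
    if parts.query:
        path += "?" + parts.query
    return path


def collect_redirections(url_redirects: dict[str, str]) -> dict[str, str]:
    """Convert URL -> URL redirections into ZimPath -> ZimPath redirections,
    skipping those whose source and target share the same ZimPath."""
    zim_redirects: dict[str, str] = {}
    for source_url, target_url in url_redirects.items():
        source, target = to_zim_path(source_url), to_zim_path(target_url)
        if source == target:
            # Redirection to self: both URLs already map to the same entry.
            continue
        zim_redirects[source] = target
    return zim_redirects


def resolve(path: str, redirects: dict[str, str]) -> str:
    """Follow redirections until a final path is reached. Raise on a cycle
    instead of looping forever."""
    seen: set[str] = set()
    while path in redirects:
        if path in seen:
            raise ValueError(f"redirection loop at {path}")
        seen.add(path)
        path = redirects[path]
    return path


redirects = collect_redirections({
    "http://www.kiwix.org": "https://www.kiwix.org",  # ignored: same ZimPath
    "https://kiwix.org/old": "https://kiwix.org/new",
})
print(redirects)  # {'kiwix.org/old': 'kiwix.org/new'}
```

Filtering at collection time keeps the stored redirections free of self-loops. The cycle check in `resolve` is only a safety net for longer loops such as `a -> b -> a`.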
Task: https://farm.openzim.org/pipeline/b7a1a162-671c-4cb5-bda9-8b4b917efbc2/debug
This is most probably a recent regression in warc2zim 2.
At 2024-05-11 08:07:42, warc2zim started
At 2024-05-11 08:08:36, it had collected the metadata
Then nothing more appeared in the log for almost 48h, so I cancelled the task at 2024-05-13 06:27.