openzim / warc2zim

Command line tool to convert a file in the WARC format to a file in the ZIM format
https://pypi.org/project/warc2zim/
GNU General Public License v3.0

Redirection loops still lead to dead loops #281

Closed benoit74 closed 1 month ago

benoit74 commented 1 month ago

Redirection loops like A -> B -> A are now correctly detected thanks to the changes in https://github.com/openzim/warc2zim/pull/278

Some redirections still seem to cause dead loops.

The analysis is not finished yet, but it is certain that the current logic fails to detect redirection loops like A -> B -> C -> B: it only stops when the chain comes back to the original item, so a cycle that does not include the starting item is followed forever.
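
To illustrate the difference, here is a minimal sketch of loop detection that remembers every item visited along the chain instead of comparing only against the starting item. It is not the warc2zim code: the function name `resolve_redirect` and the `redirects` mapping (source -> target) are hypothetical and only serve the example.

```python
from __future__ import annotations

from typing import Optional


def resolve_redirect(start: str, redirects: dict[str, str]) -> Optional[str]:
    """Follow the redirect chain from `start` (hypothetical helper).

    Returns the final target, or None if any item is visited twice,
    whether or not that item is the starting one.
    """
    seen = {start}
    current = start
    while current in redirects:
        current = redirects[current]
        if current in seen:
            # Loop found somewhere in the chain (e.g. A -> B -> C -> B)
            return None
        seen.add(current)
    return current


# A -> B -> A: the loop goes back to the original item
print(resolve_redirect("A", {"A": "B", "B": "A"}))
# A -> B -> C -> B: the loop never goes back to A
print(resolve_redirect("A", {"A": "B", "B": "C", "C": "B"}))
# A -> B -> C: no loop, C is the final target
print(resolve_redirect("A", {"A": "B", "B": "C"}))
```

Tracking the whole set of visited items costs a bit of memory per chain, but it guarantees termination as long as the number of items is finite.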

benoit74 commented 1 month ago

The problem is https://www.bbc.com/persian/yourpics/, which redirects to https://www.bbc.co.uk/persian/yourpics/, and on .co.uk there is a real loop (still present on the live server).