Open AnyOldName3 opened 8 months ago
I determined that the particular site could in principle have worked with both the http://
/https://
equivalence and the www.domain
/domain
equivalence, but the system to detect when one redirected to another failed when it took more than one step. The http://
URLs redirected to the https://
URLs, which HTTrack handled sensibly, but then the non-www.
URLs redirected to the www.
ones, which HTTrack didn't bother fetching. I'm guessing that this was misinterpreted as a redirect loop as the URLs were the same after normalisation, but they were different before normalisation.
A site mirror I made didn't work properly until I commented out the guts of
jump_normalized_const
so it didn't jump over awww.
prefix, and then did work afterwards (although this is oversimplifying as I ended up needing to use https://github.com/mitchcapper/httrack so the project would build, and then had to patch out a couple of regressions it had versus this version).If the options to treat
http://
andhttps://
URLs as the same, treatwww.thing.com
andthing.com
URLs as the same, and to remove redundant slashes were separate instead of under one umbrella URL Hacks setting, I could have just enabled and disabled the bits I needed.