Closed: galgeek closed this 1 year ago
If anyone has a chance: as someone who uses warcprox, I'd be very grateful for any information about the problems you've found that this change resolves.
I think this could help ensure that pages that don't need to be re-saved are not re-saved, so that the Internet Archive can move on to other pages whose captures may need updating.
Current warcprox code captures many revisit records for some sites/URLs, adversely affecting capture in some cases and replay in more.
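For readers unfamiliar with the term: in WARC-based archiving, a revisit record is generally written in place of a full response when the payload digest matches an earlier capture. The sketch below illustrates only that general idea; it is not warcprox's actual code, and the function name `classify_captures` and the tuple layout are hypothetical.

```python
import hashlib


def classify_captures(fetches):
    """Label each (url, payload) fetch as a 'response' or a 'revisit'.

    Hypothetical illustration of digest-based deduplication, not
    warcprox's implementation: a payload whose digest has been seen
    before is recorded as a revisit pointing at the first capture.
    """
    seen = {}  # payload digest -> URL of the first capture with that digest
    records = []
    for url, payload in fetches:
        digest = "sha1:" + hashlib.sha1(payload).hexdigest()
        if digest in seen:
            # Same bytes as an earlier capture: record a pointer, not the payload.
            records.append(("revisit", url, seen[digest]))
        else:
            seen[digest] = url
            records.append(("response", url, None))
    return records
```

Under this scheme, a site that serves the same bytes at many URLs produces one response record and a revisit record for every other URL, which is the kind of pattern the comment above describes.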
Note: this PR is likely to be replaced soon by a PR for warcprox/dedup.py.
This seems like an important PR to get merged.