internetarchive / brozzler

brozzler - distributed browser-based web crawler
Apache License 2.0

Skip invalid outlink #223

Closed: vbanos closed this issue 3 years ago

vbanos commented 3 years ago

When one of the outlinks is http://-1/, urlcanon.whatwg raises an unhandled ipaddress.AddressValueError exception and the whole capture fails.

We can skip the problematic outlink and keep the remaining outlinks instead of failing the capture.
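A minimal sketch of the proposed fix: canonicalize each outlink on its own, and log and skip any URL whose canonicalization raises instead of letting the exception abort the capture. The helpers canonicalize_outlinks and toy_canonicalize are hypothetical and are not brozzler's actual code. toy_canonicalize only mimics the failure so the sketch runs without urlcanon: like a WHATWG host parser, it treats an all-numeric host as an IPv4 address, so "-1" triggers ipaddress.AddressValueError. With urlcanon installed, the assumption is that you would pass something like `lambda u: str(urlcanon.whatwg(u))` instead.

```python
import ipaddress
import logging
import re
from typing import Callable, Iterable, List
from urllib.parse import urlsplit

logger = logging.getLogger(__name__)


def canonicalize_outlinks(
    outlinks: Iterable[str], canonicalize: Callable[[str], str]
) -> List[str]:
    """Hypothetical helper: canonicalize each outlink, skipping the ones that fail."""
    result = []
    for url in outlinks:
        try:
            result.append(canonicalize(url))
        except Exception as e:  # e.g. ipaddress.AddressValueError for http://-1/
            # One bad outlink should not fail the whole capture.
            logger.warning("skipping invalid outlink %r: %s", url, e)
    return result


def toy_canonicalize(url: str) -> str:
    """Hypothetical stand-in for a real canonicalizer such as urlcanon.whatwg.

    Lowercases the host and, for an all-numeric host, converts it to
    dotted-quad IPv4. A negative host like "-1" makes
    ipaddress.IPv4Address raise AddressValueError.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # .hostname is already lowercased
    if re.fullmatch(r"-?\d+", host):
        host = str(ipaddress.IPv4Address(int(host)))
    # Toy only: rebuilding netloc from the host drops any port or userinfo.
    return parts._replace(netloc=host).geturl()


if __name__ == "__main__":
    links = ["http://Example.com/a", "http://-1/", "http://3232235777/"]
    print(canonicalize_outlinks(links, toy_canonicalize))
    # ['http://example.com/a', 'http://192.168.1.1/']
```

Catching a broad Exception per URL is a design choice in this sketch. Narrowing it to ValueError, which ipaddress.AddressValueError subclasses, would still cover the reported case while letting unrelated bugs surface.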

vbanos commented 3 years ago

You can reproduce this by capturing http://musicmachinery.com/2009/04/27/moot-wins-time-inc-loses/