internetarchive / brozzler

brozzler - distributed browser-based web crawler
Apache License 2.0

Add option extract_outlinks_timeout #209

Closed vbanos closed 3 years ago

vbanos commented 3 years ago

Browser.extract_outlinks has a parameter timeout=60 whose default cannot be overridden in any way. (It is always invoked as extract_outlinks().)

We add a parameter extract_outlinks_timeout=60 to BrozzlerWorker and Browser.browse_page so that the timeout can be configured.
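
To illustrate the idea, here is a minimal sketch of how such a timeout could be threaded from the worker down to the outlink extraction call. The classes `BrowserSketch` and `WorkerSketch`, the `brozzle_page` method shown here, and the `last_timeout` attribute are hypothetical stand-ins written for this example; they are not brozzler's actual code, and only the parameter names `timeout` and `extract_outlinks_timeout` come from the description above.

```python
class BrowserSketch:
    """Toy stand-in for a browser wrapper (assumption, not brozzler's Browser)."""

    def __init__(self):
        self.last_timeout = None

    def extract_outlinks(self, timeout=60):
        # A real implementation would wait up to `timeout` seconds for the
        # page to report its outlinks. Here we only record the value.
        self.last_timeout = timeout
        return frozenset()

    def browse_page(self, page_url, extract_outlinks_timeout=60):
        # Forward the configurable value instead of relying on the default.
        return self.extract_outlinks(timeout=extract_outlinks_timeout)


class WorkerSketch:
    """Toy stand-in for a worker (assumption, not brozzler's BrozzlerWorker)."""

    def __init__(self, extract_outlinks_timeout=60):
        self._extract_outlinks_timeout = extract_outlinks_timeout

    def brozzle_page(self, browser, page_url):
        # Pass the worker-level setting through to the browser call.
        return browser.browse_page(
            page_url, extract_outlinks_timeout=self._extract_outlinks_timeout
        )


# Usage: a caller that wants to spend less time on outlinks picks a smaller value.
worker = WorkerSketch(extract_outlinks_timeout=15)
browser = BrowserSketch()
outlinks = worker.brozzle_page(browser, "http://example.com/")
```

In this sketch the default stays at 60 at every level, so callers that do not pass the new argument see no change in behavior.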

vbanos commented 3 years ago

This feature is necessary for some users. For example, in SPN2 we need to spend less time on extract_outlinks, so we want to use a smaller timeout value.