workeffortwaste / horseman

The detailed update and issue repository for the Horseman crawler.
https://gethorseman.app/

Not crawling URLs with long-lived connections #52

Closed chrishaensel closed 1 year ago

chrishaensel commented 2 years ago

For me, Horseman isn't crawling when I use an HTTP URL (not HTTPS) as the start URL. Sample URL: http://www.insecam.org

workeffortwaste commented 2 years ago

This doesn't appear to be caused by HTTP. Rather, the page has long-lived network requests that never complete. Horseman currently relies on networkidle0, which is never reached on that site, so the crawl never gets past the page load.

The solution will be for me to add a setting that lets you specify the desired waitUntil option.
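
For illustration only, here is a rough sketch of how such a setting might be mapped to a navigation option. It assumes the crawler navigates with Puppeteer's `page.goto`, which the thread does not state outright, although `networkidle0` and `waitUntil` are Puppeteer names. The helpers `resolveWaitUntil` and `buildNavigationOptions` are hypothetical and not part of Horseman.

```typescript
// The lifecycle events Puppeteer accepts for `waitUntil` in page.goto().
const WAIT_UNTIL_EVENTS = [
  "load",             // the window load event has fired
  "domcontentloaded", // the DOMContentLoaded event has fired
  "networkidle0",     // no network connections for at least 500 ms
  "networkidle2",     // no more than 2 network connections for at least 500 ms
] as const;

type WaitUntil = (typeof WAIT_UNTIL_EVENTS)[number];

// Hypothetical helper: normalize a user-supplied setting and fall back
// to a default when the value is missing or not a known event.
function resolveWaitUntil(
  setting: string | undefined,
  fallback: WaitUntil = "networkidle0",
): WaitUntil {
  const normalized = (setting ?? "").trim().toLowerCase();
  return (WAIT_UNTIL_EVENTS as readonly string[]).includes(normalized)
    ? (normalized as WaitUntil)
    : fallback;
}

// Hypothetical helper: build the options object passed to navigation.
interface NavigationOptions {
  waitUntil: WaitUntil;
  timeout: number; // milliseconds
}

function buildNavigationOptions(
  setting?: string,
  timeoutMs = 30_000,
): NavigationOptions {
  return { waitUntil: resolveWaitUntil(setting), timeout: timeoutMs };
}

// With Puppeteer, the result would be used roughly like this:
//   await page.goto(url, buildNavigationOptions("domcontentloaded"));
```

For a page that holds connections open indefinitely, `load` or `domcontentloaded` would let navigation finish, since neither waits for the network to go quiet.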

workeffortwaste commented 1 year ago

Fixed in 0.2.0 by allowing the user to pick the waitUntil option.