infinilabs / crawler

🕷️ An easy-to-use spider written in Golang. (Previously named GOPA.)
Other

Limit Hosts crawled #37

Open: Vaccam opened this issue 6 years ago

Vaccam commented 6 years ago

I'm running a crawl on my company's intranet. Is there a way to limit which hosts it crawls? It seems to be crawling every host on our intranet. (Screenshot: crawlhosts)

kenkenchow commented 6 years ago

You can add a url_match_rule to the url_filter section in gopa.yml.

Vaccam commented 6 years ago

Thank you for your response. After adding this to the yml file and stopping and restarting the server, the crawler still crawls the other hosts. Do I need to do something else to stop the existing crawl?

Thanks,

Michael

Vaccam commented 6 years ago

Is this what I want? For example:


kenkenchow commented 6 years ago

host_match_rule:
  must:
    prefix: []
    contain: [url.that.iwant]

I put the values in array format and it works. Please check whether gopa is still running after you stop it. If it's still running, kill it with kill -9 <pid>.
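To illustrate how a host rule with prefix and contain lists might filter links, here is a minimal Go sketch. It is not gopa's actual implementation. The type MustRule and the method Allowed are hypothetical names, and the semantics are assumptions: an empty list imposes no constraint, and a non-empty list requires the link's host to match at least one entry.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// MustRule is a hypothetical mirror of the "must:" block in host_match_rule.
// Assumption (not confirmed by gopa docs): an empty list means "no constraint",
// and a non-empty list requires the host to match at least one entry.
type MustRule struct {
	Prefix  []string
	Contain []string
}

// matchesAny reports whether host satisfies test for at least one pattern,
// treating an empty pattern list as always satisfied.
func matchesAny(host string, patterns []string, test func(s, p string) bool) bool {
	if len(patterns) == 0 {
		return true
	}
	for _, p := range patterns {
		if test(host, p) {
			return true
		}
	}
	return false
}

// Allowed parses rawURL and checks its host against both lists.
// Unparseable URLs or URLs without a host are rejected.
func (r MustRule) Allowed(rawURL string) bool {
	u, err := url.Parse(rawURL)
	if err != nil || u.Hostname() == "" {
		return false
	}
	host := u.Hostname()
	return matchesAny(host, r.Prefix, strings.HasPrefix) &&
		matchesAny(host, r.Contain, strings.Contains)
}

func main() {
	// Equivalent of: prefix: [] and contain: [url.that.iwant]
	rule := MustRule{Prefix: []string{}, Contain: []string{"url.that.iwant"}}
	for _, link := range []string{
		"http://url.that.iwant/docs/index.html",
		"http://other-host.intranet/page",
	} {
		fmt.Printf("%s allowed=%v\n", link, rule.Allowed(link))
	}
}
```

Under these assumptions, the first link is allowed and the second is skipped, which is the behavior the original poster wanted: only hosts containing url.that.iwant are followed.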