CTPAYC23 closed 7 years ago
Can you try using a follow rule like: \d+(/$|$)
instead?
@ruairif, OK, trying that now. Can I still combine follow and exclude rules successfully?
You can. It follows all links that match the follow rules but do not match the exclude rules.
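To make the combination concrete, here is a rough Python sketch of that logic. It assumes the rules are applied as regex searches against the full URL, which may not be exactly how the crawler evaluates them. The function name `should_follow` and the exclude pattern `/(doc|txt)/$` are hypothetical, made up for illustration; only the follow pattern comes from this thread.

```python
import re

# Follow rule suggested above: URL ends in digits, optionally followed by "/".
FOLLOW_PATTERNS = [r"\d+(/$|$)"]
# Hypothetical exclude rule for the /doc/ and /txt/ sub-pages (an assumption).
EXCLUDE_PATTERNS = [r"/(doc|txt)/$"]


def should_follow(url, follow=FOLLOW_PATTERNS, exclude=EXCLUDE_PATTERNS):
    """Return True if url matches any follow rule and no exclude rule."""
    matches_follow = any(re.search(p, url) for p in follow)
    matches_exclude = any(re.search(p, url) for p in exclude)
    return matches_follow and not matches_exclude


if __name__ == "__main__":
    for url in [
        "http://website.com/product/brand/id123/",
        "http://website.com/product/brand/id123/doc/",
        "http://website.com/product/brand/id123/txt/",
    ]:
        print(url, should_follow(url))
```

With this follow rule the /doc/ and /txt/ pages are already rejected, because their URLs do not end in digits, so the exclude rule here is only a safety net.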
Hi,
I'm crawling a site with URLs in the format: http://website.com/product/brand/id123/
I'm trying to exclude links like: http://website.com/product/brand/id123/doc/ http://website.com/product/brand/id123/txt/
so I'm configuring crawling exclusions. I have tried: doc /doc/ \/doc\/ .*\/doc\/
but I'm still seeing a lot of crawl attempts for http://website.com/product/brand/id123/doc/ pages in the Scrapinghub request log. The ratio of scraped items to requests is too low, and the job eventually gets stopped.
What is the right way to exclude URLs like these, please?