First off, thanks for this. Saved me a lot of digging.
Second, I think this should return an HtmlResponse object, rather than a TextResponse object.
When using Scrapy's CrawlSpider, rather than just the base spider, the response object is checked to see whether it is of type HtmlResponse, and if it is not, the spider quits out without following any links.
Specifically, this happens in _requests_to_follow in the CrawlSpider class here: https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/spiders/crawl.py
Looks like HtmlResponse is just a shell class that inherits from TextResponse, so I wouldn't think it would cause any issues to change it.
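To make the point concrete, here is a rough sketch of why the response type matters. It does not import Scrapy: the two classes and the requests_to_follow function below are simplified stand-ins I made up to model the inheritance and the isinstance check described above, so treat it as an illustration rather than the actual CrawlSpider code.

```python
# Stand-in classes that mirror the inheritance described above. These are
# NOT Scrapy's real classes, just a model of the relationship.
class TextResponse:
    def __init__(self, url, body=""):
        self.url = url
        self.body = body


class HtmlResponse(TextResponse):
    """Shell subclass; adds nothing of its own in this sketch."""


def requests_to_follow(response, links):
    # Simplified model of the type check in CrawlSpider._requests_to_follow:
    # anything that is not an HtmlResponse yields no links to follow.
    if not isinstance(response, HtmlResponse):
        return []
    return list(links)


links = ["http://example.com/a", "http://example.com/b"]

# A plain TextResponse fails the check, so nothing is followed.
text_followed = requests_to_follow(TextResponse("http://example.com"), links)

# An HtmlResponse passes the check, and it is still a TextResponse.
html_followed = requests_to_follow(HtmlResponse("http://example.com"), links)

print(text_followed)
print(html_followed)
```

Since the subclass is still an instance of the parent, code that expects a TextResponse should keep working if an HtmlResponse is returned instead.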