Closed by rabol 8 months ago
Thanks for pointing this out; this is something I've been meaning to get around to for a long time. I just released a new version that adds an HttpErrorMiddleware, which automatically drops responses with status codes outside the 200-300 range. This middleware is enabled by default if your spider extends BasicSpider (provided you're not overriding the $downloaderMiddleware property).
Here are the relevant docs which also explain the available configuration options: https://roach-php.dev/docs/downloader-middleware#handling-http-errors
Thanks!
A small note: the stub that is used when creating a spider should not override the BasicSpider properties by default.
For example, in my case I now have to double-check all my spiders, because BasicSpider now has this:
public array $downloaderMiddleware = [
    RequestDeduplicationMiddleware::class,
    HttpErrorMiddleware::class,
];
and my spider, generated by Roach, has this:
public array $downloaderMiddleware = [
    RequestDeduplicationMiddleware::class,
];
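For anyone in the same situation, here is a minimal sketch of what an updated spider might look like. The class name ExampleSpider, the start URL, and the h1 selector are hypothetical, and the namespaces are assumed to match a standard Roach installation, so check them against your installed version. The point is only that a spider which overrides $downloaderMiddleware has to list HttpErrorMiddleware itself. Removing the property entirely should let the spider inherit the BasicSpider defaults instead.

```php
<?php

// Namespaces assumed from a standard Roach installation; verify locally.
use RoachPHP\Downloader\Middleware\HttpErrorMiddleware;
use RoachPHP\Downloader\Middleware\RequestDeduplicationMiddleware;
use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

// Hypothetical spider, for illustration only.
class ExampleSpider extends BasicSpider
{
    // Hypothetical start URL.
    public array $startUrls = [
        'https://example.com',
    ];

    // Because this property overrides the BasicSpider default,
    // HttpErrorMiddleware must be listed explicitly. Deleting the
    // property altogether would inherit the defaults instead.
    public array $downloaderMiddleware = [
        RequestDeduplicationMiddleware::class,
        HttpErrorMiddleware::class,
    ];

    public function parse(Response $response): \Generator
    {
        // With HttpErrorMiddleware active, a 404 response should be
        // dropped before it ever reaches this method.
        yield $this->item([
            'title' => $response->filter('h1')->text(),
        ]);
    }
}
```

Listing the middleware explicitly keeps the spider's behavior visible in one place, while removing the override means future changes to the defaults are picked up automatically.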
Describe the bug
The parse() method of a spider is called even if a wrong URL is passed in and the response status code is 404.
Expected behavior
Parsing should stop when the response status is not 200.
Package versions (please complete the following information):
roach-php/laravel v3.1.0
Additional context
Spider class:
started like this:
This will give you a "The current node list is empty." error.