scrapy / scrapy

Scrapy, a fast high-level web crawling & scraping framework for Python.
https://scrapy.org
BSD 3-Clause "New" or "Revised" License

`FeedExporter` extension should throw when misconfigured #2089

Open rampage644 opened 8 years ago

rampage644 commented 8 years ago

The FeedExporter extension should throw NotConfigured when the spider lacks properties that FEED_URI requires. Right now, in that case, open_spider fails during initialization, causing all subsequent signal handlers to fail as well.

redapple commented 8 years ago

@rampage644, do you have an example setup that fails? If the feed exporter is not usable due to misconfiguration, I believe the spider should fail fast and not run until the config is fixed, so as not to waste resources without producing output data (vs. throwing NotConfigured, which is harder to spot in the logs).

rampage644 commented 8 years ago

@redapple Sure, I have :) The simplest sample is to define FEED_URI with some parameter (let's say undefined_bucket) and not define it on the spider itself. For instance, FEED_URI = s3://%(undefined_bucket)s/file.json. If the spider (or the FEED_URI_PARAMS setting) lacks that parameter, all signal handlers fail at runtime.

The proper implementation (in my opinion) is to check this during init and not enable the extension if it is misconfigured.

I just don't have any idea (for now) how to resolve it, as open_spider isn't wrapped in a try/except block and it doesn't make sense to throw NotConfigured from there (otherwise I'd submit a PR and not an issue; this is just a reminder for myself to do it later).
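
For illustration, here is a rough sketch of what an init-time check could look like. It is plain Python and does not use Scrapy: the local NotConfigured class is a stand-in for scrapy.exceptions.NotConfigured, and missing_uri_params, check_feed_uri and the params dict are hypothetical names made up for this example. It assumes the URI is expanded with printf-style formatting, as the %(name)s syntax suggests, and it is not how FeedExporter is actually implemented.

```python
import re


class NotConfigured(Exception):
    """Stand-in for scrapy.exceptions.NotConfigured so the sketch runs without Scrapy."""


FEED_URI = "s3://%(undefined_bucket)s/file.json"

# Hypothetical parameters a spider might supply; "undefined_bucket" is absent.
params = {"name": "myspider", "time": "2016-06-30T12-00-00"}


def missing_uri_params(uri_template, params):
    """Return the %(name)s placeholders in uri_template that params lacks.

    Only plain %(name)s placeholders are handled; escaped percent signs
    and other conversion types are ignored in this sketch.
    """
    names = re.findall(r"%\((\w+)\)s", uri_template)
    return [name for name in names if name not in params]


def check_feed_uri(uri_template, params):
    """Raise NotConfigured up front instead of failing later with KeyError."""
    missing = missing_uri_params(uri_template, params)
    if missing:
        raise NotConfigured(
            "FEED_URI references undefined parameters: " + ", ".join(missing)
        )


# Without a check, the expansion itself fails with KeyError at runtime.
try:
    FEED_URI % params
except KeyError as exc:
    runtime_error = exc
```

Calling check_feed_uri(FEED_URI, params) raises the stand-in NotConfigured here, while a template that only uses defined parameters passes silently. Whether to disable the extension or stop the crawl outright on such an error is the open question discussed above.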