scrapinghub / spidermon

Scrapy Extension for monitoring spiders execution.
https://spidermon.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Unable to find scrapy.cfg file to infer project data dir #200

Open sulthonzh opened 5 years ago

sulthonzh commented 5 years ago

I got the error message shown in the screenshot when I deployed a Scrapy project to scrapyd, even though scrapy.cfg is included in the egg file.

The project is deployed to scrapyd, and I think the problem is in Spidermon, because the same project runs fine without scrapyd.

rvandam commented 11 months ago

Ran into this problem as well and wanted to document my findings.

This appears to be caused by a chain of mismatched assumptions between Spidermon, Scrapy and scrapyd. Spidermon's LocalStorageStatsHistoryCollector uses the data_path function from scrapy.utils.project to build the path where it stores stats history. data_path, in turn, requires a scrapy.cfg file in your working directory or one of its parents. If you deploy via scrapyd-deploy, however, your local scrapy.cfg is never copied to the server (not even inside the deployed egg file). Scrapy then raises the error in the title, Spidermon doesn't handle it gracefully, and your spider is killed (see screenshot above).
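To make the failure mode concrete, here is a minimal sketch of the kind of upward search described above. It is a simplified illustration, not Scrapy's actual code: `find_closest_cfg` and `infer_data_dir` are hypothetical names, and the `.scrapy` subdirectory reflects my understanding of Scrapy's default.

```python
from pathlib import Path
from typing import Optional


def find_closest_cfg(start: Path, filename: str = "scrapy.cfg") -> Optional[Path]:
    """Walk from `start` up to the filesystem root looking for `filename`.

    Hypothetical helper: it mimics the lookup, it is not Scrapy's API.
    """
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


def infer_data_dir(start: Path) -> Path:
    """Return `<dir containing scrapy.cfg>/.scrapy`, or fail if no cfg exists."""
    cfg = find_closest_cfg(start)
    if cfg is None:
        # Same message as the issue title; the exception type is an assumption.
        raise RuntimeError("Unable to find scrapy.cfg file to infer project data dir")
    return cfg.parent / ".scrapy"
```

Under scrapyd the working directory usually has no scrapy.cfg anywhere above it, so a search like this comes back empty and the error is raised.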

The only workaround I've found is to add a dummy scrapy.cfg to the working directory on the server (kudos to a suggestion in a related Scrapy thread from 8 years ago: https://github.com/scrapy/scrapy/pull/1581#issuecomment-154273384 ).

If you want the stats history to be stored somewhere else, it appears you can use the completely undocumented datadir section in your otherwise dummy scrapy.cfg (the one on your server, not the one in your project, which doesn't get deployed):

[datadir]
default = /path/to/somewhere/
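If you would rather generate that file from a provisioning script than create it by hand, something like the following should work. It is only a sketch: `write_dummy_cfg` is a hypothetical helper, and both paths are placeholders you would replace with the real working directory and data directory on your server.

```python
import configparser
from pathlib import Path


def write_dummy_cfg(working_dir: Path, data_dir: Path) -> Path:
    """Write a scrapy.cfg containing only a [datadir] section.

    Hypothetical helper; `working_dir` is assumed to be the directory
    the spider process runs from on the server.
    """
    config = configparser.ConfigParser()
    # "default" is the option name used in the snippet above.
    config["datadir"] = {"default": str(data_dir)}
    cfg_path = working_dir / "scrapy.cfg"
    with cfg_path.open("w") as fh:
        config.write(fh)
    return cfg_path
```

Calling `write_dummy_cfg(Path("."), Path("/path/to/somewhere/"))` from the server's working directory would produce a file equivalent to the two lines shown above.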

You might alternatively be able to deploy your project's scrapy.cfg by modifying the setup.py that scrapyd-deploy generates. I have not tried that approach.

Perhaps Spidermon should use a different, less obscure mechanism for choosing a data path? Or, at the very least, it could degrade more gracefully by disabling stats history and logging a warning instead of killing the spider.
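A rough sketch of what that graceful degradation could look like is below. This is not Spidermon's code: `resolve_stats_dir` and the `resolver` callable are hypothetical, and in a real patch the resolver would presumably wrap the data_path call.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def resolve_stats_dir(resolver: Callable[[], str]) -> Optional[str]:
    """Return the stats directory, or None if it cannot be determined.

    `resolver` is a stand-in for whatever computes the path. Any failure
    is logged and turned into None so the caller can skip stats history
    rather than abort the crawl.
    """
    try:
        return resolver()
    except Exception as exc:  # a real patch would catch a narrower exception
        logger.warning("Stats history disabled: %s", exc)
        return None
```

The caller would then check for None and skip loading or saving stats history for that run.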