Open sulthonzh opened 5 years ago
Ran into this problem as well and wanted to document my findings.
This appears to be due to a series of flawed assumptions between spidermon, scrapy and scrapyd. Spidermon's LocalStorageStatsHistoryCollector uses the data_path function from scrapy.utils.project to try to create a path to store stats history. But data_path requires you to have a scrapy.cfg file somewhere in your working directory or higher. If you deploy via scrapyd-deploy, your local scrapy.cfg is never copied to the server (not even inside the deployed egg file). So scrapy raises an error, spidermon doesn't handle it gracefully, and your spider gets killed (see screenshot above).
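To make the "working directory or higher" requirement concrete, here is a minimal standard-library sketch of that kind of upward search. It is an illustration only, not Scrapy's actual implementation; the function name find_closest_cfg is hypothetical.

```python
from pathlib import Path
from typing import Optional


def find_closest_cfg(start: Path, name: str = "scrapy.cfg") -> Optional[Path]:
    """Walk from `start` up to the filesystem root looking for `name`.

    Hypothetical helper: it only mimics the idea of searching the working
    directory and its parents for a config file.
    """
    start = start.resolve()
    # Check the starting directory first, then each parent in turn.
    for directory in (start, *start.parents):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    # Nothing found anywhere up the tree.
    return None
```

Running find_closest_cfg(Path.cwd()) from the directory scrapyd starts the spider in would show whether any scrapy.cfg is visible from there. If it returns None, that matches the failure described above.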
The only workaround I've found is to add a dummy scrapy.cfg to the working directory on the server (kudos to a suggestion in a related scrapy issue from 8 years ago: https://github.com/scrapy/scrapy/pull/1581#issuecomment-154273384 ).
If you want the stats history to be stored somewhere else, it appears you can use the completely undocumented datadir section in your otherwise dummy scrapy.cfg (the one on your server, not the one in your project, which doesn't get deployed):
[datadir]
default = /path/to/somewhere/
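As a rough illustration of how such a section could be read, the sketch below parses the same two lines with Python's standard configparser. It assumes the lookup is a plain section and key read with "default" as the key; the helper name datadir_from_cfg is hypothetical and this is not Scrapy's own code.

```python
from configparser import ConfigParser
from typing import Optional

# The same config fragment as above, inlined so the example is self-contained.
EXAMPLE_CFG = """
[datadir]
default = /path/to/somewhere/
"""


def datadir_from_cfg(cfg_text: str, project: str = "default") -> Optional[str]:
    """Return the [datadir] entry for `project`, or None if it is absent.

    Hypothetical helper, assuming a plain section/key lookup.
    """
    parser = ConfigParser()
    parser.read_string(cfg_text)
    if parser.has_option("datadir", project):
        return parser.get("datadir", project)
    return None


print(datadir_from_cfg(EXAMPLE_CFG))
```

With an empty scrapy.cfg the helper returns None, which corresponds to the case where the data directory has to be inferred from the location of scrapy.cfg itself.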
Alternatively, you might be able to deploy your project's scrapy.cfg by modifying the setup.py that scrapyd-deploy generates. I have not tried that approach.
Perhaps spidermon should use a different, less obscure mechanism for choosing a data path? Or at the very least, it could degrade more gracefully by disabling stats history and logging a warning.
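A minimal sketch of what that graceful degradation might look like. It is not spidermon's code: resolve_stats_dir and the resolver argument are hypothetical, and a real fix would probably catch scrapy.exceptions.NotConfigured rather than a broad Exception.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def resolve_stats_dir(resolver: Callable[[], str]) -> Optional[str]:
    """Return the stats directory, or None if it cannot be determined.

    `resolver` stands in for whatever call computes the data path.
    Hypothetical helper: on failure it logs a warning instead of letting
    the exception propagate and kill the spider.
    """
    try:
        return resolver()
    except Exception as exc:  # assumption: broad catch for illustration only
        logger.warning(
            "Stats history disabled: could not resolve a data path (%s)", exc
        )
        return None
```

A caller would then skip loading and saving stats history whenever the result is None, and the spider would keep running.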
I get an error message like this when I deploy a scrapy project to scrapyd, even when scrapy.cfg is included in the egg file.
I have deployed a scrapy project to scrapyd, but I think the problem is with spidermon, because without scrapyd it works fine.