I wouldn't favour parsing logs, at least not by default: log files can grow large, and parsing them may delay the rendering of the job table. The built-in functionality already covers the basics, but I understand the need for more features; I myself once ended up patching it locally because I couldn't find any documentation on custom resource classes. Maybe more extensive documentation on resource classes and a contrib/ package would unleash the creativity of even more users and their useful ideas without cluttering the builtins. I think the community does have interest in scrapyd; it just takes much more effort to get involved without detailed documentation to start from (compared to scrapy's docs).
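For reference, here is roughly what such a resource class can look like. This is only a sketch based on Scrapyd 1.x internals (`WsResource`, `self.root.poller.queues`); the `QueueStats` class and its endpoint name are invented for illustration:

```python
# Hypothetical custom resource class for Scrapyd; QueueStats and the
# queuestats.json endpoint below are invented for illustration.
from scrapyd.webservice import WsResource

class QueueStats(WsResource):
    """Report the number of pending jobs per deployed project."""

    def render_GET(self, txrequest):
        # self.root is Scrapyd's Root resource; the poller keeps one
        # spider queue per project, and count() returns pending jobs.
        pending = {project: queue.count()
                   for project, queue in self.root.poller.queues.items()}
        return {"node_name": self.root.nodename, "status": "ok",
                "pending": pending}
```

It would be registered the same way as the built-in endpoints, through the `[services]` section of scrapyd.conf (the dotted path is hypothetical):

```ini
[services]
queuestats.json = mypackage.webservice.QueueStats
```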
I do agree scrapyd needs more powerful features for different needs, but adding more features adds unnecessary overhead for those who need the absolute minimum. I think we should consider a plugin system, with plugins exposed and managed through a settings file and/or the web UI.
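To make the plugin idea concrete, a settings-file-driven loader could be very small. In the sketch below the `[plugins]` section is invented, but `Config.items` and Scrapy's `load_object` are the same pair scrapyd already uses to wire up its `[services]` endpoints:

```python
# Hypothetical plugin loader; the [plugins] section of scrapyd.conf is
# invented, e.g.:
#   [plugins]
#   cron = mypackage.plugins.CronScheduler
from scrapy.utils.misc import load_object
from scrapyd.config import Config

def load_plugins(config=None):
    config = config or Config()
    # items() yields (name, dotted_path) pairs from the given section
    return {name: load_object(dotted_path)
            for name, dotted_path in config.items("plugins", default=[])}
```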
Closing as this feature request has not attracted additional interest since 2014.
The Jobs table in the web interface is really bare. The Scrapy stats collector contains a lot of valuable data, which should be included in this table.
I see a few ways of accessing this data:
1. Parsing the log files.
2. Using `CrawlerProcess`, overriding methods that start/stop the reactor, thus removing the need to launch `scrapyd.runner` as a separate process. This gives us direct access to `crawler.stats.get_stats()` and gives the added benefit of using only one reactor to run multiple crawls (a minimal sketch follows at the end of this post).
3. Using `scrapy.contrib.webservice.stats.StatsResource`. This doesn't rely on an unstable API (unlike 2), but will force us to parse log files to determine the webservice port.

Scrapyd needs some useful upgrades aside from a prettier UI: scheduling periodic crawls, queues, retrying, etc. They don't seem difficult to implement, but I don't have the time to do this myself and don't know if the community even has interest in Scrapyd.
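Coming back to option 2, here is a minimal, self-contained sketch of what in-process crawling buys us, using the current `CrawlerProcess`/`CrawlerRunner` API (`create_crawler` is newer than this issue); the spider and URL are just placeholders:

```python
# Minimal sketch of option 2: run a crawl in-process and read the stats
# collector directly. The spider below is a throwaway example.
import scrapy
from scrapy.crawler import CrawlerProcess

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        for text in response.css("span.text::text").getall():
            yield {"text": text}

process = CrawlerProcess(settings={"LOG_LEVEL": "WARNING"})
crawler = process.create_crawler(QuotesSpider)  # keep a handle for stats
process.crawl(crawler)
process.start()  # one reactor can drive several scheduled crawls

# Everything the Jobs table could show: item counts, response counts, timings.
print(crawler.stats.get_stats())
```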
Thoughts?