scrapy / scrapyd

A service daemon to run Scrapy spiders
https://scrapyd.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

More details in the Jobs table #39

Closed · opened by Blender3D · closed 2 years ago

Blender3D commented 10 years ago

The Jobs table in the web interface is really bare. The Scrapy stats collector contains a lot of valuable data, which should be included in this table.

I see a few ways of accessing this data:

  1. Parsing logs. This seems like unnecessary work and will only give access to crawl statistics after a crawl has finished.
  2. Subclassing CrawlerProcess and overriding the methods that start/stop the reactor, removing the need to launch scrapyd.runner as a separate process. This gives us direct access to crawler.stats.get_stats() and has the added benefit of using a single reactor to run multiple crawls (see the sketch after this list).
  3. Using scrapy.contrib.webservice.stats.StatsResource. This doesn't rely on an unstable API (unlike 2), but will force us to parse log files to determine the webservice port.
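
For illustration, here is a minimal sketch of option 2's core idea using the current scrapy.crawler API (which postdates this comment): run the crawl in-process and read the stats collector directly once the reactor stops. The spider and its settings are placeholders.

```python
# Minimal sketch: run a crawl in-process instead of launching
# scrapyd.runner as a child process, so crawler.stats is reachable
# directly. QuotesSpider and its settings are placeholders.
from scrapy import Spider
from scrapy.crawler import CrawlerProcess


class QuotesSpider(Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        for text in response.css("div.quote span.text::text").getall():
            yield {"text": text}


process = CrawlerProcess(settings={"LOG_LEVEL": "INFO"})
crawler = process.create_crawler(QuotesSpider)  # keep a handle on the Crawler
process.crawl(crawler)
process.start()  # blocks until the crawl finishes and the reactor stops

# The stats collector is now a plain dict, e.g.
# {'downloader/request_count': 1, 'item_scraped_count': 10, ...}
print(crawler.stats.get_stats())
```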

Scrapyd needs some useful upgrades besides a prettier UI: scheduling periodic crawls, queues, retrying failed jobs, etc. None of these seem difficult to implement, but I don't have the time to do this myself, and I don't know whether the community is even interested in Scrapyd.
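
As a stopgap, periodic crawls can already be driven from outside scrapyd through its documented schedule.json endpoint, for example from cron. A minimal sketch (project and spider names are placeholders):

```python
# Minimal sketch: trigger a crawl via scrapyd's schedule.json endpoint.
# Running this from cron (or any external scheduler) approximates
# periodic scheduling. Project and spider names are placeholders.
import requests

resp = requests.post(
    "http://localhost:6800/schedule.json",
    data={"project": "myproject", "spider": "myspider"},
)
resp.raise_for_status()
print(resp.json())  # e.g. {"status": "ok", "jobid": "..."}
```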

Thoughts?

Digenis commented 10 years ago

I wouldn't favour parsing logs, at least not by default: log files can grow large, and parsing them would delay the rendering of the jobs table. The built-in webservice already provides a fair amount of functionality, but I understand the need for more features; I once ended up patching it locally because I couldn't find any documentation on custom resource classes. More extensive documentation on resource classes, plus a contrib/ package, might unleash the creativity of even more users and their useful ideas without cluttering the built-ins. I do think the community is interested in scrapyd; it just takes much more effort to get involved without detailed documentation to start from (compared to scrapy's docs).
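
For reference, a rough sketch of what a custom resource class can look like, modelled on the built-in resources in scrapyd/webservice.py. The WsResource base class is undocumented and has changed across versions, and HelloResource is a made-up name, so treat this as illustrative only.

```python
# Rough sketch of a custom scrapyd webservice resource, modelled on the
# built-ins in scrapyd/webservice.py. The WsResource API is undocumented
# and version-dependent; HelloResource is a hypothetical example.
from scrapyd.webservice import WsResource


class HelloResource(WsResource):
    def render_GET(self, txrequest):
        # WsResource's JSON machinery serializes the returned dict
        return {
            "node_name": self.root.nodename,
            "status": "ok",
            "message": "hello from a custom resource",
        }
```

It would then be registered in scrapyd.conf under the [services] section, the same mechanism the default configuration uses for the built-in endpoints, e.g. hello.json = myproject.resources.HelloResource.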

jayzeng commented 10 years ago

I do agree that scrapyd needs more powerful features for different use cases, but adding more features adds unnecessary overhead for those who need the absolute minimum. I think we should consider adding a plugin system and exposing/managing plugins through a settings file and/or the web UI.

jpmckinney commented 2 years ago

Closing as this feature request has not attracted additional interest since 2014.