openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0
463 stars, 74 forks

Provide API to let consumer know how up to date everything is #136

Open mlandauer opened 10 years ago

mlandauer commented 10 years ago

This could be some kind of combination of when

hailspuds commented 10 years ago

I have more scrapers than the average user (thanks guys!), and I hit the API to check the results of each scraper. However, there's no way to tell whether a scraper has run since I last checked, so many of those API calls are likely unnecessary.

It'd be good if the API returned a timestamp of the next time the scraper is due to run. That way I could store the timestamp, and only check the morph.io results when the scraper has run.

mlandauer commented 10 years ago

Currently once per day the system grabs all the scrapers that are set to run automatically, shuffles them randomly and then queues them up to run at intervals during the next 24 hours.
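To make the scheduling described above concrete, here is a minimal Python sketch. It is not morph's actual code: the function name `schedule_daily_runs` is hypothetical, and the even spacing between runs is an assumption, since the comment only says the scrapers are queued "at intervals".

```python
import random
from datetime import datetime, timedelta


def schedule_daily_runs(scraper_names, start, rng=None):
    """Shuffle the scrapers and spread them over the 24 hours after `start`.

    Returns a list of (scraper_name, run_time) pairs in run order.
    Even spacing is an assumption made for illustration.
    """
    if rng is None:
        rng = random.Random()
    shuffled = list(scraper_names)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    if not shuffled:
        return []
    interval = timedelta(hours=24) / len(shuffled)
    return [(name, start + i * interval) for i, name in enumerate(shuffled)]


if __name__ == "__main__":
    plan = schedule_daily_runs(["a", "b", "c", "d"], datetime(2024, 1, 1))
    for name, run_at in plan:
        print(name, run_at.isoformat())
```

Because the order is reshuffled every day, a scraper's position in one day's queue says nothing about its position in the next.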

So, in general, we can't know when a scraper will next run. Also, the scraper could be run manually at any time.

What we could do instead (as in the original issue) is add an API for finding out when the scraper was last run.

A different thing we could do is make calls to a specific API endpoint more efficient by using ETags. The API client would do a HEAD request and, if the ETag hasn't changed, wouldn't need to do a full request.
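From the client side, the ETag idea might look roughly like the Python sketch below. It assumes the API would send an `ETag` header on both HEAD and GET responses, which is only proposed here and not confirmed. The helper names `etag_changed` and `fetch_if_changed` are hypothetical.

```python
import urllib.request


def etag_changed(stored_etag, response_headers):
    """Return True when a full request is needed.

    A missing ETag is treated as "changed" so the client never skips
    data just because the server sent no validator.
    """
    current = response_headers.get("ETag")
    return current is None or current != stored_etag


def fetch_if_changed(url, stored_etag):
    """HEAD first, then GET only if the ETag differs from the stored one.

    Returns (body, etag). body is None when nothing has changed.
    """
    head = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(head) as resp:
        headers = resp.headers
    if not etag_changed(stored_etag, headers):
        return None, stored_etag
    with urllib.request.urlopen(url) as resp:
        return resp.read(), resp.headers.get("ETag")
```

A client would keep the returned ETag next to its cached results and pass it back on the next check. If the server also supported it, a GET with an `If-None-Match` header would do the same in a single round trip, answering `304 Not Modified` when nothing has changed.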

@hailspuds do you have a preference between those two approaches or any other ideas on how we could address this?