socrata / datasync

Desktop / Console application for updating Socrata datasets automatically.
http://socrata.github.io/datasync/
MIT License

Handling frequent replace jobs #94

Open levyj opened 9 years ago

levyj commented 9 years ago

Feature suggestion: On occasion, when the system bogs down, one iteration of a replace job can still be running when the next one starts. This can make the earlier one moot and waste resources just when they are needed most.

What would you think of adding an optional flag that basically says "If the previous update of this dataset is still running, terminate it and start this one"?

There are a few issues that would have to be worked out, such as avoiding a domino effect so that the dataset never updates, but would some version of this feature be helpful?

rjmac commented 9 years ago

That is an entirely plausible feature. In fact, we do a similar thing already for the tech preview sync jobs, which use the same infrastructure as datasync. It wouldn't be a "terminate if running" flag (there's no way to do that), but a "terminate if queued but not yet running" one, which also guarantees progress.

levyj commented 9 years ago

Actually, that is even better, since it avoids the problem of a 20-minute job that runs every 15 minutes and therefore would never be allowed to finish.

levyj commented 9 years ago

This may have been implied, but this feature should send feedback to the client, probably a failure code with accompanying text. That would be fairly important to us, since one of the harmful effects of DataSync backlogs is jobs staying open indefinitely until they build up and starve the server of resources. We would need the waiting jobs to learn that the DataSync operation has ended so they can stop waiting.
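
To make the behavior discussed in this thread concrete, here is a minimal sketch in Python of "terminate if queued but not yet running" with feedback to the superseded job. It is an illustration only, not DataSync or Socrata code: `ReplaceQueue`, `Job`, the `supersede_queued` flag, and the `SUPERSEDED` status are all hypothetical names. A running job is never interrupted, so some job always finishes; only jobs still waiting in the queue are dropped, and each dropped job receives a terminal status so its client can stop waiting.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    SUPERSEDED = "superseded"  # hypothetical failure code reported to the client


@dataclass
class Job:
    job_id: int
    dataset: str
    status: Status = Status.QUEUED


class ReplaceQueue:
    """Toy model: one running replace job per dataset, plus a waiting queue."""

    def __init__(self):
        self.running = {}                  # dataset -> Job currently running
        self.queued = defaultdict(deque)   # dataset -> Jobs waiting to run

    def submit(self, job, supersede_queued=False):
        if supersede_queued:
            # Drop jobs that are queued but not yet running, and tell them so.
            for stale in self.queued[job.dataset]:
                stale.status = Status.SUPERSEDED
            self.queued[job.dataset].clear()
        if job.dataset not in self.running:
            self._start(job)
        else:
            # The running job is left alone, which is what guarantees progress.
            self.queued[job.dataset].append(job)

    def finish(self, dataset):
        """Mark the running job as done and start the next queued job, if any."""
        done = self.running.pop(dataset)
        done.status = Status.DONE
        if self.queued[dataset]:
            self._start(self.queued[dataset].popleft())

    def _start(self, job):
        job.status = Status.RUNNING
        self.running[job.dataset] = job
```

In the 20-minute job submitted every 15 minutes scenario, the first job keeps running, the second waits, and a third submission with `supersede_queued=True` marks the second as `SUPERSEDED` and takes its place in the queue. When the first job finishes, the third starts.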