openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

Use results of previous successful run if scraper crashed #1191

Open daald-docker opened 6 years ago

daald-docker commented 6 years ago

In my scraper (https://morph.io/daald-docker/sac_uto_touren?utm_medium=email&utm_source=alerts), the first step invalidates the existing data before adding or overwriting new data. Other scrapers might simply delete the whole database before starting.

As a result, the database ends up in an incomplete/unusable state if my script crashes at some point.

morph.io should only use the resulting database if the scraper ran to completion. If it didn't (nonzero exit code), the result should be discarded and the database from the previous successful run should be used instead.

An exception could be made if no previous successful result exists.
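The requested behaviour could be sketched roughly as follows. This is only an illustration in Python, not morph.io's actual code: the function name `run_with_fallback`, the `.bak` suffix, and the default file name `data.sqlite` are assumptions, and it presumes nothing else has the database open while it is being copied.

```python
import shutil
import subprocess
from pathlib import Path


def run_with_fallback(command, db_path="data.sqlite"):
    """Run the scraper; restore the pre-run database if it exits nonzero.

    Hypothetical helper: the name and the ".bak" suffix are illustrative.
    """
    db = Path(db_path)
    backup = db.with_name(db.name + ".bak")

    # Snapshot the database as it was before this run, if there is one.
    had_previous = db.exists()
    if had_previous:
        shutil.copy2(db, backup)

    exit_code = subprocess.run(command).returncode

    # Nonzero exit code: discard the partial result and restore the snapshot.
    # With no previous result, whatever the failed run produced is kept.
    if exit_code != 0 and had_previous:
        shutil.copy2(backup, db)

    if backup.exists():
        backup.unlink()
    return exit_code
```

Something like `run_with_fallback(["python", "scraper.py"])` would then leave the database untouched after a crash. The snapshot is only the last successful result as long as earlier failed runs were restored in the same way.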

fawkesley commented 6 years ago

Have you considered putting everything inside a database transaction?

So you'd clear the data, then re-add data, and if something failed, the transaction would roll back and the database would be exactly as it was before you started the transaction.
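A minimal sketch of that idea using Python's built-in `sqlite3` module is shown here. The table name `data`, its columns, and the helper name `replace_all_rows` are assumptions made for illustration; the actual scraper's schema will differ.

```python
import sqlite3


def replace_all_rows(db_path, rows):
    """Clear the table and re-add rows in one transaction.

    Hypothetical helper: table and column names are illustrative only.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS data "
                "(id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
            )
            conn.execute("DELETE FROM data")  # clear the old data
            conn.executemany(
                "INSERT INTO data (id, title) VALUES (?, ?)", rows
            )  # re-add the new data
    finally:
        conn.close()
```

If `rows` is a generator that raises partway through (for example because a page failed to download), the `DELETE` is undone along with the partial inserts, so the table keeps the contents from the previous run.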