@CloCkWeRX could your scraper just store the JSON on S3 or GitHub as part of its run? Here's an example of a scraper where we store images on S3 and don't use SQLite at all: https://morph.io/openaustralia/australian_local_councillors_images
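For concreteness, storing the JSON on S3 at the end of a run could look something like the following. This is only a minimal Python sketch: the bucket name, key, and publish_json helper are placeholders, and it assumes boto3 is installed with AWS credentials available in the scraper's environment.

```python
# Hypothetical sketch: upload the scraper's output as JSON to S3 at the end of a run.
# Bucket and key names are placeholders; credentials are expected to come from
# the usual AWS environment variables.
import json
import boto3

def publish_json(records, bucket="my-scraper-output", key="latest.json"):
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(records).encode("utf-8"),
        ContentType="application/json",
    )
```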
Yeah, that would probably work. It'd be great if morph could be told "here are the results" or similar in a structured way, so you get some UI.
To keep the project simple, I don't think we're going to support publishing other data formats, so I'm going to close this as wontfix. I do like this idea, though:
It'd be great if morph could be told "here are the results" or similar in a structured way, so you get some UI.
and if that can be distilled into its own issue, it would be cool to think about and discuss further.
I'm trying to do some spatial transformations with https://github.com/CloCkWeRX/australian-road-closures-index, and SQLite is unfortunately a bit of a hindrance given its lack of spatial types.
There are extensions that add them, but in my case I'm probably better off just writing out a data.geojson for the "latest" run.
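For illustration, writing the latest run out as GeoJSON is just dumping a FeatureCollection to disk. This is a rough Python sketch; the write_geojson helper and the example feature are made up.

```python
# Hypothetical sketch: write the latest run's features straight to data.geojson
# instead of (or alongside) the SQLite database.
import json

def write_geojson(features, path="data.geojson"):
    collection = {"type": "FeatureCollection", "features": features}
    with open(path, "w") as f:
        json.dump(collection, f)

# Example usage with a single made-up road-closure feature.
write_geojson([
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [138.6, -34.9]},
        "properties": {"name": "Example road closure"},
    }
])
```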
Obviously, this would need limits: loading a huge document, appending a record, and rewriting the whole thing to disk isn't going to be much fun once there are lots of records, so capping the file at N records or putting a logrotate-style rotation strategy in place could work.
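One way the cap could look, as a rough Python sketch (append_capped, the file name, and MAX_FEATURES are hypothetical): read the existing collection, append the new features, and keep only the most recent N before rewriting.

```python
# Hypothetical sketch of the "limit to N records" idea.
import json
import os

MAX_FEATURES = 1000

def append_capped(new_features, path="data.geojson", cap=MAX_FEATURES):
    # Load the existing collection if the file is already there.
    if os.path.exists(path):
        with open(path) as f:
            features = json.load(f).get("features", [])
    else:
        features = []
    # Append the new records and keep only the most recent `cap` of them.
    features = (features + new_features)[-cap:]
    with open(path, "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)
```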
In some ways, this shortcuts the "scrape, put into SQL form, expose via API" workflow to just "scrape, expose as a static API".