openaustralia / morph

Take the hassle out of web scraping
https://morph.io
GNU Affero General Public License v3.0

Support publishing as data.json / data.geojson #1151

Closed: CloCkWeRX closed this issue 7 years ago

CloCkWeRX commented 7 years ago

I'm trying to do some spatial transformation with https://github.com/CloCkWeRX/australian-road-closures-index, and SQLite is unfortunately a bit of a hindrance given its lack of spatial types.

There are spatial extensions for SQLite, but in my use case I am probably better off just writing out a data.geojson for the 'latest' run.

Obviously, this needs limits: loading a huge document, appending a record and rewriting the whole thing to disk isn't going to be much fun when there are lots of documents. Limiting the output to just N records, or putting a logrotate-style strategy in place, could potentially work.
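To illustrate the "limit to N records" idea, here is a minimal Python sketch of what a scraper could do at the end of a run. Everything in it is an assumption made for illustration: the `lon`/`lat` record keys, the `MAX_FEATURES` cap and the function names are hypothetical, and none of this is an existing morph feature.

```python
import json
import os
import tempfile

MAX_FEATURES = 1000  # hypothetical cap on how many records are published


def to_feature(record):
    """Convert a record with hypothetical 'lon'/'lat' keys into a GeoJSON Point feature."""
    properties = {k: v for k, v in record.items() if k not in ("lon", "lat")}
    return {
        "type": "Feature",
        # GeoJSON positions are ordered [longitude, latitude]
        "geometry": {"type": "Point", "coordinates": [record["lon"], record["lat"]]},
        "properties": properties,
    }


def write_latest_geojson(records, path="data.geojson", max_features=MAX_FEATURES):
    """Write the newest `max_features` records (list ordered oldest first) as a FeatureCollection."""
    latest = records[-max_features:] if max_features > 0 else []
    collection = {
        "type": "FeatureCollection",
        "features": [to_feature(r) for r in latest],
    }
    # Write to a temporary file in the same directory, then swap it into place,
    # so a reader never sees a half-written document.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(collection, handle)
    os.replace(tmp_path, path)
    return collection


if __name__ == "__main__":
    # Made-up sample records, only to show the call shape.
    sample = [{"lon": 151.2, "lat": -33.9, "name": "Example closure"}]
    write_latest_geojson(sample)
```

Rebuilding the file from only the latest N records on each run avoids the load, append and rewrite cycle on an ever-growing document. A logrotate-style variant would rename the previous file before writing the new one.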

In some ways, this shortcuts "Scrape, put into SQL form, expose via API" to just "Scrape, expose as static API".

henare commented 7 years ago

@CloCkWeRX could your scraper just store the JSON on S3 or GitHub as part of its run? Here's an example of a scraper where we store images on S3 and don't use SQLite at all: https://morph.io/openaustralia/australian_local_councillors_images

CloCkWeRX commented 7 years ago

Yeah, that probably works. It'd be great if morph was able to be told "here's the results" or similar in a structured way; so you get some UI.


henare commented 7 years ago

To keep the project simple, I don't think we're going to support publishing other data formats, so I'm going to close this as wontfix. I do like this idea though:

It'd be great if morph was able to be told "here's the results" or similar in a structured way; so you get some UI.

and if that can be distilled into an issue then that would be cool to think about and discuss more.