patcon closed this issue 8 years ago
Sounds like a cool project. I'm not completely clear, though, on what you mean by "add contrib code for scraping"? You could (and it looks like you have) write a scraper that creates the dataset .md files, or you could write one that creates a data.json file and use @JJediny's plugin to pull it in. Or are you talking about scraping the data itself?
I'm only now getting time to look into @JJediny's plugin, and it looks pretty rad, especially if it allows multiple endpoints. Thanks for the pointer. I'll investigate more later.
Sounds good. I'll close this issue for now, but feel free to reopen if there's more to discuss.
A Scrapy pipeline could be used to help people scrape organizations, datasets, and some of their metadata from city data portals.
Once I get this sorted out for myself a little bit better, happy to contribute it back.
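To make the pipeline idea a bit more concrete, here's a rough sketch of what it might look like. It writes one front-matter .md file per scraped dataset. The field names, the `_datasets` output folder, and the class name are all assumptions for illustration, not something taken from an existing scraper. Scrapy only needs an object with a `process_item(item, spider)` method, so the sketch doesn't import Scrapy itself.

```python
import json
import re
from pathlib import Path


def slugify(text):
    """Lowercase the text and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


class DatasetMarkdownPipeline:
    """Scrapy-style item pipeline: one front-matter .md file per dataset item."""

    # Hypothetical field names; adjust them to whatever the portal exposes.
    FIELDS = ("title", "organization", "notes", "url")

    def __init__(self, output_dir="_datasets"):
        # "_datasets" is an assumed folder name for the site's dataset files.
        self.output_dir = Path(output_dir)

    def process_item(self, item, spider):
        self.output_dir.mkdir(parents=True, exist_ok=True)
        lines = ["---"]
        for field in self.FIELDS:
            # A JSON string is also a valid YAML double-quoted scalar,
            # so json.dumps takes care of quoting and escaping.
            lines.append(f"{field}: {json.dumps(str(item.get(field, '')))}")
        lines.append("---")
        path = self.output_dir / f"{slugify(item['title'])}.md"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        # Return the item so any later pipelines still receive it.
        return item
```

It would be switched on through the project's `ITEM_PIPELINES` setting, with the spider yielding plain dicts (or Items) that carry those fields.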
This could also involve creating a custom storage backend that pushes scraped files directly to the GitHub Pages site. This could run regularly via Heroku Scheduler.
Scrapy also has an S3 storage backend, and it could make more sense to use that, but I'd hate to lose all the nifty gatekeeper stuff :)
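For the push-to-GitHub idea, one possible approach (an assumption on my part, not a worked-out design) is to have the storage backend call GitHub's "create or update file contents" REST endpoint. The sketch below only covers the request-building part. The function names, commit message, and default `gh-pages` branch are hypothetical, and wiring this into an actual Scrapy feed storage class is left out.

```python
import base64
import json
import urllib.request


def build_contents_request(owner, repo, path, content, token,
                           branch="gh-pages", sha=None):
    """Build a PUT request that creates or updates one file in a repo.

    content is the raw file as bytes. sha is the blob SHA of the existing
    file, which GitHub requires when updating rather than creating.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    body = {
        "message": f"Update {path} from scraper",  # hypothetical commit message
        "content": base64.b64encode(content).decode("ascii"),
        "branch": branch,  # assumed branch; use whichever one serves the site
    }
    if sha is not None:
        body["sha"] = sha
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


def push_file(*args, **kwargs):
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_contents_request(*args, **kwargs)) as resp:
        return resp.status
```

A scheduled job would then just run the spider and call `push_file` for each generated file, with the token coming from an environment variable rather than the code.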
Ref: https://github.com/CivicTechTO/scrapers-to-data-portal