keeganmcbride closed this issue 1 year ago.
Sorry Keegan, I must have missed this question! I feel like this came up before, and I'm trying to remember why we didn't recommend it. I think it's because once a git repo gets large enough, it starts to become incredibly slow (cloning, syncing, and presumably deploying). So we were talking about using separate repos for that. But it sounds like it's working for you?
We have 6 or 7 resources stored here; the rest are hosted elsewhere. We have not experienced any performance issues, though of course we are no longer running on GitHub Pages.
The big data problem is solvable with Git LFS. There's a nice tutorial from GitHub's staff here: https://github.community/t5/Support-Protips/Working-with-large-files-and-repositories/ba-p/9343
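For anyone trying this, a minimal sketch of the workflow (assuming git-lfs is already installed; the tracked path and file name are hypothetical):

```
# Track large data files with Git LFS so they are stored as pointers
# rather than bloating the repo history.
git lfs install                      # one-time setup of the LFS hooks
git lfs track "downloads/*.csv"      # route matching files through LFS
git add .gitattributes downloads/big-dataset.csv   # hypothetical file
git commit -m "Add large dataset via Git LFS"
git push origin master
```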
An alternative (imo preferred) solution to the big data problem is to split the compressed archive into multiple files of a specified maximum size.
This certainly does not get around the speed issues mentioned previously, and I agree with the other suggestion about hosting the data separately.
That said, I use the compression trick all the time here. Below is a screenshot of Kika in OS X, with the split size set below GitHub's 100 MB limit. You can also do this from the command line.
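From the command line, the equivalent with Info-ZIP would look something like this (a sketch; the archive and folder names are hypothetical):

```
# Split the archive into 90 MB parts, safely under GitHub's 100 MB per-file limit.
zip -r -s 90m downloads/dataset.zip data/

# To reassemble later, merge the parts back into a single archive and extract.
zip -s 0 downloads/dataset.zip --out dataset-combined.zip
unzip dataset-combined.zip
```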
In our Estonian implementation, we have also started to see ministries host their open government data (OGD) sets on GitHub and then provide that URL, for example: https://github.com/maanteeamet/opendata/tree/master/LIIKLUSREGISTER
Just wondering why it hasn't been done (maybe there is a reason), but would it not make sense to create a /downloads folder, allow files to be uploaded there, and then simply access them via the base URL, i.e. base-url/downloads/filename?
As some of our datasets did not have URLs, this is the strategy we pursued (https://github.com/okestonia/jkan/tree/master/downloads).
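A minimal sketch of how such a resource is then referenced (the base URL and filename below are placeholders, not the actual okestonia paths):

```
# Anything committed under /downloads is served as a static file,
# so the resource URL is just the catalog's base URL plus the path.
BASE_URL="https://data.example.org"                  # placeholder base URL
curl -O "$BASE_URL/downloads/example-dataset.csv"    # hypothetical filename
```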