keeganmcbride closed this issue 1 year ago.
Sorry Keegan, I must have missed this question! I feel like this came up before, and I'm trying to remember why we didn't recommend it. I think it's because once a git repo gets large enough, it starts to become incredibly slow (cloning, syncing, and presumably deploying). So we were talking about using separate repos for that. But it sounds like it's working for you?
We have 6 or 7 resources stored here; the rest are hosted elsewhere. We have not experienced any performance issues, though of course we are no longer running on GitHub Pages.
The big data problem is solvable with Git LFS. There's a nice tutorial from GitHub's staff here: https://github.community/t5/Support-Protips/Working-with-large-files-and-repositories/ba-p/9343
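For anyone trying this, a minimal sketch of the workflow (assuming git-lfs is already installed; the tracked path and file name are hypothetical):

```
# Track large data files with Git LFS so they are stored as pointers
# rather than bloating the repo history.
git lfs install                      # one-time setup of the LFS hooks
git lfs track "downloads/*.csv"      # route matching files through LFS
git add .gitattributes downloads/big-dataset.csv   # hypothetical file
git commit -m "Add large dataset via Git LFS"
git push origin master
```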
An alternative (imo preferred) solution to the big data problem is to split the compressed archive into multiple files of a specified maximum size.
This certainly does not get around the speed issues mentioned previously, and I agree with the other suggestion about hosting the data separately.
That said, I use the compression trick all the time here. Below is a screenshot of Kika in OS X, with the split size set below GitHub's 100 MB limit. You can also do this from the command line.
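From the command line, the equivalent with Info-ZIP would look something like this (a sketch; the archive and folder names are hypothetical):

```
# Split the archive into 90 MB parts, safely under GitHub's 100 MB per-file limit.
zip -r -s 90m downloads/dataset.zip data/

# To reassemble later, merge the parts back into a single archive and extract.
zip -s 0 downloads/dataset.zip --out dataset-combined.zip
unzip dataset-combined.zip
```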
In our Estonian implementation, we have also started to see ministries host their open government data (OGD) sets on GitHub and then provide that URL, for example: https://github.com/maanteeamet/opendata/tree/master/LIIKLUSREGISTER
Just wondering why it hasn't been done (maybe there is a reason), but would it not make sense to create a /downloads folder, allow files to be uploaded there, and then simply access them via the base URL, i.e. base-url/downloads/filename?
As some of our datasets did not have URLs, this is the strategy we pursued (https://github.com/okestonia/jkan/tree/master/downloads).
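A minimal sketch of how such a resource is then referenced (the base URL and filename below are placeholders, not the actual okestonia paths):

```
# Anything committed under /downloads is served as a static file,
# so the resource URL is just the catalog's base URL plus the path.
BASE_URL="https://data.example.org"                  # placeholder base URL
curl -O "$BASE_URL/downloads/example-dataset.csv"    # hypothetical filename
```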