holoviz-topics / EarthML

Tools for working with machine learning in earth science
https://earthml.holoviz.org
BSD 3-Clause "New" or "Revised" License

Experimenting with using git lfs rather than evaluated branch #84

Closed jsignell closed 5 years ago

jsignell commented 5 years ago

Having been unhappy with using a separate branch to keep track of evaluated notebooks, I was thinking it would be good to explore git-lfs instead. I think it could work really nicely; I just need to make sure that the right files are tracked by git-lfs and the others are ignored. In the case of this repo, only the evaluated topics should be checked in at all. But what is the right way to indicate that? Should we gitignore the evaluated tutorial notebooks? And if so, how can we expose that to the package maintainer? Ideally I think the .gitattributes should be the same across all projects, and the .gitignore should be the one to change.

jsignell commented 5 years ago

Note: here is info on how to install git-lfs https://git-lfs.github.com/

jlstevens commented 5 years ago

It is good to experiment with alternative approaches. I'm not a huge fan of branches for evaluated notebooks either.

I haven't reviewed the PR in any detail yet but I do have an immediate concern about using git-lfs...at least with GitHub. Here is their current policy afaik:

[image: screenshot of GitHub's git-lfs policy]

If we have large notebooks (say 50 MB each), then 1 GB gets you 20 evaluated notebooks, which should be enough. The bandwidth also being limited to 1 GB is more worrying: if the notebooks fill up that gigabyte of storage, a single doc build on Travis would use the whole month's bandwidth, so you could only do one build a month!
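
To make the arithmetic explicit, here is a rough sketch of the estimate. The function names are made up for illustration, and it assumes decimal units (1 GB = 1000 MB) and that every doc build downloads each git-lfs tracked notebook exactly once, neither of which is confirmed by GitHub's policy:

```python
# Back-of-the-envelope check of the quota figures in this comment.
# Assumptions (illustrative, not taken from GitHub's documentation):
#   - decimal units, so 1 GB = 1000 MB
#   - each doc build downloads every LFS-tracked notebook exactly once

def notebooks_that_fit(storage_mb, notebook_mb):
    """Number of whole notebooks that fit in the storage quota."""
    return storage_mb // notebook_mb

def builds_per_month(bandwidth_mb, total_notebooks_mb):
    """Number of full doc builds the monthly bandwidth quota allows."""
    return bandwidth_mb // total_notebooks_mb

STORAGE_MB = 1000    # 1 GB storage quota
BANDWIDTH_MB = 1000  # 1 GB monthly bandwidth quota
NOTEBOOK_MB = 50     # the "say 50 MB each" guess above

n = notebooks_that_fit(STORAGE_MB, NOTEBOOK_MB)
builds = builds_per_month(BANDWIDTH_MB, n * NOTEBOOK_MB)
print(n, builds)
```

Plugging in the actual total size of the evaluated notebooks as `total_notebooks_mb` would give a more realistic number of builds.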

Is there a way around this? Or is the idea either to pay for the GitHub non-free tier or to use git-lfs somewhere else?

philippjfr commented 5 years ago

1 GB of bandwidth definitely doesn't seem like it'll be sufficient. How big are all the evaluated notebooks at the moment?

jsignell commented 5 years ago

I hadn't thought about those things. I think the bandwidth constraint definitely won't work. They are up around 100 MB.

jbednar commented 5 years ago

I don't like branches for this purpose either. But the point of moving to git lfs is to handle larger files, and since the restrictions on lfs are stricter than on a regular branch, there doesn't seem to be much reason to do it. The alternative that could work is to store the files on a separate site like S3 and then use git lfs to manage the version tracking of those files, but that's not an option that most people could use by default, because S3 is always paid...