pints-team / pints

Probabilistic Inference on Noisy Time Series
http://pints.readthedocs.io

Clean up git commit history #1164

Closed chonlei closed 4 years ago

chonlei commented 4 years ago

I know it is sometimes important to keep a record of the git commits, but at the moment downloading pints can take a while (the clone is now over 100 MB!). Looking at what takes up the space:

$ du -ha . | sort -rh | head -15
131M    .
105M    ./.git/objects/pack/pack-b02e0fddfffb6a1a8ba22e759ab5d14d116d0b07.pack
105M    ./.git/objects/pack
105M    ./.git/objects
105M    ./.git
24M     ./examples
7.1M    ./examples/toy
5.8M    ./examples/sampling
4.0M    ./examples/optimisation
3.8M    ./examples/plotting
3.4M    ./examples/stats
1.8M    ./examples/optimisation/cmaes-bare.ipynb
1.5M    ./pints
1.5M    ./examples/stats/autoregressive-moving-average-errors.ipynb
1.4M    ./examples/plotting/optimisation-2d-surface.ipynb

Almost none of it comes from the actual code in pints! Most of it is the .git directory (105 MB!!), then the example notebooks (24 MB!), and finally the actual code in pints (less than 2 MB!)...
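For anyone who wants the same breakdown without `du`, here is a minimal Python sketch. It is only an illustration: `dir_sizes` and `report` are hypothetical helper names, not part of pints, and the sketch sums apparent file sizes rather than disk blocks, so the numbers may differ slightly from the `du` output above.

```python
import os


def dir_sizes(root="."):
    """Return {top-level entry: total bytes} for all files under root.

    Files that sit directly in root are grouped under the key ".".
    Hidden directories such as .git are included, as os.walk does not skip them.
    """
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = rel.split(os.sep)[0]  # first path component below root
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow or count symlinks
            sizes[top] = sizes.get(top, 0) + os.path.getsize(path)
    return sizes


def report(root=".", limit=15):
    """Print the largest top-level entries, biggest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    for name, size in ranked[:limit]:
        print(f"{size / 1e6:8.1f} MB  {name}")
```

Calling `report(".")` from the root of a clone should list `.git` and `examples` at the top if the picture is the same as above.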

I suspect git is storing every tiny change to the notebooks, but these changes are not easy for git to track, so it has been saving a lot of redundant info here and there...

So I think we should reduce the size (now) and also find a way to keep it small as we keep developing pints...

MichaelClerx commented 4 years ago

Nope! The repo's for developers. Once we do a proper release, users can just pip install pints without ever touching the repo.

chonlei commented 4 years ago

OK, that's a fair point!

But still, do we really want to keep tracking all the changes to the examples?! Some are simply the output/log of MCMC runs, which I don't think is very useful for developers. I think keeping track of the code in the examples makes sense, but not so much the output (all those figures, logs, etc.) of each and every re-run... (Because the runs are random, even a one-line change, say to the final figure label, followed by a re-run means the commit has to store almost an entirely new Jupyter notebook!!)
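To make the "track the code, not the output" idea concrete, here is a rough sketch of stripping outputs from a notebook before committing it. It assumes the notebooks use the standard nbformat 4 JSON layout (a top-level "cells" list, with "outputs" and "execution_count" on code cells). `strip_outputs` and `size_reduction` are hypothetical names for illustration only; nothing like this is currently set up in pints.

```python
import json


def strip_outputs(notebook):
    """Return a copy of a notebook dict with all code-cell outputs removed.

    Assumes the nbformat 4 layout; the input dict is left untouched.
    """
    stripped = json.loads(json.dumps(notebook))  # deep copy via JSON round-trip
    for cell in stripped.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []            # drop figures, logs, printed text
            cell["execution_count"] = None  # drop run counters, which change on every re-run
    return stripped


def size_reduction(notebook):
    """Return (characters before, characters after) stripping, as serialised JSON."""
    before = len(json.dumps(notebook))
    after = len(json.dumps(strip_outputs(notebook)))
    return before, after
```

To try it on a real file, load it with `json.load`, pass the dict to `strip_outputs`, and write the result back with `json.dump`. The cell sources are kept as they are, so a change to a figure label would then show up as a one-line diff.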

MichaelClerx commented 4 years ago

We could certainly think about that. I think there are some funky git things you can do that people use to e.g. avoid storing lots of copies of PDFs. However, these tricks all make the repo harder for devs to set up and use, so I'd only suggest we look into them once downloading the pints repo is a bigger task than, say, downloading a season of Game of Thrones.

MichaelClerx commented 4 years ago

It would also seriously mess with functional testing, and with our idea of going back in time to find and fix bugs. So I'm closing this, if that's OK, @chonlei!