momentoscope / hextof-processor

Code for preprocessing data from the HEXTOF instrument at FLASH, DESY in Hamburg (DE)
https://hextof-processor.readthedocs.io/en/latest/
GNU General Public License v3.0
large files still in git history? #67

Closed (steinnymir closed this issue 3 years ago)

steinnymir commented 3 years ago

I noticed that the repository, as it is now with LFS for the data files, still amounts to nearly a GB. Is this because of leftovers from the LFS commits, or do we really have enough changes in the code to account for 616 MB in the .git folder?

zain-sohail commented 3 years ago

I didn't migrate the history, so maybe that is the problem. Migrating basically means rewriting all of the history, but that may be necessary then.

zain-sohail commented 3 years ago

> I noticed that the repository, as it is now with LFS for the data files, still amounts to nearly a GB. Is this because of leftovers from the LFS commits, or do we really have enough changes in the code to account for 616 MB in the .git folder?

Can you check the feature/lfs branch now and see if the data is moved to LFS?

RealPolitiX commented 3 years ago

I cloned a single branch using the following

git clone -b feature/lfs --single-branch https://github.com/momentoscope/hextof-processor.git

The total size of the cloned branch is ~ 335 MB and the h5 files' changes are in /.git/lfs, with the current version of the files in /tutorial/raw. Is this what you have in mind, @zainsohail04? The master branch alone is ~ 945 MB.
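
In case anyone wants to reproduce these numbers, here is a rough sketch of how the split between regular git objects and the local LFS store can be measured. The function name `git_dir_breakdown` is made up for this example, and it relies only on `du`, so it reports on-disk sizes in KiB rather than anything git itself computes.

```shell
# Hypothetical helper (name invented for this sketch): print the on-disk
# size of the git object store and of the local LFS store for a clone.
# Defaults to the current directory if no path is given.
git_dir_breakdown() {
    repo="${1:-.}"
    for sub in objects lfs; do
        if [ -d "$repo/.git/$sub" ]; then
            # du -sk reports the total size in 1024-byte blocks
            size=$(du -sk "$repo/.git/$sub" | cut -f1)
        else
            # The directory is missing, e.g. LFS was never used in this clone
            size=0
        fi
        printf '%s\t%s KiB\n' "$sub" "$size"
    done
}
```

Running `git_dir_breakdown hextof-processor` right after the clone should show whether the bulk sits in `objects` (plain git history) or in `lfs`. `git count-objects -vH` gives a similar figure for the object store alone.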

RealPolitiX commented 3 years ago

Yeah, I agree with @steinnymir: in the main branch, the accumulation of unnecessary copies of the h5 files in the git history is getting out of hand. These large files shouldn't have been put into this repo in the first place, @balerion. There are two ways to solve this: (1) we purge the git history of copies of these large files and adopt the feature/lfs branch implemented by @zainsohail04, or (2) we move them to a separate data repo under momentoscope, in which case the copies of these h5 files in the git history should still be purged.

It would be good to act fast, because with so many branches here the total size of all branches is getting too big!

zain-sohail commented 3 years ago

> I cloned a single branch using the following
>
> git clone -b feature/lfs --single-branch https://github.com/momentoscope/hextof-processor.git
>
> The total size of the cloned branch is ~ 335 MB and the h5 files' changes are in /.git/lfs, with the current version of the files in /tutorial/raw. Is this what you have in mind, @zainsohail04? The master branch alone is ~ 945 MB.

Indeed, and all future versions will just point to LFS, so the large files don't have to be stored repeatedly. If you go to the momentoscope settings, under billing and plans, you can see that 0.3 GB is on GitHub LFS. Now we need to rewrite the older commits/branches so that they also store their h5 files in LFS. At least this is what I have understood so far.
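
To illustrate what "pointing to LFS" means: for an LFS-tracked file, git stores a small text pointer whose first line is the LFS spec version line, followed by an `oid sha256:...` line and a `size ...` line. A minimal sketch for checking whether a given blob is such a pointer could look like this (the helper name `is_lfs_pointer` is invented here, and it only inspects the first line):

```shell
# Hypothetical helper (name invented for this sketch): succeed if the data
# on stdin looks like a Git LFS pointer, i.e. its first line is exactly the
# LFS spec version line. Real h5 content starts with the HDF5 signature
# instead, so it fails the check.
is_lfs_pointer() {
    head -n 1 | grep -qxF 'version https://git-lfs.github.com/spec/v1'
}
```

For example, `git show HEAD:tutorial/raw/some_run.h5 | is_lfs_pointer && echo "stored in LFS"` checks what is actually committed at HEAD (`some_run.h5` is only a placeholder file name). Older commits that fail this check still carry the full h5 file in the regular history. `git lfs ls-files` lists the files that LFS tracks in the current checkout.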

RealPolitiX commented 3 years ago

OK, interesting, @zainsohail04, I didn't know they have the billing option now for the organization. Try looking into this purging method. Do this for the master branch first and see what happens.

zain-sohail commented 3 years ago

We discussed this in the meeting and are of the opinion that purging the history should be a last resort (as it is irreversible). Instead, we can remove all inactive branches and then migrate the history using the Git LFS command. Afterwards, we can branch out the repo again from master to dev etc.
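
A rough sketch of what the migration step could look like, assuming `git lfs migrate` is the command meant here and that only `*.h5` files need to move. The wrapper name `migrate_h5_to_lfs` and the `GIT` override are made up for this example. Nothing here has been run against this repo yet.

```shell
# Sketch only (function name and GIT override are invented for this example):
# rewrite local history so that *.h5 blobs become LFS pointers.
# Set GIT=echo to print the commands instead of running them.
migrate_h5_to_lfs() {
    git_cmd="${GIT:-git}"
    # Report which file types take up the most space across all local refs
    "$git_cmd" lfs migrate info --everything || return 1
    # Convert every *.h5 blob reachable from any local ref to an LFS pointer
    "$git_cmd" lfs migrate import --everything --include="*.h5" || return 1
}
```

Calling it as `GIT=echo migrate_h5_to_lfs` gives a dry run that just prints the two commands. As far as I understand, the import rewrites commit hashes, so it would need a force push afterwards and everyone would have to re-clone. Trying it on a fresh throwaway clone first seems safest.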