hetio / hetionet

Hetionet: an integrative network of disease
https://neo4j.het.io

Rewriting repo history to use Git LFS #12

Closed dhimmel closed 5 years ago

dhimmel commented 5 years ago

In https://github.com/hetio/hetionet/commit/23f6117c24b9a3130d8050ee4354b0ccd6cd5b9a, we began using Git LFS to store large files. Although that commit uses LFS, earlier commits in the history still contain large files stored directly in Git. We can therefore use the BFG Repo-Cleaner to create a history in which all such files use LFS. We will keep the pre-LFS history available in branches, but not on master.

dhimmel commented 5 years ago

I ran the following commands:

# https://rtyley.github.io/bfg-repo-cleaner/
# https://confluence.atlassian.com/bitbucket/use-bfg-to-migrate-a-repo-to-git-lfs-834233484.html

# Outside of original repository
git clone --mirror git@github.com:hetio/hetionet.git hetionet-bfg.git
java -jar ~/Downloads/bfg-1.13.0.jar \
  --convert-to-git-lfs "*.{bz2,gz,xz,zip}" \
  hetionet-bfg.git
cd hetionet-bfg.git
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git lfs fetch --all
git lfs push --all origin master

# From within original repository
git checkout -b master-bfg
git remote add bfg /home/dhimmel/Desktop/hetionet-bfg.git
git fetch bfg
git reset --hard bfg/master
git push --set-upstream origin master-bfg

The bfg command produced the following output:

Using repo : /home/dhimmel/Desktop/hetionet-bfg.git

Found 44 objects to protect
Found 9 commit-pointing refs : HEAD, refs/heads/master, refs/heads/matrix, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 23f6117c (protected by 'HEAD')

Cleaning
--------

Found 70 commits
Cleaning commits:       100% (70/70)
Cleaning commits completed in 3,231 ms.

Updating 8 Refs
---------------

    Ref                    Before     After   
    ------------------------------------------
    refs/heads/master    | 23f6117c | 3cf25f2c
    refs/heads/matrix    | 3a09715e | f7190594
    refs/heads/neo4j-2.3 | 7eec671b | ec10b0a4
    refs/heads/neo4j-3.0 | 7d3d257c | 95818820
    refs/heads/pre-lfs   | 23f6117c | 3cf25f2c
    refs/pull/11/head    | 3a09715e | f7190594
    refs/pull/11/merge   | caaedf26 | 5af06766
    refs/tags/v1.0.0     | 4933ca17 | 3bbc130a

Updating references:    100% (8/8)
...Ref update completed in 33 ms.

Commit Tree-Dirt History
------------------------

    Earliest                                              Latest
    |                                                          |
    DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDmmmm

    D = dirty commits (file tree fixed)
    m = modified commits (commit message or parents changed)
    . = clean commits (no changes to file tree)

                            Before     After   
    -------------------------------------------
    First modified commit | 0009918a | 38873689
    Last dirty commit     | 9f214ab7 | ff4f2eef

Changed files
-------------

    Filename                          Before & After                          
    --------------------------------------------------------------------------
    hetionet-v1.0-edges.sif.gz      | 81f59db4 ⇒ 90dc2b88                     
    hetionet-v1.0-perm-1.db.tar.bz2 | f55f7c4d ⇒ 3459eedb                     
    hetionet-v1.0-perm-1.json.bz2   | 735791b0 ⇒ 430b6f3e                     
    hetionet-v1.0-perm-2.db.tar.bz2 | a49c15dc ⇒ 76303e92                     
    hetionet-v1.0-perm-2.json.bz2   | d92bd3d6 ⇒ 928c919d                     
    hetionet-v1.0-perm-3.db.tar.bz2 | d27169c3 ⇒ ae4716bb                     
    hetionet-v1.0-perm-3.json.bz2   | 5ecba96e ⇒ 1e420742                     
    hetionet-v1.0-perm-4.db.tar.bz2 | 3bbf56d4 ⇒ a832f8b9                     
    hetionet-v1.0-perm-4.json.bz2   | 2b5ca7b0 ⇒ cbeb6364                     
    hetionet-v1.0-perm-5.db.tar.bz2 | f0df2d09 ⇒ 381b0472                     
    hetionet-v1.0-perm-5.json.bz2   | c51dcd1e ⇒ d18d6af3                     
    hetionet-v1.0.db.tar.bz2        | 152c7796 ⇒ 76141ba3, 36ae082b ⇒ df19a5fc
    hetionet-v1.0.json.bz2          | 54177a6a ⇒ ce8ba918                     

In total, 267 object ids were changed. Full details are logged here:

    /home/dhimmel/Desktop/hetionet-bfg.git.bfg-report/2018-11-07/11-21-52

BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive

hetionet-bfg.git.bfg-report/2018-11-07/11-21-52 contains the three text files:

When running git push --set-upstream origin master-bfg, I kept getting:

open /home/dhimmel/Documents/serg/rephetio/hetionet/hetnet/permuted/neo4j/hetionet-v1.0-perm-1.db.tar.bz2: no such file or directory
error: failed to push some refs to 'git@github.com:hetio/hetionet.git'

I added --no-verify, which made the upload work. Hopefully, this isn't too dangerous!

So this repo now has a master-bfg branch, which I will switch to master.
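For context on the --no-verify workaround: git push --no-verify bypasses the pre-push hook, and that hook is where Git LFS normally uploads objects during a push. A push made this way therefore sends only the Git commits and pointer files. In this thread the LFS objects had presumably already been uploaded by the earlier git lfs push --all from the mirror clone. The sketch below is a minimal check, not part of the original commands, for whether a repository has the LFS pre-push hook at all. The function name is hypothetical, and it assumes hooks live in the default hooks directory (that is, core.hooksPath is not set).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the Git LFS pre-push hook is present in
# the given git directory (e.g. ".git" or a bare/mirror repository path).
# Assumption: hooks are in "<git_dir>/hooks", i.e. core.hooksPath is unset.
lfs_pre_push_hook_installed() {
    git_dir=$1
    hook="$git_dir/hooks/pre-push"
    # The hook written by "git lfs install" invokes "git lfs pre-push".
    [ -f "$hook" ] && grep -q 'git lfs pre-push' "$hook"
}

# Example usage (commented out so the sketch has no side effects):
#   if lfs_pre_push_hook_installed .git; then
#       echo "LFS uploads happen in the pre-push hook; --no-verify skips them"
#   fi
```

If the hook is skipped, running git lfs push --all origin master-bfg afterwards is one way to upload any LFS objects the remote is still missing.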

dhimmel commented 5 years ago

The master branch prior to the BFG rewrite is available at https://github.com/hetio/hetionet/tree/pre-lfs

dhimmel commented 5 years ago

I have decided to undo this, but keep the BFG rewrite around in a bfg-lfs-rewrite branch. I decided it was too risky for too little benefit. The repo size is not too big at the moment. Specifically, I was having issues migrating #11 to be based on the rewritten master. I tried cherry-picking, checking out individual files, and rebasing, none of which worked. I also got errors during the rebase:

Encountered 5 file(s) that should have been pointers, but weren't:
    hetnet/permuted/neo4j/hetionet-v1.0-perm-1.db.tar.bz2
    hetnet/permuted/neo4j/hetionet-v1.0-perm-2.db.tar.bz2
    hetnet/permuted/neo4j/hetionet-v1.0-perm-3.db.tar.bz2
    hetnet/permuted/neo4j/hetionet-v1.0-perm-4.db.tar.bz2
    hetnet/permuted/neo4j/hetionet-v1.0-perm-5.db.tar.bz2

So I'm going back!
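The "should have been pointers, but weren't" message above means that a path tracked by LFS holds full file content in a commit rather than a small LFS pointer. The sketch below is one hedged way to find such paths in a given revision. It is not something run in this thread, and both function names are hypothetical. It assumes that a pointer blob begins with the LFS spec version line and that the archives of interest are the same extensions passed to BFG (bz2, gz, xz, zip).

```shell
#!/bin/sh
# Hypothetical helper: read a blob on stdin and succeed if it starts with
# the Git LFS pointer version line. This is a heuristic, not a validator.
is_lfs_pointer() {
    # The version line is 42 bytes long; strip NULs so that binary input
    # does not trigger shell warnings inside the command substitution.
    prefix=$(head -c 42 | tr -d '\000')
    [ "$prefix" = "version https://git-lfs.github.com/spec/v1" ]
}

# Hypothetical helper: print archive paths in a revision (default HEAD)
# whose committed blob is NOT an LFS pointer. Must be run inside a repo.
list_non_pointer_archives() {
    rev=${1:-HEAD}
    git ls-tree -r --name-only "$rev" |
    grep -E '\.(bz2|gz|xz|zip)$' |
    while IFS= read -r path; do
        # Inspect the blob as stored in Git, not the smudged working copy.
        if ! git cat-file blob "$rev:$path" | is_lfs_pointer; then
            printf '%s\n' "$path"
        fi
    done
}

# Example usage (commented out so the sketch has no side effects):
#   list_non_pointer_archives bfg/master
```

Checking the committed blob rather than the working tree matters because LFS replaces pointers with real content on checkout, so files on disk look the same either way.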