dhimmel closed this 6 years ago
I had to do some more hacking to embed the former commit information in the new commit messages:
rm -rf hetmat-bfg*
git clone --mirror git@github.com:greenelab/hetmech.git hetmat-bfg.git
#cp -r hetmat-bfg.git.bak hetmat-bfg.git
# Workaround to get bfg to add Former-commit-id to commit messages
# https://github.com/rtyley/bfg-repo-cleaner/issues/112
java -jar ~/Downloads/bfg-1.13.0.jar \
--convert-to-git-lfs '{*.ipynb}' \
hetmat-bfg.git
# Update commit messages
cd hetmat-bfg.git
cat > msg-filter.py << "EOF1"
# Python utility to modify commit message
import sys
import re
text = sys.stdin.read()
def replace_num(match):
    # Prefix bare issue references such as "#12" with the old repo name
    return 'greenelab/hetmech' + match.group(1)
# https://regex101.com/r/CaS5S0/2
text = re.sub(
    pattern=r'(?<=[\s(])(#\d+)',
    repl=replace_num,
    string=text,
)
def replace_hash(match):
    # Split the commit hash with a rare character, since BFG would otherwise replace the hash on subsequent runs
    commit_hash = match.group(1)
    return f'Former-commit: https://github.com/greenelab/hetmech/commit/{commit_hash[:2]}⛷{commit_hash[2:]}'
# https://regex101.com/r/U7XKwD/1
text = re.sub(
    pattern=r'Former-commit-id: ([0-9a-f]+)',
    repl=replace_hash,
    string=text,
)
print(text)
EOF1
# https://davidwalsh.name/update-git-commit-messages
git filter-branch --force \
--msg-filter "python3 $(pwd)/msg-filter.py" \
master
# Remove unwanted files
cd ..
java -jar ~/Downloads/bfg-1.13.0.jar \
--no-blob-protection \
--convert-to-git-lfs '{}' \
--delete-folders "{data,explore}" \
--delete-files "{*.ipynb,.gitattributes,*.txt,*.tsv,*.bz2,*.xz,*.gz,*.zip}" \
--private \
hetmat-bfg.git
# Remove breaker character
cd hetmat-bfg.git
git filter-branch --force \
--msg-filter "python -c \"import sys; print(sys.stdin.read().replace('⛷', ''))\"" \
master
# Prune empty commits
git filter-branch --force --commit-filter 'git_commit_non_empty_tree "$@"' HEAD
# Garbage collect
git reflog expire --expire=now --all && git gc --prune=now --aggressive
cd ..
git clone hetmat-bfg.git hetmat-bfg
cd hetmat-bfg
git remote set-url origin git@github.com:hetio/hetmat.git
git push --force
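To make the commit-message round trip in the script easier to follow, here is a minimal Python sketch that reproduces the two substitutions from msg-filter.py and the later breaker-removal pass as plain functions, so they can be tried on a sample message without running git or BFG. The function names (`link_issue_refs`, `link_former_commit`, `remove_breaker`) and the sample message are hypothetical; the regexes and replacement strings are copied from the script above.

```python
import re

# The rare "breaker" character (⛷) used in msg-filter.py
BREAKER = "\u26f7"


def link_issue_refs(text):
    # Prefix bare "#123" references (preceded by whitespace or "(")
    # with the old repo name so they keep pointing at greenelab/hetmech.
    return re.sub(r"(?<=[\s(])(#\d+)", r"greenelab/hetmech\1", text)


def link_former_commit(text):
    # Turn BFG's "Former-commit-id: <sha>" footer into a GitHub URL,
    # splitting the sha after two characters with BREAKER so that the
    # full hash no longer appears verbatim in the message.
    def repl(match):
        sha = match.group(1)
        return (
            "Former-commit: https://github.com/greenelab/hetmech/commit/"
            f"{sha[:2]}{BREAKER}{sha[2:]}"
        )

    return re.sub(r"Former-commit-id: ([0-9a-f]+)", repl, text)


def remove_breaker(text):
    # Final pass (the second git filter-branch): delete the breaker
    # character, which rejoins the hash inside the URL.
    return text.replace(BREAKER, "")


if __name__ == "__main__":
    msg = "Fix loader (#42)\n\nFormer-commit-id: 3f2a9c1d"
    staged = link_former_commit(link_issue_refs(msg))
    print(staged)
    print(remove_breaker(staged))
```

Applying the functions in this order mirrors the script: the first filter-branch produces the split URL, the second BFG run happens while the hash is split, and only afterwards is the breaker removed.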
@zietzm and I were thinking about relocating the Python package in greenelab/hetmech to this repo, while keeping greenelab/hetmech for data analyses. This change is motivated by issues such as https://github.com/greenelab/hetmech-backend/pull/5#issuecomment-435172439.

Here is the code we used to rewrite history to remove the non-essential files that were taking up too much space: