hetio / hetmatpy

Python package for matrix storage and operations on hetnets

Relocate package from greenelab/hetmech to this repo #1

Closed: dhimmel closed this issue 5 years ago

dhimmel commented 5 years ago

@zietzm and I were thinking about relocating the Python package in greenelab/hetmech to this repo, while keeping greenelab/hetmech for data analyses. This change is motivated by issues such as https://github.com/greenelab/hetmech-backend/pull/5#issuecomment-435172439.

Here is the code we used to rewrite the git history and remove the non-essential files that were taking up too much space:

rm -rf hetmat-bfg*

git clone --mirror git@github.com:greenelab/hetmech.git hetmat-bfg.git

java -jar ~/Downloads/bfg-1.13.0.jar \
  --no-blob-protection \
  --delete-folders "{data,explore}" \
  --delete-files "{*.ipynb,.gitattributes,*.txt,*.tsv,*.bz2,*.xz,*.gz,*.zip}" \
  hetmat-bfg.git

cd hetmat-bfg.git
# Prune empty commits
git filter-branch --commit-filter 'git_commit_non_empty_tree "$@"' HEAD
git reflog expire --expire=now --all && git gc --prune=now --aggressive
cd ..

git clone hetmat-bfg.git hetmat-bfg
cd hetmat-bfg

cat > msg-filter.py << "EOF1"
# Python utility to modify commit message
import sys
import re

text = sys.stdin.read()
pattern = re.compile(r'(?<=[\s(])(#\d+)')

def replacer(match):
    # https://regex101.com/r/CaS5S0/2
    return 'greenelab/hetmech' + match.group(1)

text = pattern.sub(repl=replacer, string=text)
# sys.stdout.write avoids the extra trailing newline that print() would add
sys.stdout.write(text)
EOF1

# https://davidwalsh.name/update-git-commit-messages
git filter-branch --force \
  --msg-filter "python `pwd`/msg-filter.py" \
  master

git remote set-url origin git@github.com:hetio/hetmat.git
git push --force
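You can check the issue-reference rewrite on its own before running it over the whole history. The following is a minimal sketch that reuses the pattern from msg-filter.py. The `rewrite_issue_refs` helper and the sample messages are hypothetical and exist only for illustration:

```python
import re

# Same pattern as msg-filter.py: a "#<digits>" token preceded by whitespace or "("
ISSUE_REF = re.compile(r'(?<=[\s(])(#\d+)')


def rewrite_issue_refs(message, repo='greenelab/hetmech'):
    """Prefix bare GitHub issue/PR references with the source repository."""
    return ISSUE_REF.sub(lambda match: repo + match.group(1), message)


# Hypothetical commit messages for illustration
samples = [
    'Merge pull request #42 from user/branch',
    'Fix path counting (#7)',
    '#3 at the start of a message',
    'color#5 glued to a word',
]
for message in samples:
    print(rewrite_issue_refs(message))
```

The lookbehind only accepts whitespace or `(` before the `#`. As a result, a reference at the very start of a message, or one attached to a preceding word, is left unchanged.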

dhimmel commented 5 years ago

I had to do some more hacking to embed the former commit information in the new commit messages:

rm -rf hetmat-bfg*
git clone --mirror git@github.com:greenelab/hetmech.git hetmat-bfg.git
#cp -r hetmat-bfg.git.bak hetmat-bfg.git

# Workaround to get bfg to add Former-commit-id to commit messages
# https://github.com/rtyley/bfg-repo-cleaner/issues/112
java -jar ~/Downloads/bfg-1.13.0.jar \
  --convert-to-git-lfs '{*.ipynb}' \
  hetmat-bfg.git

# Update commit messages
cd hetmat-bfg.git
cat > msg-filter.py << "EOF1"
# Python utility to modify commit message
import sys
import re

text = sys.stdin.read()

def replace_num(match):
    return 'greenelab/hetmech' + match.group(1)

# https://regex101.com/r/CaS5S0/2
text = re.sub(
    pattern=r'(?<=[\s(])(#\d+)',
    repl=replace_num,
    string=text,
)

def replace_hash(match):
    # Split the commit hash with a rare character; otherwise the next BFG run would replace it with the rewritten commit's hash
    commit_hash = match.group(1)
    return f'Former-commit: https://github.com/greenelab/hetmech/commit/{commit_hash[:2]}⛷{commit_hash[2:]}'

# https://regex101.com/r/U7XKwD/1
text = re.sub(
    pattern=r'Former-commit-id: ([0-9a-f]+)',
    repl=replace_hash,
    string=text,
)

# sys.stdout.write avoids the extra trailing newline that print() would add
sys.stdout.write(text)
EOF1
# https://davidwalsh.name/update-git-commit-messages
git filter-branch --force \
  --msg-filter "python `pwd`/msg-filter.py" \
  master

# Remove unwanted files
cd ..
java -jar ~/Downloads/bfg-1.13.0.jar \
  --no-blob-protection \
  --convert-to-git-lfs '{}' \
  --delete-folders "{data,explore}" \
  --delete-files "{*.ipynb,.gitattributes,*.txt,*.tsv,*.bz2,*.xz,*.gz,*.zip}" \
  --private \
  hetmat-bfg.git

# Remove breaker character
cd hetmat-bfg.git
git filter-branch --force \
  --msg-filter "python -c \"import sys; sys.stdout.write(sys.stdin.read().replace('⛷', ''))\"" \
  master

# Prune empty commits
git filter-branch --force --commit-filter 'git_commit_non_empty_tree "$@"' HEAD

# Garbage collect
git reflog expire --expire=now --all && git gc --prune=now --aggressive
cd ..

git clone hetmat-bfg.git hetmat-bfg
cd hetmat-bfg

git remote set-url origin git@github.com:hetio/hetmat.git
git push --force
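The Former-commit handling is a round trip across the two BFG runs. First, the message filter turns BFG's `Former-commit-id: <sha>` footer into a link whose hash is split by the `⛷` breaker, so the second BFG run does not recognize and rewrite it. Then the last message filter deletes the breaker. Below is a minimal sketch of those two string transformations, run outside of git. The function names, the commit message, and the 40-character hash are all made up for illustration:

```python
import re

BREAKER = '⛷'  # rare character that hides the old hash from BFG
COMMIT_URL = 'https://github.com/greenelab/hetmech/commit/'


def add_former_commit_link(message):
    """First pass: replace BFG's Former-commit-id footer with a link to the old
    commit, splitting the hash after two characters with the breaker."""
    def replace_hash(match):
        commit_hash = match.group(1)
        return f'Former-commit: {COMMIT_URL}{commit_hash[:2]}{BREAKER}{commit_hash[2:]}'

    return re.sub(r'Former-commit-id: ([0-9a-f]+)', replace_hash, message)


def remove_breaker(message):
    """Final pass (after the second BFG run): delete the breaker to restore the full hash."""
    return message.replace(BREAKER, '')


# Hypothetical commit message and hash
old_hash = '0123456789abcdef0123456789abcdef01234567'
message = f'Add degree-weighted path counts\n\nFormer-commit-id: {old_hash}\n'

first_pass = add_former_commit_link(message)
print(old_hash in first_pass)  # → False: the full hash is hidden while BFG runs
print(remove_breaker(first_pass))
```

Splitting after two characters is arbitrary. Any split point works, as long as the breaker is a character that never otherwise appears in commit messages.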