intel-analytics / analytics-zoo

Distributed Tensorflow, Keras and PyTorch on Apache Spark/Flink & Ray
https://analytics-zoo.readthedocs.io/
Apache License 2.0

Reference steps to migrate zoo repo to bigdl-2.0 #88

Open shanyu-sys opened 2 years ago

shanyu-sys commented 2 years ago

The steps below worked for me to migrate directories together with their git history.

Step 0: install git filter-repo

git filter-repo (reference) is much faster (it finishes in seconds) than the git-history script. Note that git filter-repo requires git >= 2.22.0. If your git is older, you could refer to here to upgrade it.
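
To check the version requirement before going further, a minimal sketch follows. It assumes a `sort` that supports `-V` (GNU coreutils); `version_ge` is a hypothetical helper name, not part of git.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Assumes "sort -V" (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# "git --version" prints e.g. "git version 2.25.1"; the third field is the number.
current=$(git --version | awk '{ print $3 }')
if version_ge "$current" 2.22.0; then
    echo "git $current is new enough for git filter-repo"
else
    echo "git $current is too old; upgrade to 2.22.0 or later" >&2
fi
```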

To install git filter-repo:

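cd PATH_FOR_FILTER_REPO
git clone https://github.com/newren/git-filter-repo.git
# the clone creates a git-filter-repo/ subdirectory that contains the git-filter-repo script
export PATH=$PATH:PATH_FOR_FILTER_REPO/git-filter-repo  # you could also add that to ~/.bashrc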

Step 1: clone zoo

mkdir source-repo
cd source-repo
git clone https://github.com/intel-analytics/analytics-zoo.git
cd analytics-zoo

Step 2: filter directories and rename.

Keep only the paths to migrate. After this, the repo will contain only the selected paths and their corresponding git history.

git filter-repo --path pyzoo/zoo/ray --path pyzoo/test/zoo/ray/ --path pyzoo/dev/run-pytests-ray --force

Rename the zoo paths to match the bigdl-2.0 directory layout.

git filter-repo --path-rename pyzoo/zoo/ray/:python/orca/src/bigdl/orca/ray/ --path-rename pyzoo/test/zoo/ray/:python/orca/test/bigdl/orca/ray/ --path-rename pyzoo/dev/:python/orca/test/dev/
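
After the two filter-repo passes, it may be worth checking that only the renamed paths remain. The following is a minimal sketch, assuming it is run inside the filtered clone; `list_tracked_prefixes` is a hypothetical helper name, not part of git or git filter-repo.

```shell
# Print the distinct leading path components of all tracked files.
# $1: number of leading components to keep (defaults to 3).
list_tracked_prefixes() {
    git ls-files | cut -d/ -f1-"${1:-3}" | sort -u
}
```

Running `list_tracked_prefixes 3` in the filtered clone should then list only entries under python/orca/ if the renames above applied as intended; any remaining pyzoo/ entry would suggest a path that was kept but not renamed.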

Step 3: link migrated branch with bigdl-2.0

cd analytics-zoo # the destination repo
git remote add source-rep path_to_source-repo/analytics-zoo
git fetch source-rep
git branch migrate-ray remotes/source-rep/master
git checkout migrate-ray
git rebase bigdl-2.0
git remote remove source-rep

# hotfix: the rebase changes the committer name and timestamp; reset them to the author's values.
git checkout bigdl-2.0 
git log | head -1 # remember the commit id as basecommit, and replace in the last command
git checkout migrate-ray
git filter-branch -f --commit-filter 'export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"; export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"; export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"; git commit-tree "$@"' -- basecommit..HEAD
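
To confirm that the hotfix took effect, one option is to count the commits in the rewritten range whose committer fields still differ from the author fields. This is a sketch only; `count_committer_mismatches` is a hypothetical helper name, and it assumes names and emails contain no tab characters.

```shell
# Print the number of commits in revision range $1 whose committer
# name, email, or date differs from the author name, email, or date.
# Fields are tab-separated (%x09); dates use strict ISO 8601 (%aI, %cI).
count_committer_mismatches() {
    git log --format='%an%x09%ae%x09%aI%x09%cn%x09%ce%x09%cI' "$1" |
        awk -F'\t' '($1 != $4) || ($2 != $5) || ($3 != $6) { n++ } END { print n + 0 }'
}
```

For example, `count_committer_mismatches basecommit..HEAD` (with basecommit replaced by the commit id noted earlier) should print 0 once the committer fields have been restored.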

Step 4: Change imports and licenses from zoo to bigdl

# Run the commands below from a subdirectory so that they do not modify the .git directory in the repo root. If you are already outside the root directory, skip this cd.
cd analytics-zoo/python

# change imports 
grep -rl 'zoo\.orca' ./ | xargs sed -i 's/zoo\.orca/bigdl\.orca/g'
# import of ZooTestCase
grep -rl 'test\.zoo\.pipeline\.utils\.test_utils' . | xargs sed -i 's/test\.zoo\.pipeline\.utils\.test_utils/bigdl\.orca\.test_zoo_utils/g'
# import of init_nn_context, init_spark_on_local ...
grep -rl 'zoo import' . | xargs sed -i 's/zoo import/bigdl\.dllib\.utils\.nncontext import/g'

# change license
grep -rl '2018 Analytics Zoo' . | xargs sed -i 's/2018 Analytics Zoo/2016 The BigDL/g'

hkvision commented 2 years ago

Tip: If sudo add-apt-repository ppa:git-core/ppa hangs while you are upgrading git, export http_proxy and https_proxy, and add Defaults env_keep="https_proxy" to the end of the /etc/sudoers file. See https://askubuntu.com/questions/212132/i-cant-add-ppa-repository-behind-the-proxy

hkvision commented 2 years ago

# change license 
grep -rl '2018\ Analytics\ Zoo' . |xargs sed -i 's/2018\ Analytics\ Zoo/2016\ The\ BigDL/g'

I don't know why, but after using the above command to modify the files and adding the changes, git got corrupted:

error: inflate: data stream error (incorrect header check)
fatal: packed object 2ff5e776b04dbfa74a179a928373b16489ed9009 (stored in .git/objects/pack/pack-31d0f798889e780db38b77226344f9fba369ec8a.pack) is corrupt

shanyu-sys commented 2 years ago

Did you perhaps run the grep command in the root directory of analytics-zoo? If so, it could have modified files inside the .git directory.
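
If the cause is indeed sed rewriting files under .git, one way to make the license replacement safe to run from any directory is to exclude .git explicitly and skip binary files. The sketch below assumes GNU grep, sed and xargs; `replace_license` is a hypothetical helper name.

```shell
# Replace the license string in every text file under directory $1.
# --exclude-dir=.git keeps grep out of git's object store,
# -I skips binary files, and -Z / -0 pass file names NUL-separated
# so that names containing spaces are handled correctly.
# xargs -r does nothing when grep finds no matching file.
replace_license() {
    grep -rlIZ --exclude-dir=.git '2018 Analytics Zoo' "$1" |
        xargs -0 -r sed -i 's/2018 Analytics Zoo/2016 The BigDL/g'
}
```

With this, `replace_license .` leaves the .git directory untouched even when run from the repository root.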