Closed larsoner closed 1 year ago
could you have a look at simplifying the history? There are 2 points which could help to clean-up this repo:
Yes but to be safe the way I'd approach the problem is to create a new branch clean-main with a different/blob-less history. Then you can examine the blame on GitHub easily, do a diff between it and main (and see there are no changes), etc. I started working on this and saw the following when I looked for large files in history:
$ git fetch upstream
$ git checkout main
$ git clean -xdf
$ git reset --hard upstream/main
$ git checkout -b clean-main
$ git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
...
d1b37407f3ed 15MiB dev/.doctrees/environment.pickle
ed30123a0406 16MiB dev/_sources/generated/tutorials/20_stream_receiver_filtered_buffer.rst.txt
13ddbac8308b 16MiB dev/generated/tutorials/20_stream_receiver_filtered_buffer.html
2296feabf4e1 21MiB doc/_static/stream_viewer/stream_viewer.gif
1c03d1004ea5 30MiB dev/.doctrees/generated/tutorials/20_stream_receiver_filtered_buffer.doctree
8a5ab8fad80d 38MiB doc/_static/stream_viewer/stream_viewer.mov
af623416d969 40MiB datasets/sample/sample-ant-raw.fif
e3964ca961e4 40MiB datasets/sample/sample-ant-raw.fif
37c3da0c0853 45MiB dev/.doctrees/environment.pickle
Clearly we can get rid of the environment.pickle files and the other stuff you mention above, but it would be good to set up some testing infrastructure to get rid of sample-ant-raw.fif as well.
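As a rough sketch, the large-file listing above could be wrapped in a small function that only reports blobs above a size threshold, which makes it easier to iterate while removing paths. The function name large_blobs and the 1 MiB default are assumptions for illustration, not something used in this thread; sizes are printed in raw bytes.

```shell
# List blobs reachable from any ref whose size is at least min_bytes.
# Usage: large_blobs [min_bytes]   (defaults to 1 MiB)
# Output columns: size in bytes, abbreviated object id, path.
large_blobs() {
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk -v min="${1:-1048576}" '
            $1 == "blob" && $3 >= min {
                size = $3; id = substr($2, 1, 12)
                # Drop the first three fields so paths containing spaces survive.
                sub(/^[^ ]+ [^ ]+ [^ ]+ /, "")
                print size, id, $0
            }' |
        sort -n
}
```

Running large_blobs 5000000 from the repository root would list only blobs of roughly 5 MB and more, smallest first.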
Can we just use MNE-Python's mne-testing-data? It's overkill but sharing the infrastructure for determining version, downloading it, and caching it with GH actions means lower maintenance burden at the mne-lsl end because if stuff breaks the fix is almost always already in MNE-Python itself.
Thanks, and thanks for sharing the git commands you are using!
For the datasets, that's already done in #151. Using the MNE-Python infrastructure was a bit overkill. Instead, I used a very lightweight version with a 3-line function to generate a registry (checksum) for the files to download, and another 3-line function to download the files with pooch.
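The actual functions in #151 are Python and are not shown in this thread. As a hedged illustration only, a checksum registry of the "filename hash" kind that pooch can load from a text file could be generated from the shell like this. The function name make_registry is hypothetical, and the sketch assumes sha256sum (GNU coreutils) is available and that file names contain no spaces.

```shell
# Print one "relative/path sha256" line per file under the given directory.
# Assumption: sha256sum from GNU coreutils is on PATH.
make_registry() (
    cd "$1" || exit 1
    find . -type f | sort | while IFS= read -r f; do
        printf '%s %s\n' "${f#./}" "$(sha256sum "$f" | cut -d ' ' -f 1)"
    done
)
```

Something like make_registry datasets > registry.txt would then produce a file to ship next to the download code.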
.. downloaded from https://github.com/mscheltienne/mne-lsl-datasets.
... also I'd recommend pushing main as-is before we overwrite it with something like git push origin main:main-bak or whatever you want to call it
I'd recommend pushing main as-is before we overwrite it with something like git push origin main:main-bak
So just making a backup main branch? Done, main-backup.
Following https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository it looked like https://rtyley.github.io/bfg-repo-cleaner/ would make this trivial, but it needed Java, so instead I just went the manual way by passing the following paths one by one (iterating with the above file-size command) to git filter-repo --invert-paths --force --path:
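Since git filter-repo accepts several --path arguments in one call, the one-by-one passes could also be collapsed into a single command. A minimal sketch, assuming a hypothetical helper purge_paths_cmd that only prints the command for review and does not run it; paths containing single quotes are not handled.

```shell
# Print (do not run) a single git filter-repo call that drops every given
# path from history. Review the output before pasting it into a fresh clone.
purge_paths_cmd() (
    cmd="git filter-repo --invert-paths --force"
    for p in "$@"; do
        cmd="$cmd --path '$p'"
    done
    printf '%s\n' "$cmd"
)
```

For example, purge_paths_cmd dev/.doctrees/environment.pickle datasets/sample/sample-ant-raw.fif prints one command covering both paths.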
With this the biggest files are now:
6a62a71fe603 56KiB bsl/externals/pylsl/pylsl.py
bb59ec379891 61KiB doc/_static/logging/flowchart-dark.png
c513a72db843 64KiB doc/_static/install/Advanced_system_settings.png
00ef7182b433 120KiB doc/CNBI Arduino Trigger.pdf
38963b805281 136KiB neurodecode/layout/biosemi_128ch.jpg
cad0e5c1a262 145KiB neurodecode/layout/biosemi_064ch.jpg
5e57d4f11eee 181KiB doc/_static/cli/stream_viewer_backend.png
a662e02c5080 188KiB doc/_static/cli/stream_recorder.png
a17d4ab0950a 303KiB doc/_static/cli/stream_player.png
f29b203a3865 594KiB doc/_static/icon/icon.pdf
0067886ece76 600KiB doc/_static/icon/icon.ai
10d18cc67040 694KiB doc/_static/icon/icon.png
450702b383d1 2.1MiB doc/_static/icon/icon.jpg
7e704112c604 4.8MiB doc/_static/icon/icon.eps
6e067c1c7566 6.3MiB doc/_static/icon/icon.psd
Can you look at the file list above and the branch here to see if it makes sense? If it does I can locally do git push upstream --force clean-main:main
🤞
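One way to check that a rewritten branch still holds exactly the same files as the original is to compare the tree objects of the two tips, which ignores how the history got there. A small sketch, with same_tree as a hypothetical helper name:

```shell
# Succeed when two revisions point at the same tree, i.e. identical file
# contents, even if their commit histories differ completely.
same_tree() {
    [ "$(git rev-parse "$1^{tree}")" = "$(git rev-parse "$2^{tree}")" ]
}
```

With the branch names from this thread, same_tree main clean-main && echo "no changes" would confirm that only history was rewritten, assuming the removed files are no longer present in the tip of main.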
That looks great, thank you!
And thanks for the detailed instructions since I'll be doing it again for the icons when I change them on main (and probably a couple of remaining old files in that list).
Okay, force-pushed clean-main:main
@larsoner Could you have a second look to confirm that it seems in order? I'm struggling a bit.. I removed a couple more files after I changed the icons/logos.
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
The command above is giving me 2 different outputs when run in 2 different terminals, in the same repository.. and I can't figure out what is going on :disappointed:
Also, the main branch does clone faster; but it seems like the gh-pages branch still holds the blobs in its history and is still taking a long time to retrieve. IIRC, it was created as a new branch from main and not as an empty new branch.
I should be able to look tomorrow!
Thanks, no hurry :)
I only see one commit in https://github.com/mne-tools/mne-lsl/tree/gh-pages so I think that's good. I think that the rev-list stuff should ideally be run on a clean clone because it processes all branches at once, and I didn't do that the first time. I just redid it, can you check https://github.com/larsoner/mne-lsl/tree/clean-main and see if it's okay and I'll force-push it to mne-tools:main?
Seems OK, tests are passing. Should be good to go.
I think that the rev-list stuff should ideally be run on a clean clone because it processes all branches at once
I was a bit lost with how it was "sometimes" processing all branches and sometimes not. Thanks for having a second look!
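For context, git rev-list --all walks every ref under refs/ (local branches, remote-tracking branches, tags, and so on) plus HEAD, so two clones with different sets of refs can give different listings. A quick sketch to see which kinds of refs a given clone has; refs_by_kind is a hypothetical helper name:

```shell
# Count refs per namespace (heads, remotes, tags, ...). All of these are
# starting points for "git rev-list --all".
refs_by_kind() {
    git for-each-ref --format='%(refname)' |
        cut -d / -f 2 |
        sort |
        uniq -c
}
```

Comparing the output in a working clone and in a fresh clone may show stale remote-tracking branches or old tags that keep large blobs reachable.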
Okay, force-pushed to main
Thanks! But I'm still lost.. I re-forked the repository to get to a clean state, cloned it (which still required about 110 MiB), and ran again:
git rev-list --objects --all |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
I'm still getting:
a103c0537dd8 4.0MiB doc/_static/stream_recorder/stream_recorder_cli.mov
4c2c92ee6e50 4.0MiB doc/_static/icon-with-name/icon-with-name.eps
a94028888490 4.2MiB doc/_static/stream_player/stream_player_cli.mov
47b68132f888 4.4MiB doc/_static/icon-with-acronym/icon-with-acronym.eps
80b3d1b29247 4.4MiB doc/_static/icon-with-acronym/icon-with-acronym.psd
7e704112c604 4.8MiB doc/_static/icon/bsl-icon.eps
13fd71d6a874 4.8MiB doc/_static/icon-with-name/icon-with-name.psd
364be804568a 5.1MiB libLSL/liblsl32-debug.dll
6e067c1c7566 6.3MiB doc/_static/icon/bsl-icon.psd
443f16464fe2 7.0MiB libLSL/liblsl64-debug.dll
10884859b78b 7.4MiB examples/sample/mi_left_right-raw.fif
a09427b159b4 7.4MiB sample/mi_left_right.fif
a68291046bc1 7.4MiB sample/mi_left_right.fif
2adf8579ef11 7.4MiB sample/mi_left_right-raw.fif
32bba913b292 11MiB dev/.doctrees/environment.pickle
c5bc58c3b2f5 13MiB Protocols/cv2.pyd
2296feabf4e1 21MiB doc/_static/stream_viewer/stream_viewer.gif
8a5ab8fad80d 38MiB doc/_static/stream_viewer/stream_viewer.mov
And many other files above 1 MiB, despite your 2 passes and despite my own with git filter-repo and with bfg-repo-cleaner. Do you see the same on your side?
From https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository#purging-a-file-from-your-repositorys-history it's probably this:
In order to remove the sensitive file from your tagged releases, you'll also need to force-push against your Git tags
I'll try again!
Okay I went back through with the removals then did:
larsoner@bunk:~/python/mne-lsl$ git remote add origin git@github.com:/larsoner/mne-lsl.git
larsoner@bunk:~/python/mne-lsl$ git push origin --force --all
Enumerating objects: 10970, done.
...
To github.com:/larsoner/mne-lsl.git
+ d63012c...aebd24c main -> main (forced update)
* [new branch] gh-pages -> gh-pages
larsoner@bunk:~/python/mne-lsl$ git push origin --force --tags
Enumerating objects: 10500, done.
...
* [new tag] 0.6.4 -> 0.6.4
larsoner@bunk:~/python/mne-lsl$ cd ..
larsoner@bunk:~/python$ rm -Rf mne-lsl/
larsoner@bunk:~/python$ git clone git@github.com:/larsoner/mne-lsl.git
Cloning into 'mne-lsl'...
...
Receiving objects: 100% (14850/14850), 4.77 MiB | 24.32 MiB/s, done.
Resolving deltas: 100% (10544/10544), done.
larsoner@bunk:~/python$ cd mne-lsl
larsoner@bunk:~/python/mne-lsl$ du -hs
6.5M .
So I think it worked. I then went to my gh-pages branch and discarded the 1 commit so that it matched the one here. The size is now ~8MB. Also note:
96a1a32a7959 11MiB dev/.doctrees/environment.pickle
so you could save a tiny bit of git clone time by pruning this Sphinx env pickle file when you deploy to gh-pages.
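A possible way to do that pruning, as a sketch only: delete the .doctrees cache directories from the built HTML before it is committed to gh-pages. Sphinx uses these only as a build cache, so the served pages do not need them. The helper name prune_sphinx_cache is hypothetical, and removing the whole .doctrees directory (rather than just environment.pickle) is an assumption that goes slightly beyond what is suggested above.

```shell
# Remove every .doctrees directory (Sphinx build cache, including
# environment.pickle) below the given HTML output directory.
prune_sphinx_cache() {
    find "$1" -type d -name .doctrees -prune -exec rm -rf {} +
}
```

It would be called on the deploy directory right before the gh-pages commit, e.g. prune_sphinx_cache path/to/html, where the path is a placeholder for whatever the docs workflow uses.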
Can you make sure you're happy with the state of larsoner:mne-lsl and if so I'll force push main and tags here? Then things should work :crossed_fingers: :crossed_fingers:
Looks good, diff is empty and tests are passing. And it did clone your fork way faster! 🤞 I'll remove that environment file!
Thanks for looking (again) (again) into this!
Hah!
$ git remote remove origin # was larsoner
$ git remote add origin git@github.com:/mne-tools/mne-lsl.git
$ git push origin --force --all
Everything up-to-date
$ git push origin --force --tags
Enumerating objects: 10500, done.
...
+ 194c9a2...3588455 0.6.4 -> 0.6.4 (forced update)
The "Everything up-to-date" when I force-pushed the branches suggests it was just the tags that needed to be updated, which is cool. Mystery solved!
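To spot this situation earlier, one could list the objects that are reachable from tags but from no branch. A minimal sketch, assuming it is run in a fresh clone; tag_only_objects is a hypothetical helper name:

```shell
# List objects (with their paths) that are kept alive only by tags, i.e.
# reachable from some tag but from no local or remote-tracking branch.
tag_only_objects() {
    git rev-list --objects --tags --not --branches --remotes
}
```

A non-empty result after the branches have been rewritten suggests that some tags still point at the old history and need to be force-pushed as well.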
Yes that looks good! That was too sneaky for me.. 😅
I got the gh-pages branch covered, it's now back to 1 commit, and I removed the large files from the last commit of the main branch. I'm not very confident in editing the git history (I'm rarely using git CLI..), could you have a look at simplifying the history? There are 2 points which could help to clean-up this repo:
mne_lsl/lsl/lib/* and bsl/lsl/lib/*
doc/_static/icon-with-name/*, doc/_static/icon-with-acronym/*, doc/_static/stream_player/*, doc/_static/stream_recorder, doc/_static/stream_viewer
I will also remove the 0.x.x release which will go back to ghfcbg-hnp-meeg/bsl.