Open r-barnes opened 5 years ago
Q: several of those large files are graphics useful for examples and documentation.
How can I leave them in, say, the online readme without requiring readers to download them?
On Tue, Jul 2, 2019, 2:03 PM Richard Barnes notifications@github.com wrote:
The repo contains a number of large files that you likely wanted to ignore
- the largest are listed below. This collectively means that the repo is a 100MB download.
    41e6f427c11b  7.7MiB  analysis/output_files/ALT_DATA2_OUT/fft/fft_results.gif
    a8267f9be190  7.9MiB  analysis/output_files/results_1/xcor/cross-correlations.txt
    e271bcab6381  11MiB   analysis/output_files/results_1/fft/fft_results.gif
    669261e09a05  21MiB   analysis/output_data/ALT_DATA1_OUT/xcor/cross-correlations.txt
    36cbe3d82cf2  36MiB   scripts/core.45511
    4ac01836f00a  36MiB   scripts/core.53132
    9c2bb6f1759f  36MiB   scripts/core.171982
    a6cecc16b57b  57MiB   analysis/output_data/ALT_DATA1_OUT/fft/fft_analysis_animation.gif
    6def6506d3f7  66MiB   scripts/GENESIS.log
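A listing like the one above can be produced with plain git. The following is a minimal sketch, not necessarily how this particular list was generated; the function name `largest_blobs` is made up for illustration.

```shell
#!/bin/sh
# Print the N largest blobs reachable from any ref, largest first.
# Output columns: object id, uncompressed size in bytes, path.
# (largest_blobs is a hypothetical helper name, not a git command.)
largest_blobs() {
    n="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |   # keep blobs only (drop commits and trees)
        cut -d' ' -f2- |       # strip the object-type column
        sort -k2,2 -n -r |     # sort by size, descending
        head -n "$n"
}
```

Run it from inside a clone, for example `largest_blobs 9`. Sizes are uncompressed object sizes, so they will not match the packed download size exactly.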
These can be removed with the BFG Repo-Cleaner (https://rtyley.github.io/bfg-repo-cleaner/) using the following commands:
    git clone --mirror https://github.com/kellykochanski/rescal-snow.git
    java -jar ~/Downloads/bfg-1.12.13.jar --delete-folders 'output_files' rescal-snow.git
    java -jar ~/Downloads/bfg-1.12.13.jar --delete-folders 'output_data' rescal-snow.git
    java -jar ~/Downloads/bfg-1.12.13.jar --delete-files 'core.*' rescal-snow.git
    java -jar ~/Downloads/bfg-1.12.13.jar --delete-files 'GENESIS.log' rescal-snow.git
    java -jar ~/Downloads/bfg-1.12.13.jar --delete-files '*.o' rescal-snow.git
    java -jar ~/Downloads/bfg-1.12.13.jar --delete-files '*.py~' rescal-snow.git
Perhaps the scripts/DUN.csp file is also temporary? It takes up 10MB.
After running the BFG commands, check that things look alright, and then run:
    cd rescal-snow.git
    git reflog expire --expire=now --all && git gc --prune=now --aggressive
The upside is that this reduces the repo size to either 11MB (with DUN.csp) or 1MB (without DUN.csp), which saves bandwidth and space for users.
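To see whether the cleanup actually shrank things, the packed size can be read before and after the `git gc` step. This is a rough sketch; `packed_size_kib` is a hypothetical helper name, and it only reports what `git count-objects -v` prints.

```shell
#!/bin/sh
# Print the disk space used by pack files, in KiB, for the repository
# in the current directory (the "size-pack" field of count-objects).
packed_size_kib() {
    git count-objects -v | awk '$1 == "size-pack:" { print $2 }'
}
```

Calling `packed_size_kib` inside `rescal-snow.git` before and after the reflog expiry and gc gives two numbers to compare. For a human-readable summary, `git count-objects -vH` works too.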
They must be in the repo to appear in the readme, unless you host them elsewhere.
However, I don't think any of the files I've suggested purging are currently used by the repo. These are (I think) all large files that were mistakenly committed in the past. Removing them from the repo using git rm doesn't remove them from the history, so the repo only ever grows in size unless you rewrite history.
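The point about git rm can be checked directly: a path that is gone from the working tree may still be reachable through older commits. A small sketch, with `path_in_history` as a made-up helper name:

```shell
#!/bin/sh
# Succeed if any commit reachable from any ref touched the given path,
# even when the path no longer exists in the current checkout.
# (path_in_history is a hypothetical helper, not a git command.)
path_in_history() {
    [ -n "$(git log --all --oneline -- "$1" | head -n 1)" ]
}
```

For example, `path_in_history scripts/GENESIS.log && echo "still in history"` would print the message for as long as the file's blob remains in some commit, regardless of whether it was later removed with git rm.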
The files you show on the readme are stored in example_images and take only 3.2MB. They should be unaffected by the commands I suggest above.
@kellykochanski: I thought we were fixing this prior to JOSS?
I haven't had time to get to it, and don't want to rush into messing with the git history.
Okay. Can we chat about it prior to JOSS acceptance?
@r-barnes I used bfg as you suggested, and the repo is now 14MB (including the removal of DUN.csp - I think some additional docs with figures have been added since you opened this).
Doing this before merging the outstanding PRs could make merging them difficult or impossible...
bfg warned me... Any issue with just repeating the bfg calls after accepting the PRs?
I just went through a similar process with another repository, although there the issue was pruning and relocating sensitive information prior to open-sourcing a software package. I discovered that GitHub has write-protected refs for PRs. This means that you cannot prune data from these by default.
However, I think I have special settings in my git config to fetch these PR refs that most users do not have, so this may not be a real issue (at least not if you're only concerned about repo file size; it certainly is when you're removing sensitive info).
If it turns out that the PR refs keep the repository size bloated, then the only solutions are either:
1) Contacting GitHub support and asking them to delete the old PR refs (I'm not sure if they can/will do this for you)
2) Deleting and recreating the repository.
Hopefully you won't need to do either and the PR refs won't muck this up for you.
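One way to see whether a remote exposes PR refs at all is to ask it for everything under `refs/pull/`. A minimal sketch, assuming the remote advertises PR refs under that prefix as GitHub does; `count_pull_refs` is a hypothetical helper name.

```shell
#!/bin/sh
# Count the refs under refs/pull/ that a remote advertises.
# $1 is a remote name, URL, or local path to a repository.
count_pull_refs() {
    git ls-remote "$1" 'refs/pull/*' | wc -l | tr -d ' '
}
```

For instance, `count_pull_refs https://github.com/kellykochanski/rescal-snow.git` prints how many PR refs are visible. This only counts refs; it says nothing about how much data they keep alive.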
@zbeekman: Cool idea! So that cleans the whole repo and associated PRs all at once?
@r-barnes
Cool idea! So that cleans the whole repo and associated PRs all at once?
Not 100% sure what you're talking about here. If it's my point 2, "Deleting and recreating the repository", then I need to explain a little bit further:
What I really mean is:
- git show-ref
- git reflog and git gc commands recommended by BFG
- git push --mirror or whatever BFG recommends to the new repo
I would not recommend this, unless the repo size stays large after a normal pass with BFG. Even then, it's much easier to contact GitHub support and ask if they can delete the old protected PR refs.
I had to go through this procedure because I realized that upon open sourcing a repository, you could still access old PR refs which included the sensitive information that cannot be made public. If you do not need to do it, then please don't.
Also, if you haven't run BFG yet to prune history, you may want to do it either before the final submission or not at all; I'm not sure if it will mess with JOSS' machinery, DOI process, etc. and it will certainly affect tagging.
@zbeekman I ran bfg on the repository, though the changes were rejected from the then-open PR on kk/JOSS-fixes. Downloading rescal-snow is now down to 14MB from ~100MB.
I expect to have all open PRs closed at the time of JOSS acceptance, and will re-run bfg then - I can do this after finishing the corrections in your review, and merging the kk/JOSS-fixes branch, but before formal JOSS acceptance.
I hope bfg will work smoothly if all PRs are closed... Let me know if you think that it won't.
@kellykochanski: Yes it should work fine. IMO, you have images and stuff for the tutorials, and 14MB is probably how much space everything you want to keep takes up. But at the end of the day, I wouldn't bother with any steps that are more complicated than what you are doing. If you get complaints about rejected refs when you try to push due to PR refs, you can just delete them locally then try pushing again. (They will persist on the GitHub side, but I suspect this is fine and most people don't fetch them.)
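"Delete them locally" could look something like the sketch below, run inside the mirror clone before pushing. It assumes the PR refs live under `refs/pull/` in the local clone, which is what a `--mirror` clone from GitHub typically contains; `delete_local_pull_refs` is a made-up helper name.

```shell
#!/bin/sh
# Delete every local ref under refs/pull/ in the current repository.
# Branches and tags are untouched; the refs on the GitHub side persist.
delete_local_pull_refs() {
    git for-each-ref --format='delete %(refname)' refs/pull/ |
        git update-ref --stdin
}
```

After running `delete_local_pull_refs`, `git show-ref` should list no `refs/pull/...` entries, and the push should no longer try to update them.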
@zbeekman: The issue is that the repo's history contains ~86MB worth of large temporary and output files which we accidentally committed and later removed.
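To reduce the chance of the same kinds of files being committed again, ignore rules could be added for them. This is only a sketch: the patterns mirror the file names mentioned in this thread, and whether the output directories should also be ignored is a project decision not settled here. `add_ignore_rules` is a hypothetical helper name.

```shell
#!/bin/sh
# Append ignore rules for core dumps, object files, editor backups,
# and the GENESIS.log simulation log to .gitignore in the current directory.
# Note: 'core.*' also matches any other file whose name starts with "core.".
add_ignore_rules() {
    cat >> .gitignore <<'EOF'
core.*
*.o
*.py~
GENESIS.log
EOF
}
```

`git check-ignore -v scripts/core.45511` can then be used to confirm which rule matches a given path.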
[Edited for improved clarity 🤞]
@r-barnes: I'll pipe down and let you guys figure out what you want to do. My point was that it sounds like Kelly had success with BFG and got things down to 14MB. Deleting the entire github repository and re-creating it is (hopefully) beyond the scope of what you want/need to accomplish. At any rate, sorry for the confusion and feel free to ignore my previous comments.
If you run into troubles pushing back up to github after running BFG, let me know, it might be the PR refs issue, and I may know the solution. Either way I'd happily take a look.
@zbeekman: No worries, thanks for your help.