nicercode / EnvironmentalComputing

These are the R markdown files used to generate
http://environmentalcomputing.net/
Creative Commons Attribution 4.0 International

Remove Images/ folder and rewrite history #29

Closed fontikar closed 2 years ago

dfalster commented 2 years ago

The main motivation here is to cull some very large files and massively reduce the size of the repository. I've reviewed the repository, and we have some really big files. Here are the largest files in the current tree (HEAD):

➜  EnvironmentalComputing git:(master) ✗ git ls-tree -r -t -l --full-name HEAD | sort -n -k 4 | tail -n 20
1255023 Draft tutorial pages/study-site/_index.html
1641970 Draft tutorial pages/study-site/61395_shp/australia/cstauscd_r.dbf
1801924 Draft tutorial pages/NetCDF/NetCDF.html
1988964 content/Images/Murray-28.jpg
2366625 Draft tutorial pages/Draft_glm/glm.html
2520920 Draft tutorial pages/NetCDF/2015_20151206092000-ABOM-L3S_GHRSST-SSTfnd-AVHRR_D-1d_dn-v02.nc
2772927 Draft tutorial pages/study-site/61395_shp/australia/cstauscd_l.dbf
2982785 content/Images/Murray-13.jpg
3140186 content/Images/Murray-26.jpg
5010716 content/Images/site_photo.JPG
5060602 content/Images/group_photo.JPG
5163498 content/Images/DJI_0021.JPG
5529548 content/Images/DJI_0272.JPG
5701888 content/Coding-Skills/asking-code-questions/help-me-help-you.gif
7142632 Ecostats2017/LectureNotes.pdf
7739084 Draft tutorial pages/study-site/61395_shp/australia/cstauscd_l.shp
10187781    Draft tutorial pages/gganimate/MyEBirdData.csv
12487872    Draft tutorial pages/study-site/61395_shp/australia/cstauscd_r.shp
13593905    Draft tutorial pages/Accessing weather data/maxann.asc
13593905    Draft tutorial pages/Accessing weather data/maxann.txt.txt

The largest (last 4) are more than 10 MB each. This makes the repo really big and cumbersome to work with. Can I filter them out of the history? This would rewrite the history, so all the commit hashes would change, though the commit messages would remain. You'd need to reclone.

It only makes sense to do this once everyone has finished working on local changes, as you'll need to reclone the repo.

Can I go ahead?

dfalster commented 2 years ago

Oh, and hold off making more changes until this is done!

dfalster commented 2 years ago

Here's the hit list:

I was going to remove all drafts; we can bring them back later if we want to keep them.

dfalster commented 2 years ago

Success! I've reduced the repo size from 342 MB to 115 MB. This was achieved by removing many large files from the history.
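
For anyone wanting to reproduce a before/after comparison like this, here is a minimal sketch. It assumes the size of interest is the packed object store reported by git count-objects; the thread does not say how the two figures above were measured, and repo_pack_size is a hypothetical helper name.

```shell
# Hypothetical helper: print the size of the packed objects in the current repo.
# Assumption: "repo size" here means the size-pack value from git count-objects.
repo_pack_size() {
  # -v prints the detailed counters, -H prints sizes in human-readable units
  git count-objects -vH | sed -n 's/^size-pack: //p'
}
```

Loose objects are not included in size-pack, so running git gc before calling repo_pack_size gives more comparable numbers before and after a rewrite.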

Step 1: detect large files

We can find largest files in the current tree using git

➜ git ls-tree -r -t -l --full-name HEAD | sort -n -k 4 | tail -n 20

1255023 Draft tutorial pages/study-site/_index.html
1641970 Draft tutorial pages/study-site/61395_shp/australia/cstauscd_r.dbf
1801924 Draft tutorial pages/NetCDF/NetCDF.html
...
2982785 content/Images/Murray-13.jpg
3140186 content/Images/Murray-26.jpg
...
5701888 content/Coding-Skills/asking-code-questions/help-me-help-you.gif
7142632 Ecostats2017/LectureNotes.pdf
7739084 Draft tutorial pages/study-site/61395_shp/australia/cstauscd_l.shp
10187781    Draft tutorial pages/gganimate/MyEBirdData.csv
12487872    Draft tutorial pages/study-site/61395_shp/australia/cstauscd_r.shp
13593905    Draft tutorial pages/Accessing weather data/maxann.asc
13593905    Draft tutorial pages/Accessing weather data/maxann.txt.txt

Some files are over 13 MB! I also checked out some older versions and reran the command. As files have moved about, this also captures large files that exist only in the history.
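
An alternative to checking out older versions one at a time is to list the largest blobs reachable from any ref in a single pass. This is a minimal sketch, not the approach used above; largest_blobs is a hypothetical helper name, and the path shown is the one each blob was found under while walking the history.

```shell
# Hypothetical helper: list the n largest blobs anywhere in the history.
# Output: size in bytes, a space, then the path the blob was found under.
largest_blobs() {
  n="${1:-20}"
  # rev-list prints "<object id> <path>" for every object reachable from any ref;
  # cat-file looks up each object's type and size and echoes the path via %(rest)
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    grep '^blob ' |
    cut -d' ' -f2- |
    sort -n -k 1,1 |
    tail -n "$n"
}
```

Calling largest_blobs 20 gives a list comparable to the one above, but it also includes files that were deleted or moved in earlier commits.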

I also decided to remove the generated HTML files from the original site, as these were quite large. I found these by checking out an old version and running this in R:

list.files(".", pattern = "\\.html$", recursive = TRUE, full.names = TRUE)

Step 2: remove files

Files are removed using git filter-repo, which removes specified files from the history. You need to provide the path of the file or folder to remove. Note that this doesn't account for any times the file moved. If that occurred, you need to run it again with each earlier path.

To remove a file, run this, replacing filename with its path:

git filter-repo --invert-paths --path filename
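
When several paths need to go, including earlier locations of a file that moved, they can be removed in one rewrite rather than one run per path. A minimal sketch, assuming git filter-repo is installed and using its --paths-from-file option; purge_paths and the list file name in the usage note are hypothetical, not what was used in this thread.

```shell
# Hypothetical helper: remove every path listed in a file from the whole history.
# The file holds one path (file or folder) per line, e.g.
#   content/Images
#   Draft tutorial pages/Accessing weather data/maxann.asc
purge_paths() {
  list_file="$1"
  # Refuse to run without a non-empty list of paths
  if [ ! -s "$list_file" ]; then
    echo "purge_paths: no paths listed in '$list_file'" >&2
    return 1
  fi
  # --invert-paths means "keep everything except the listed paths"
  git filter-repo --invert-paths --paths-from-file "$list_file"
}
```

For example, purge_paths paths-to-remove.txt. git filter-repo expects to be run in a fresh clone, and listing both the old and the new location of a moved file removes it from every commit it appeared in.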

Beware - when you rewrite history, you change all the commit hashes, so you break collaboration. You'll need to force push, and everyone will need to reclone.
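
The force push step can be sketched as follows. This assumes the rewritten history should replace every branch and tag on the shared remote; republish is a hypothetical helper name and the URL argument is whatever the shared repository uses. git filter-repo removes the origin remote as a safety measure, so the sketch adds it back if it is missing.

```shell
# Hypothetical helper: publish a rewritten history to the shared remote.
republish() {
  remote_url="$1"
  # Re-add origin only if it is not currently configured
  git remote get-url origin >/dev/null 2>&1 || git remote add origin "$remote_url"
  # Overwrite all branches, then all tags, with the rewritten history
  git push --force --all origin
  git push --force --tags origin
}
```

Collaborators then make a fresh git clone rather than pulling into their old copies, since the old and new histories share no commit hashes.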

file list

Below is the list of files and folders purged. Each was removed with the command above.

generated html from old site