weecology / DeepForest

Python Package for Airborne RGB machine learning
https://deepforest.readthedocs.io/
MIT License
505 stars 172 forks

Why is this repo so large? #194

Closed bw4sz closed 3 years ago

bw4sz commented 3 years ago

The repo has a reasonable .gitignore; we should use bfg https://rtyley.github.io/bfg-repo-cleaner/ to do some forensic work and delete old files.

ethanwhite commented 3 years ago

Here are all of the commits > 10 MiB:

14dd556d83fc   12MiB tests/data/NEON_D03_OSBS_DP1_405000_3276000_classified_point_cloud.laz
ea6c8cc4eaeb   12MiB Figures/Panel.png
f3243704c749   13MiB DeepForest/example.png
2a289971c7d1   13MiB data/training/annotations.csv
4016fb622e50   14MiB bfg-1.12.15.jar
8f564c4b3e95   14MiB bfg-1.12.16.jar
4dc535beb4cc   15MiB deepforest/data/2019_YELL_2_528000_4978000_image_crop2.png
117a2df6f28e   15MiB DeepForest/example.png
d4784d28631f   19MiB tests/data/NEON_D03_OSBS_DP1_404000_3284000_classified_point_cloud.laz
669aa904b671   21MiB data/NEON_D03_OSBS_DP1_407000_3291000_classified_point_cloud.laz
3fe23342afdf   24MiB www/bird_panel.png
978b6c858a42   25MiB example.png
82c61105adf5   26MiB data/field_data.csv
70ccb9d3f69d   30MiB DataExploration/DataExploration.html
d03570c2c4e8   31MiB docs/figures/output_65_0.png
8733af7c4fe7   31MiB docs/figures/output_32_0.png
c9fdd6c97aa4   36MiB docs/figures/output_32_0.png
143a9a7583fb   36MiB docs/figures/output_65_0.png
5a8a0cc6bbdd   36MiB docs/figures/output_65_0.png
38a17d974dfc   48MiB tests/LAStools.zip
7e8dd60940ef   55MiB data/SJER/2017/2017_SJER_2_259000_4110000_image 2.tif
75e71027fe66   59MiB data/OSBS/2017_OSBS_3_407000_3291000_image.tif
a17af90de811   61MiB data/2017_OSBS_3_400000_3287000_image.tif
6b1469451a53   63MiB data/OSBS/2017_OSBS_3_411000_3282000_image.tif
b7100a7b00ad   66MiB data/SJER/2017/2017_SJER_2_259000_4110000_image.tif
f425a9d21a60   99MiB data/training/detection.csv
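
To see which directories dominate a listing like the one above, the entries can be tallied by top-level path. This is a minimal sketch, assuming the `<hash> <size>MiB <path>` layout shown; the names `parse_listing` and `total_by_top_level` are illustrative and not part of DeepForest, and `SAMPLE` is just a few rows copied from the listing.

```python
from collections import defaultdict

# A few rows copied from the listing above (not the full list).
SAMPLE = """\
4016fb622e50   14MiB bfg-1.12.15.jar
8f564c4b3e95   14MiB bfg-1.12.16.jar
38a17d974dfc   48MiB tests/LAStools.zip
7e8dd60940ef   55MiB data/SJER/2017/2017_SJER_2_259000_4110000_image 2.tif
f425a9d21a60   99MiB data/training/detection.csv
"""


def parse_listing(text):
    """Parse '<hash> <size>MiB <path>' lines into (hash, size_mib, path) tuples."""
    rows = []
    for line in text.splitlines():
        # Split into at most 3 fields so paths containing spaces stay intact.
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[1].endswith("MiB"):
            continue  # skip blank or unrecognized lines
        sha, size, path = parts
        rows.append((sha, int(size[:-3]), path))
    return rows


def total_by_top_level(rows):
    """Sum sizes (MiB) per top-level directory; files at the root go under '(root)'."""
    totals = defaultdict(int)
    for _sha, size_mib, path in rows:
        top = path.split("/", 1)[0] if "/" in path else "(root)"
        totals[top] += size_mib
    return dict(totals)


if __name__ == "__main__":
    totals = total_by_top_level(parse_listing(SAMPLE))
    for top, size in sorted(totals.items(), key=lambda kv: -kv[1]):
        print(f"{size:5d} MiB  {top}")
```

Running the same functions over the full listing gives a quick ranking of which paths are worth removing from history first.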

So filter-branching (or BFG-ing) out the data directory would definitely help, since it doesn't exist anymore and has the 6 largest commits, totaling ~360 MB (and there are a bunch more 1-10 MB commits from that directory). If we can strip out tests/LAStools.zip, that's another ~50 MB, and the bfg jars are another ~30 MB. I'm assuming these are all
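
The cleanup suggested here could look roughly like the commands below. This is a sketch, not the procedure actually used on this repo: the helper names, the `bfg.jar` location, and the `DeepForest.git` mirror-clone directory are assumptions for illustration. The script only builds and prints the command lines; it does not run them.

```python
import shlex


def filter_branch_remove(path):
    """argv that drops `path` from every commit on all refs via git filter-branch."""
    index_filter = "git rm -r --cached --ignore-unmatch " + shlex.quote(path)
    return ["git", "filter-branch", "--force",
            "--index-filter", index_filter,
            "--prune-empty", "--", "--all"]


def bfg_delete_folder(folder_name, repo_dir, jar="bfg.jar"):
    """argv for BFG's --delete-folders (matches folder names, not full paths)."""
    return ["java", "-jar", jar, "--delete-folders", folder_name, repo_dir]


def bfg_delete_file(file_name, repo_dir, jar="bfg.jar"):
    """argv for BFG's --delete-files (matches file names, not full paths)."""
    return ["java", "-jar", jar, "--delete-files", file_name, repo_dir]


# After either tool rewrites history, the old objects still have to be
# expired and garbage-collected before the repo actually shrinks.
CLEANUP = [
    ["git", "reflog", "expire", "--expire=now", "--all"],
    ["git", "gc", "--prune=now", "--aggressive"],
]

if __name__ == "__main__":
    repo = "DeepForest.git"  # hypothetical mirror clone
    plan = [
        filter_branch_remove("data"),          # option A: plain git
        bfg_delete_folder("data", repo),       # option B: BFG
        bfg_delete_file("LAStools.zip", repo),
        bfg_delete_file("bfg-*.jar", repo),
        *CLEANUP,
    ]
    for argv in plan:
        print(shlex.join(argv))
```

Either route rewrites history, so the result has to be force-pushed and existing clones need to be re-cloned. BFG also leaves the contents of the latest commit untouched by default, so files still present there would need to be deleted in a normal commit first.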

bw4sz commented 3 years ago

I did some work to delete old files and clean up the tensorflow branch. When we delete master in favor of pytorch I believe this will be solved.

bw4sz commented 3 years ago

Closing this for now; I decreased the repo size by 50%.