usgs / groundmotion-processing

Parsing and processing ground motion data

Should we clean out large/old data files? #1038

Closed emthompson-usgs closed 1 year ago

emthompson-usgs commented 1 year ago

I realize this goes against the idea of git preserving the entire repo history, but given that we recently cleaned up a bunch of large unused/redundant test data, should we clear some of those large files out of the git history? I only just learned about BFG Repo Cleaner and have not used it before, but it seems like it might be worth doing.
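Before picking a cleanup tool, it may help to see which blobs actually dominate the history. A minimal sketch, assuming `git` is on the PATH: it pipes `git rev-list --objects --all` into `git cat-file --batch-check` and ranks the blobs by size. The function names are hypothetical helpers, not part of any existing tool.

```python
from __future__ import annotations

import subprocess

# Format string understood by `git cat-file --batch-check`; %(rest) carries the path
# that `git rev-list --objects` prints after each blob/tree id.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def git_object_listing(repo: str = ".") -> str:
    """Return one 'type sha size path' line for every object reachable from any ref."""
    rev_list = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    return subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=rev_list, capture_output=True, text=True, check=True,
    ).stdout


def largest_blobs(listing: str, n: int = 10) -> list[tuple[int, str]]:
    """Parse the listing and return the n largest blobs as (size_in_bytes, path)."""
    blobs = []
    for line in listing.splitlines():
        # Split at most 3 times so paths containing spaces stay intact.
        parts = line.split(" ", 3)
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits and trees
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), path))
    blobs.sort(reverse=True)  # largest first
    return blobs[:n]
```

Usage from the repository root would be something like `for size, path in largest_blobs(git_object_listing()): print(size, path)`. Each historical version of a file is a separate blob, so a file that was rewritten many times can show up more than once.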

ghost commented 1 year ago

I'm guessing this would require all users to reclone the repository after cleaning up the history?

An alternative to BFG Repo Cleaner is https://github.com/newren/git-filter-repo, which is more actively maintained and whose documentation includes a comparison of these types of tools.
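As a sketch of what a git-filter-repo run might look like, the snippet below builds the command from a list of paths to drop and an optional blob-size threshold, using the real `--path`, `--invert-paths` and `--strip-blobs-bigger-than` options. The paths and the `10M` threshold are hypothetical placeholders. git-filter-repo normally refuses to run outside a fresh clone, and because every rewritten commit gets a new hash, the result has to be force-pushed and existing clones would generally need to be recloned.

```python
from __future__ import annotations

import subprocess


def filter_repo_command(paths_to_drop: list[str], max_blob_size: str | None = None) -> list[str]:
    """Build a `git filter-repo` invocation that removes the given paths from all history.

    max_blob_size (e.g. "10M") additionally strips any blob larger than that size.
    """
    if not paths_to_drop and not max_blob_size:
        raise ValueError("nothing to filter: give paths_to_drop and/or max_blob_size")
    cmd = ["git", "filter-repo"]
    for path in paths_to_drop:
        cmd += ["--path", path]
    if paths_to_drop:
        # --invert-paths turns the --path selection into "keep everything except these".
        cmd.append("--invert-paths")
    if max_blob_size:
        cmd += ["--strip-blobs-bigger-than", max_blob_size]
    return cmd


def rewrite_history(repo: str, paths_to_drop: list[str], max_blob_size: str | None = None) -> None:
    """Run the rewrite in `repo`, which should be a fresh clone made for this purpose."""
    subprocess.run(filter_repo_command(paths_to_drop, max_blob_size), cwd=repo, check=True)
```

Keeping the command construction separate from `subprocess.run` makes it easy to print and review the exact invocation before rewriting anything.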

baagaard-usgs commented 1 year ago

If the releases are preserved and include all of the test data, I am less concerned about removing old test data from the repo.

Additionally, a long-term approach to dealing with large test datasets is to move them to a place where they can be downloaded, such as release assets.
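On the test side, downloading from a release asset could look roughly like the sketch below: fetch once, cache locally, and verify a pinned SHA-256 so a changed or truncated asset is caught. GitHub release assets are usually served from `https://github.com/<owner>/<repo>/releases/download/<tag>/<asset>`; the tag, asset name and digest you would pin are hypothetical here, and `fetch_test_data` is an illustrative helper, not an existing function in the project.

```python
from __future__ import annotations

import hashlib
import shutil
import urllib.request
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large datasets don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_test_data(url: str, sha256: str, cache_dir: Path) -> Path:
    """Return a local copy of `url`, downloading only if the cached copy is missing or stale."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / url.rsplit("/", 1)[-1]
    if target.exists() and sha256_of(target) == sha256:
        return target  # cache hit, no network access
    partial = target.parent / (target.name + ".part")
    with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    if sha256_of(partial) != sha256:
        partial.unlink()
        raise ValueError(f"checksum mismatch for {url}")
    partial.replace(target)  # only expose the file once it has been verified
    return target
```

Downloading to a `.part` file and renaming after verification means an interrupted download never leaves a file that looks valid in the cache.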

emthompson-usgs commented 1 year ago

I like the idea of moving the test data somewhere else; I just don't know how to do that.
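One possible first step is to bundle a test-data directory into a single archive that can be attached to a release (through the GitHub web UI or, for example, `gh release upload <tag> <file>`) and unpacked again by the test suite. A minimal sketch using only the standard library; the directory layout and function names are hypothetical.

```python
from __future__ import annotations

import hashlib
import tarfile
from pathlib import Path


def package_test_data(data_dir: Path, archive: Path) -> str:
    """Bundle every file under data_dir into a gzipped tarball and return its SHA-256."""
    data_dir, archive = Path(data_dir), Path(archive)
    files = sorted(p for p in data_dir.rglob("*") if p.is_file())
    with tarfile.open(archive, "w:gz") as tar:
        for path in files:
            # Store paths relative to data_dir so the archive unpacks cleanly anywhere.
            tar.add(path, arcname=path.relative_to(data_dir).as_posix())
    # Pin this digest in the test suite so downloads can be verified.
    return hashlib.sha256(archive.read_bytes()).hexdigest()


def unpack_test_data(archive: Path, dest: Path) -> list[str]:
    """Extract the archive into dest and return the sorted member names."""
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths, links outside dest, etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return sorted(names)
```

A gzip archive embeds a timestamp, so re-packaging the same files generally yields a different digest; the digest to pin is the one returned when the uploaded asset was created.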