ropensci / gittargets

Data version control for reproducible analysis pipelines in R with {targets}.
https://docs.ropensci.org/gittargets/

Function to remove snapshots #9

Closed: wlandau closed this issue 2 years ago

wlandau commented 2 years ago

Prework

Proposal

Suggested by @smwindecker in her rOpenSci review. Plan:

  1. Make all Git snapshots orphan branches. History will no longer be shared among branches, but there is no loss of efficiency for files that are fully up to date (unchanged). Efficiency from diffs may be lost, but diffs in target storage may be impossible anyway; I will need to check.
  2. Remove commits that are not reachable from any ref (dangling commits): https://stackoverflow.com/questions/3765234/listing-and-deleting-git-commits-that-are-under-no-branch-dangling
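
Both steps of the plan hinge on reachability: Git can only prune a commit once no branch, tag, or other ref leads to it through parent links. A minimal Python sketch of that idea, using a hypothetical in-memory `Commit` type rather than real Git objects or any gittargets function, shows why orphan branches matter for deletion:

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Hypothetical stand-in for a Git commit: an ID plus parent IDs."""
    sha: str
    parents: list[str] = field(default_factory=list)


def reachable(commits: dict[str, Commit], refs: dict[str, str]) -> set[str]:
    """Every commit reachable from some ref by following parent links."""
    seen: set[str] = set()
    stack = list(refs.values())
    while stack:
        sha = stack.pop()
        if sha in seen:
            continue
        seen.add(sha)
        stack.extend(commits[sha].parents)
    return seen


def dangling(commits: dict[str, Commit], refs: dict[str, str]) -> set[str]:
    """Commits that no ref can reach: the candidates for pruning."""
    return set(commits) - reachable(commits, refs)


# Shared history: each snapshot commit builds on the previous one.
shared = {
    "s1": Commit("s1"),
    "s2": Commit("s2", ["s1"]),
    "s3": Commit("s3", ["s2"]),
}
# Dropping the refs for the older snapshots frees nothing: s3 still reaches s1.
print(dangling(shared, {"snap-3": "s3"}))  # set()

# Orphan branches: each snapshot starts its own unconnected history.
orphans = {"o1": Commit("o1"), "o2": Commit("o2"), "o3": Commit("o3")}
print(sorted(dangling(orphans, {"snap-2": "o2", "snap-3": "o3"})))  # ['o1']
```

In real Git, reflog entries and `git gc` expiry settings (such as `gc.pruneExpire`) keep unreachable commits around for a while before they are actually deleted.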

wlandau commented 2 years ago

From https://git-scm.com/book/id/v2/Kostumisasi-Git-Git-Attributes, it looks like file diffs rely on filters. Gittargets already uses the lfs filter, which seems to preclude diffs, so orphan branches are probably fine.
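
For context on why the lfs filter leaves little to diff: for an LFS-tracked file, what Git itself commits is a small pointer file in the Git LFS spec format, not the data. The sketch below uses a hypothetical helper `lfs_pointer` (not part of gittargets or Git LFS) to build such pointers for two versions of a data file:

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Build the pointer text that Git LFS commits in place of the real content."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


old = lfs_pointer(b"target value: 1\n" * 1000)
new = lfs_pointer(b"target value: 1\n" * 999 + b"target value: 2\n")

# Both payloads are 16000 bytes, so only the oid line differs: a textual
# diff of what Git stores says nothing about what changed inside the data.
changed = [a for a, b in zip(old.splitlines(), new.splitlines()) if a != b]
print(len(changed))  # 1
```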

wlandau commented 2 years ago

On second thought, orphan branches lose valuable audit-trail information about which targets changed between code commits. From the perspective of tracking history, I think this is more valuable than the ability to delete data snapshots. I might revisit this issue at some point.
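
The audit trail depends on each snapshot having a parent to compare against. A toy Python model (a hypothetical `Snapshot` class mapping target names to content hashes; not how gittargets stores snapshots internally) illustrates what is lost when a snapshot has no parent:

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical data snapshot: target name -> content hash, plus parent link."""
    targets: dict[str, str]
    parent: "Snapshot | None" = None


def changed_targets(snap: Snapshot) -> set[str]:
    """Targets added, removed, or modified relative to the parent snapshot.

    Without a parent there is nothing to compare against, so every target
    counts as changed.
    """
    if snap.parent is None:
        return set(snap.targets)
    before = snap.parent.targets
    names = set(snap.targets) | set(before)
    return {name for name in names if snap.targets.get(name) != before.get(name)}


first = Snapshot({"data": "a1", "model": "b1"})
second = Snapshot({"data": "a1", "model": "b2"}, parent=first)
print(changed_targets(second))  # {'model'}

# The same contents as an orphan snapshot: the history link is gone.
print(sorted(changed_targets(Snapshot(second.targets))))  # ['data', 'model']
```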

pvtodorov commented 2 years ago

Came across this issue looking for a way to delete entire snapshots. Is there a way to do this yet?

wlandau commented 2 years ago

Unfortunately, no. It gets messy because Git tracks differences between one commit and the next: each successive snapshot depends on the previous one for storage efficiency.
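
One concrete source of the mess: every Git commit records its parent's ID, and a commit's ID is a hash over its contents, parent included. The toy Python model below (hypothetical helpers `commit_id` and `build_history`, far simpler than real Git commit objects) shows that dropping a snapshot from the middle changes the ID of every snapshot after it, so all later history has to be rewritten:

```python
import hashlib
from typing import Optional


def commit_id(tree: str, parent: Optional[str]) -> str:
    """Toy commit ID: a hash over the snapshot content and the parent's ID,
    loosely mirroring how a Git commit object records its parent."""
    payload = f"tree {tree}\nparent {parent or ''}\n".encode()
    return hashlib.sha1(payload).hexdigest()


def build_history(trees: list[str]) -> list[str]:
    """Chain snapshots so that each commit points at the previous one."""
    ids: list[str] = []
    parent: Optional[str] = None
    for tree in trees:
        parent = commit_id(tree, parent)
        ids.append(parent)
    return ids


original = build_history(["snap1", "snap2", "snap3", "snap4"])
# "Deleting" snap2 means rebuilding the chain without it.
rewritten = build_history(["snap1", "snap3", "snap4"])

print(original[0] == rewritten[0])  # True: history before the deletion survives
print(original[2] == rewritten[1])  # False: snap3 has a new parent, hence a new ID
print(original[3] == rewritten[2])  # False: and so does every later snapshot
```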