google / recursive-version-control-system

Apache License 2.0

Add a command to clean up the `.rvcs/archive` directory #2

Status: Open. ojarjur opened this issue 2 years ago.

ojarjur commented 2 years ago

Expected Behavior

The user's ~/.rvcs/archive directory retains only the history that the user still cares about, and does not require an excessive amount of storage beyond the size of the user's actual file contents.

Actual Behavior

The user's ~/.rvcs/archive directory grows without bound and contains a full copy of every version of every tracked file.

Steps to Reproduce the Problem

To reproduce the issue of the directory growing without bound:

  1. Create a temporary directory
  2. Create a file within that directory
  3. Snapshot that file (but not the containing directory)
  4. Delete that file
  5. Snapshot the (now empty) temporary directory
  6. The snapshot of the now-deleted file will not be reachable from the snapshot of the directory, but it will also never be garbage collected.
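The first reproduction can be modelled in a few lines. The sketch below is a hypothetical toy content-addressed store, not the real rvcs storage format: object ids are SHA-256 digests (an assumption), and a directory object simply lists its children. It shows why the file's object ends up orphaned: nothing reachable from the directory snapshot points at it.

```python
import hashlib

# Hypothetical toy model of a content-addressed archive, NOT the real rvcs format.
# Maps object id (SHA-256 hex digest, an assumption) -> (kind, payload).
objects: dict[str, tuple[str, bytes]] = {}


def put(kind: str, payload: bytes) -> str:
    """Store an object under an id derived from its kind and contents."""
    oid = hashlib.sha256(kind.encode() + b"\0" + payload).hexdigest()
    objects[oid] = (kind, payload)
    return oid


def snapshot_dir(children: dict[str, str]) -> str:
    """Store a directory object listing one "<name> <child id>" line per entry."""
    listing = "".join(f"{name} {oid}\n" for name, oid in sorted(children.items()))
    return put("dir", listing.encode())


def reachable(root: str) -> set[str]:
    """Return every object id transitively reachable from root."""
    seen: set[str] = set()
    stack = [root]
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        kind, payload = objects[oid]
        if kind == "dir":
            # The child id is the last space-separated field on each line.
            stack.extend(line.rsplit(" ", 1)[1] for line in payload.decode().splitlines())
    return seen


# Steps 1-3: snapshot a file on its own, without its containing directory.
file_id = put("file", b"hello\n")
# Steps 4-5: after deleting the file, snapshot the now-empty directory.
dir_id = snapshot_dir({})
# Step 6: the file's object is still stored but unreachable from the new snapshot.
orphans = set(objects) - reachable(dir_id)
print(orphans == {file_id})  # True
```

Because nothing ever walks the store looking for such orphans, they accumulate indefinitely.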

To reproduce the issue of a full copy of every version of every file being retained:

  1. Create a temporary file
  2. Add some text to it
  3. Snapshot the file
  4. Add a single character to the end of the file
  5. Snapshot the file again
  6. There will be two separate entries under ~/.rvcs/archive/objects/ holding the full contents of the two versions of the file, even though they differ by only one character.
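The second reproduction follows from content addressing. Assuming each object is stored whole under a key derived from a hash of its full contents (SHA-256 is chosen here only for illustration), appending a single character yields an entirely different key, so both versions are kept in full:

```python
import hashlib

# Assumption for illustration: each version is stored whole under a key
# derived from a hash of its complete contents.
v1 = b"some text"
v2 = v1 + b"!"  # step 4: append a single character

id1 = hashlib.sha256(v1).hexdigest()
id2 = hashlib.sha256(v2).hexdigest()

# Distinct contents give distinct keys, so both full copies are stored.
stored_bytes = len(v1) + len(v2)
print(id1 != id2, stored_bytes)  # True 19
```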

One way to fix both issues would be to add something like an rvcs cleanup command that does the following:

  1. Traverse all of the paths under ~/.rvcs/archive/paths/ and find all of the objects transitively reachable from those paths
  2. Add all of those reachable objects to a zip file stored under the ~/.rvcs/archive/ directory
  3. Remove the now-zipped objects from the ~/.rvcs/archive/objects/ directory
  4. Remove any objects from ~/.rvcs/archive/objects/ that were not reachable from any of the paths
  5. Similarly, remove any entries from ~/.rvcs/archive/paths/ and ~/.rvcs/archive/cache/ that do not correspond to any currently mapped paths
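Steps 1 to 4 amount to a mark-and-sweep pass. The sketch below is a minimal illustration under several assumptions: loose objects are files named by their object id directly under an objects directory, the zip name `objects.zip` is hypothetical, and the caller supplies a `children` function that knows how to list the objects a directory object references. Step 5 and all locking are deliberately left out.

```python
import os
import tempfile
import zipfile
from typing import Callable, Iterable


def find_reachable(roots: Iterable[str], children: Callable[[str], list[str]]) -> set[str]:
    """Step 1: every object id transitively reachable from the given roots."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(children(oid))
    return seen


def cleanup(objects_dir: str, zip_path: str, roots: Iterable[str],
            children: Callable[[str], list[str]]) -> set[str]:
    """Steps 2-4: pack reachable loose objects into a zip, then delete all loose objects.

    Hypothetical layout: each loose object is a file named by its id directly
    under objects_dir. Returns the ids newly added to the zip.
    """
    live = find_reachable(roots, children)
    loose = set(os.listdir(objects_dir))
    packed: set[str] = set()
    # Mode "a" appends to an existing zip or creates a new one.
    with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        already = set(zf.namelist())
        for oid in sorted(live & loose):
            if oid not in already:  # avoid duplicate member names
                zf.write(os.path.join(objects_dir, oid), arcname=oid)
                packed.add(oid)
    # Reachable loose objects are now in the zip (step 3) and unreachable
    # ones are garbage (step 4), so every loose object can be removed.
    for oid in loose:
        os.remove(os.path.join(objects_dir, oid))
    return packed


# Usage with a hypothetical three-object graph in a throwaway directory.
with tempfile.TemporaryDirectory() as archive:
    objects_dir = os.path.join(archive, "objects")
    os.mkdir(objects_dir)
    graph = {"root": ["a"], "a": [], "orphan": []}
    for oid in graph:
        with open(os.path.join(objects_dir, oid), "wb") as f:
            f.write(oid.encode())
    zip_path = os.path.join(archive, "objects.zip")
    packed = cleanup(objects_dir, zip_path, ["root"], lambda oid: graph[oid])
    print(sorted(packed), os.listdir(objects_dir))  # ['a', 'root'] []
```

In a real implementation `children` would itself need to read directory objects that may already live in the zip rather than in the objects directory.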

For the last two steps, we would need to take extra care to avoid race conditions with concurrent runs of the rvcs snapshot or rvcs merge commands.
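One common way to rule out such races is an exclusive lock file that cleanup, snapshot, and merge would all have to acquire before touching the archive. The sketch below is only one possible approach, and the lock path is hypothetical. It relies on `os.open` with `O_CREAT | O_EXCL`, which fails atomically with `FileExistsError` when the file already exists. The lock is advisory, and a process that crashes while holding it leaves a stale lock file behind.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def archive_lock(lock_path: str):
    """Hold an exclusive, advisory lock for the duration of the with-block.

    O_CREAT | O_EXCL makes creation atomic: if another process already holds
    the lock file, os.open raises FileExistsError instead of succeeding.
    """
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


# Usage: a second acquirer is refused while the first holds the lock.
with tempfile.TemporaryDirectory() as archive:
    lock = os.path.join(archive, "archive.lock")  # hypothetical lock location
    with archive_lock(lock):
        try:
            with archive_lock(lock):
                pass
        except FileExistsError:
            print("second acquirer refused")
    print(os.path.exists(lock))  # False
```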

Additionally, we would need to extend the tool to read objects from the zip file under ~/.rvcs/archive/ when they are not present in ~/.rvcs/archive/objects/.
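The fallback read could look roughly like the sketch below, assuming the hypothetical layout `<archive>/objects/<id>` for loose objects and a single `<archive>/objects.zip` whose members are named by object id. Trying the loose copy first means newly snapshotted objects are found without opening the zip, and an object that a concurrent cleanup has just moved into the zip is still found on the second attempt.

```python
import os
import tempfile
import zipfile


def read_object(archive_dir: str, oid: str) -> bytes:
    """Return an object's bytes, preferring the loose copy over the zip.

    Hypothetical layout: loose objects at <archive_dir>/objects/<oid>,
    packed objects in <archive_dir>/objects.zip under the member name <oid>.
    """
    try:
        with open(os.path.join(archive_dir, "objects", oid), "rb") as f:
            return f.read()
    except FileNotFoundError:
        pass  # not loose; fall back to the zip
    try:
        with zipfile.ZipFile(os.path.join(archive_dir, "objects.zip")) as zf:
            return zf.read(oid)  # raises KeyError if the member is absent
    except (FileNotFoundError, KeyError):
        raise KeyError(f"object {oid} not found in loose store or zip") from None


# Usage: one object is still loose, another has already been packed.
with tempfile.TemporaryDirectory() as archive:
    os.mkdir(os.path.join(archive, "objects"))
    with open(os.path.join(archive, "objects", "loose1"), "wb") as f:
        f.write(b"still loose")
    with zipfile.ZipFile(os.path.join(archive, "objects.zip"), "w") as zf:
        zf.writestr("packed1", b"already zipped")
    print(read_object(archive, "loose1"), read_object(archive, "packed1"))
```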

jootd commented 1 year ago
Quoting step 2 of the proposal: "Add all of those reachable objects to a zip file stored under the ~/.rvcs/archive/ directory"

Only files need to be zipped, not directories, am I right?

ojarjur commented 11 months ago

@jootd: "Only files need to be zipped, not directories, am I right?"

I'm not sure what you mean; all transitively reachable objects need to be zipped. Those objects might represent individual files, or they might represent directories.

The folder structure of the archive does not necessarily need to be preserved, but every object within the archive does, regardless of whether that object represents a regular file or a directory.
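To illustrate why directory objects must be zipped too, here is a toy example with a hypothetical encoding (not the rvcs format): a directory object is text with one `<kind> <child id> <name>` line per entry. All members sit flat at the top level of the zip, named only by object id, yet the full tree can be rebuilt because the directory objects record the structure. Leaving out a directory object such as `d2` would make `rebuild` fail with a `KeyError`.

```python
import io
import zipfile

# Hypothetical encoding, for illustration only: a directory object is text with
# one "<kind> <child id> <name>" line per entry; a file object is raw bytes.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # Every member sits at the top level and is named only by its object id.
    zf.writestr("f1", b"hello\n")                             # file object
    zf.writestr("d2", b"file f1 copy.txt\n")                  # directory object
    zf.writestr("d1", b"file f1 greeting.txt\ndir d2 sub\n")  # root directory object


def rebuild(zf: zipfile.ZipFile, oid: str, kind: str = "dir"):
    """Recreate a nested {name: contents} tree from flat, id-named members."""
    data = zf.read(oid)
    if kind == "file":
        return data
    tree = {}
    for line in data.decode().splitlines():
        child_kind, child_id, name = line.split(" ", 2)
        tree[name] = rebuild(zf, child_id, child_kind)
    return tree


with zipfile.ZipFile(buf) as zf:
    print(rebuild(zf, "d1"))
```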