morgant / tempus-machina

A macOS (née OS X) Time Machine work-alike powered by rsync
MIT License

Implement `uniquesize` verb #4

Open morgant opened 6 months ago

morgant commented 6 months ago

One of the prerequisites for pruning the oldest backups to reclaim disk space on a destination is determining how much disk space deleting an individual backup would make available. macOS's tmutil provides a uniquesize verb for this, which I'll need to implement. The tmutil(8) manual describes it thusly:

uniquesize path ...
             Analyze the specified path in an HFS+ backup or path to an APFS backup and determine its
             unique size. The figure reported by uniquesize represents things that only exist in the
             specified path; things that are present in other backups are not tallied.

I'm not sure if there's an easy or performant way to determine which files in a backup are hard links and therefore non-unique. I may end up having to use rsync to compare against the previous backup, just as I'm currently doing when actually performing a backup in backup().

morgant commented 1 month ago

I found I had some WIP code using find ... -links 1, since any unique file should have only the single hard link from its parent directory; otherwise, there are multiple links to the file. Unfortunately, in further testing of this theory, I didn't find it to actually work reliably.

So, I'm currently working on a new function that finds all the regular files in the provided path (non-HFS+ volumes don't allow multiple hard links to the same directory, so only regular files are hard linked by rsync), checks whether each exists in the previous machine backup (if not, it's automatically unique), and -- if it does -- compares the file's inode against the one in the previous backup to determine whether it's unique. If a file is unique, its size gets counted.
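
To make the approach concrete, here's a rough sketch of that inode comparison in POSIX shell. The function and helper names are hypothetical (this isn't the actual tempus-machina code), and the macOS/BSD stat flags fall back to GNU stat syntax so the sketch can run elsewhere too:

```sh
#!/bin/sh

file_inode() { stat -f %i "$1" 2>/dev/null || stat -c %i "$1"; }
file_size()  { stat -f %z "$1" 2>/dev/null || stat -c %s "$1"; }

# Sum the sizes of files under $1 (the current backup) that are unique,
# i.e. either absent from $2 (the previous backup) or present there with
# a different inode. Filenames containing newlines aren't handled.
unique_size() {
  current="$1"
  previous="$2"
  find "$current" -type f | {
    total=0
    while IFS= read -r file; do
      rel=${file#"$current"/}
      prev="$previous/$rel"
      if [ ! -e "$prev" ] || \
         [ "$(file_inode "$file")" != "$(file_inode "$prev")" ]; then
        # no counterpart, or a counterpart with a different inode: unique
        total=$((total + $(file_size "$file")))
      fi
    done
    echo "$total"
  }
}
```

The subshell around the while loop keeps the running total in scope of the final echo, since the pipeline itself runs in a child process.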

Unfortunately, it's not going to be very performant, but it should be no worse than doing a dry-run rsync between backups to determine uniqueness. Can try to optimize later, if necessary.

morgant commented 16 hours ago

I did some more testing with find ... -links 1 and I think it might have just been a summing issue. I've corrected my new calculate_backup_unique_size() function to sum the sizes of all found files and it seems to work. I'm not sure whether certain files may be missed, especially those in new directories, but this is a good start.
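
For reference, a minimal sketch of that summing step (the function name here is hypothetical, not the actual calculate_backup_unique_size() implementation; it assumes BSD stat with a GNU stat fallback for portability):

```sh
#!/bin/sh

# Sum the sizes (in bytes) of all regular files under $1 that have exactly
# one hard link, i.e. files not hard-linked into any other backup.
sum_single_link_sizes() {
  find "$1" -type f -links 1 -exec sh -c '
    total=0
    for f in "$@"; do
      # BSD stat (macOS) first, then GNU stat as a fallback
      size=$(stat -f %z "$f" 2>/dev/null || stat -c %s "$f")
      total=$((total + size))
    done
    echo "$total"
  ' sh {} + | awk '{ s += $1 } END { print s + 0 }'
}
```

Because -exec ... + may spawn several batches, each batch prints its own subtotal and the trailing awk sums them (printing 0 when nothing matched).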

I renamed is_machine_directory() to is_valid_machine_directory(), plus implemented a new is_child_of_machine_directory() function which the new calculate_backup_unique_size() uses.

I did implement a new machine_backup_previous() function which should identify a backup's prior backup, but it'll need some testing. It's not currently used, but could be utilized for the alternate implementation, if necessary. I'll comment that it's unused & untested before committing.
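
As a sketch of how such a function might work, assuming backups are sibling directories whose names sort chronologically (a Time Machine-style timestamped naming scheme; this is an assumption, and the name below is hypothetical rather than the actual machine_backup_previous() implementation):

```sh
#!/bin/sh

# Given the path to a backup directory, print the path of the backup that
# precedes it among its siblings, relying on the shell glob's sorted order.
# Returns non-zero if the backup is the oldest or can't be found.
previous_backup() {
  parent=$(dirname "$1")
  name=$(basename "$1")
  prev=""
  for d in "$parent"/*/; do
    d=${d%/}
    d=$(basename "$d")
    if [ "$d" = "$name" ]; then
      # first sibling has no predecessor
      [ -n "$prev" ] || return 1
      printf '%s\n' "$parent/$prev"
      return 0
    fi
    prev="$d"
  done
  return 1
}
```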

I'll put together a commit soon.