Open morgant opened 6 months ago
I found I had some WIP code using `find ... -links 1`, on the theory that any unique file should have only one hard link (the one from its parent directory); a higher link count means there are multiple links to the file. Unfortunately, in further testing of this theory, I didn't find it to actually work reliably.
So, I'm currently working on a new function to find all the normal files in the provided path (non-HFS+ volumes don't allow multiple hard links to the same directory, so only normal files are hard linked by `rsync`), check to see if they exist in the previous machine backup (if not, they're automatically unique), and, if they do, compare the inode of each file to the one in the previous backup to determine whether it's unique. If it is unique, get the file size.
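A minimal sketch of the inode comparison described above, not the actual implementation. The function name `unique_size_vs_previous` and its two arguments are hypothetical. It assumes both backups live on the same filesystem and uses bash's `-ef` test (same device and inode) rather than parsing `stat` output, whose flags differ between macOS and Linux.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the total size in bytes of regular files in
# "$current" that are NOT hard links to the file at the same relative
# path in "$previous".
unique_size_vs_previous() {
  local current="${1%/}" previous="${2%/}"
  local total=0 file rel size
  while IFS= read -r -d '' file; do
    rel="${file#"$current"/}"
    # -ef is true only when both paths refer to the same device and inode,
    # i.e. the file was hard linked rather than copied.
    if [[ -e "$previous/$rel" && "$file" -ef "$previous/$rel" ]]; then
      continue  # shared with the previous backup, so not unique
    fi
    size=$(wc -c < "$file")
    total=$(( total + size ))
  done < <(find "$current" -type f -print0)
  printf '%s\n' "$total"
}
```

It would be called as `unique_size_vs_previous /path/to/current /path/to/previous`. It runs one `wc` per unique file, so it is slow on large backups, which matches the performance concern above.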
Unfortunately, it's not going to be very performant, but it should be no worse than doing a dry-run `rsync` between backups to determine uniqueness. I can try to optimize later, if necessary.
I did some more testing with `find ... -links 1` and I think it might have just been a summing issue. I've corrected my new `calculate_backup_unique_size()` function to sum the sizes of all found files and it seems to work. I'm not sure whether certain files may be missed, especially those in new directories, but this is a good start.
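As a rough illustration of the summing step, here is a sketch that totals the sizes of regular files with exactly one hard link. The name `sum_single_link_sizes` is hypothetical and this is not the body of `calculate_backup_unique_size()`. It assumes a `find` that supports `-print0` (both GNU and BSD `find` do).

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the total size in bytes of regular files
# under "$dir" whose link count is exactly 1.
sum_single_link_sizes() {
  local dir="$1" total=0 file size
  while IFS= read -r -d '' file; do
    # wc -c may pad its output with spaces; arithmetic expansion ignores them.
    size=$(wc -c < "$file")
    total=$(( total + size ))
  done < <(find "$dir" -type f -links 1 -print0)
  printf '%s\n' "$total"
}
```

Accumulating in a single variable inside the `while` loop (fed by process substitution, not a pipe) keeps the running total in the current shell, which avoids one common cause of sums silently coming out as zero.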
I renamed `is_machine_directory()` to `is_valid_machine_directory()`, plus implemented a new `is_child_of_machine_directory()` function which the new `calculate_backup_unique_size()` uses.
I did implement a new `machine_backup_previous()` function which should identify a backup's prior backup, but it'll need some testing. It's not currently used, but could be utilized for the alternate implementation, if necessary. I'll comment that it's unused & untested before committing.
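For illustration only, one way such a lookup could work is sketched below. It is not the `machine_backup_previous()` implementation, and it rests on a loud assumption the thread does not confirm: that a machine's backups are sibling directories whose names sort chronologically (for example, timestamps). The name `previous_backup_dir` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: print the sibling directory whose name sorts
# immediately before the given backup directory. ASSUMES backup directory
# names sort chronologically. Returns 1 if there is no earlier sibling.
previous_backup_dir() {
  local backup="${1%/}" parent name prev="" entry
  parent=$(dirname "$backup")
  name=$(basename "$backup")
  for entry in "$parent"/*/; do
    entry="${entry%/}"
    entry="${entry##*/}"
    # Skip an unmatched glob and symlinks (e.g. a "latest" pointer).
    if [[ ! -d "$parent/$entry" || -L "$parent/$entry" ]]; then
      continue
    fi
    # Keep the greatest name that is still less than the given backup's name.
    if [[ "$entry" < "$name" ]] && [[ -z "$prev" || "$entry" > "$prev" ]]; then
      prev="$entry"
    fi
  done
  if [[ -n "$prev" ]]; then
    printf '%s\n' "$parent/$prev"
  else
    return 1
  fi
}
```

The non-zero return for the oldest backup lets a caller treat "no previous backup" as "every file is unique".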
I'll put together a commit soon.
One of the prerequisites for pruning oldest backups to reclaim disk space on destinations will be determining how much disk space an individual backup will make available. macOS's `tmutil` provides a `uniquesize` verb for this, which I'll need to implement. The tmutil(8) manual describes it thusly:
I'm not sure if there's an easy or performant way to determine which files in a backup are hardlinks and therefore non-unique. I might end up just having to use `rsync` to compare against the previous backup, just like I'm currently doing when actually performing a backup in `backup()`.