Open Forza-tng opened 2 years ago
Any of the numbers other than shared extents are easy to get; shared extents would require two passes as we don't know yet about other files that are yet to be processed.
On the other hand, designing and implementing a reasonable interface is not trivial.
Don't we get shared extents today, at least within the search path? (the referenced vs actual usage)
# compsize home/
Processed 11696 files, 5763 regular extents (21001 refs), 1134 inline.
Type       Perc     Disk Usage   Uncompressed   Referenced
TOTAL       56%          89M           158M         471M
none       100%          58M            58M         175M
zstd        30%          30M            99M         296M
"Referenced" counts the number of times each reference is seen in the files named on the command line. It doesn't tell you whether the underlying extents are shared.
It doesn't handle hard links, and merely repeating a file name gets all of its references "shared":
# compsize foo
Processed 1 file, 1 regular extents (1 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed   Referenced
TOTAL      100%         9.6M           9.6M         9.6M
none       100%         9.6M           9.6M         9.6M
# compsize foo foo
Processed 2 files, 1 regular extents (2 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed   Referenced
TOTAL      100%         9.6M           9.6M          19M
none       100%         9.6M           9.6M          19M
An extent will appear to be unshared if you didn't provide files containing all of the references on the command line:
# cp --reflink=always foo bar
# compsize foo
Processed 1 file, 1 regular extents (1 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed   Referenced
TOTAL      100%         9.6M           9.6M         9.6M
none       100%         9.6M           9.6M         9.6M
# compsize foo bar
Processed 2 files, 1 regular extents (2 refs), 0 inline.
Type       Perc     Disk Usage   Uncompressed   Referenced
TOTAL      100%         9.6M           9.6M          19M
none       100%         9.6M           9.6M          19M
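To make the behaviour in the examples above concrete, here is a toy model of the accounting as I understand it. This is not compsize's actual code: the `summarize` function, the `layout` dictionary and the byte counts are all made up for illustration, and every reference is assumed to cover its whole extent.

```python
# Toy model only (hypothetical names, not compsize's implementation).
# Each file maps to a list of (extent_id, bytes) references.
def summarize(named_files, layout):
    """Return (disk_usage, referenced) for the files named on the command line."""
    seen_extents = {}  # extent_id -> bytes on disk, counted once per extent
    referenced = 0     # every reference is added, including repeats
    for name in named_files:
        for extent_id, nbytes in layout[name]:
            seen_extents[extent_id] = nbytes
            referenced += nbytes
    return sum(seen_extents.values()), referenced


# Assumed layout: bar is a reflink copy of foo, so both point at extent 1.
layout = {"foo": [(1, 10)], "bar": [(1, 10)]}

print(summarize(["foo"], layout))
print(summarize(["foo", "foo"], layout))
print(summarize(["foo", "bar"], layout))
```

In this model, naming `foo` twice and naming `foo` plus its reflink copy `bar` give the same totals, and naming `foo` alone looks unshared, which is why "Referenced" alone cannot answer the sharing question.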
If we want to know whether an extent is shared in general, we have to look at the backrefs and reference counter for the extent in the extent tree (if the count is >1 then we know immediately it is shared), then work backwards up the subvol tree to see if there are multiple roots in its ancestry (we can stop as soon as we find a second parent). The first step is an easy TREE_SEARCH on the extent tree. The second step is expensive: the choices are to use the LOGICAL_INO ioctl, which will find every reference, so it's more work than needed to calculate shared/not-shared; or to read the block device directly, which adds significant extra complexity and some race conditions that will need to be handled somehow (accept lower accuracy or run a retry loop).
That said, running LOGICAL_INO on each unique extent would do the job; it will just take more time and IO than compsize would normally use.
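The two-step check described above can be sketched on an in-memory toy model. This is only an illustration of the decision logic under my own assumptions: `Node`, `is_shared` and the example trees are hypothetical, and none of the real TREE_SEARCH or LOGICAL_INO calls are shown.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a tree block; roots have no parents."""
    name: str
    parents: list = field(default_factory=list)


def is_shared(refcount, leaf):
    """Decide shared/not-shared for one extent in the toy model."""
    # Step 1: a reference count above 1 settles it immediately.
    if refcount > 1:
        return True
    # Step 2: walk up from the leaf holding the single reference and
    # stop as soon as any block on the way has a second parent.
    node = leaf
    while node.parents:
        if len(node.parents) > 1:
            return True
        node = node.parents[0]
    return False


# Assumed example: a subvolume and a snapshot that share one leaf.
subvol = Node("subvol")
snapshot = Node("snapshot")
shared_leaf = Node("shared_leaf", [subvol, snapshot])
private_leaf = Node("private_leaf", [subvol])

print(is_shared(1, shared_leaf))
print(is_shared(1, private_leaf))
print(is_shared(2, private_leaf))
```

The early return in step 2 is the "stop as soon as we find a second parent" shortcut; LOGICAL_INO by contrast would enumerate every reference before we could answer.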
I'd like an option to use compsize to find files with extents matching some criteria and list them.
compsize --find could have the following matches:
It should be possible to combine several matches.
The output can be a table with the path/filename + the chosen matches.
The ability to sort the output would be great, or an output format that can be piped to sort.
My initial use case is to find highly fragmented files so that I can manually defrag them. I'd also like to analyse how my files are stored, to determine whether I should take some action on them.
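As a rough idea of what a sort-friendly output could look like, here is a small sketch that puts a numeric key first and the path last, separated by a tab. Nothing here exists in compsize today: `format_rows`, the extent counts and the file names are all hypothetical.

```python
def format_rows(rows):
    """rows: iterable of (path, extent_count) pairs.

    Returns one tab-separated line per file with the number first,
    so the result can be fed to a numeric sort.
    """
    return [f"{count}\t{path}" for path, count in rows]


# Made-up sample data for illustration only.
sample = [("vm/disk.img", 18342), ("notes.txt", 1), ("db/main.sqlite", 977)]
for line in format_rows(sample):
    print(line)
```

Piping lines like these through `sort -rn` would list the most fragmented files first. File names containing tabs or newlines would need escaping, which this sketch does not attempt.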