Closed · kevina closed this 1 year ago
+1 to both file size only (not "physical" size) and +1 to dirent as a count.
I've always been completely bewildered by `ls` reporting dirs as 4k (the "physical" size, which I've precisely never cared about). Can't imagine there's much use for physical size reports on files either; the only context I can imagine is generating a report on the overall physical size use of an IPFS repo, and that would need to report non-unixfs objects as well, making a special inclusion of that in unixfs objects redundant at best.
I would still include both the physical size and the file size. I would not include directory sizes.
I'm trying to understand the use case for knowing the size of all the nodes and not just the data.
I wouldn't want to trust this kind of information for managing quotas since it's not a guarantee.
For space usage it's also not entirely accurate. There's no guarantee that because I have one of these blocks that I've succeeded in also storing the rest of the graph.
Having the content size of each file, and the cumulative size of all the files in each directory, is enough to show download progress.
If we adopt the `file-data` format we could even get away with not including the `size` attribute, since you can easily figure this out by looking at the `data` array.
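A minimal sketch of that point (the node layout and field names here are hypothetical, loosely modeled on the `data` array described above): the file size falls out of summing the chunk sizes, so a separate `size` attribute would be redundant.

```python
# Hypothetical unixfs-v2-style file node: the "data" array lists the
# file's chunks, each carrying its own content size in bytes.
file_node = {
    "type": "file",
    "data": [
        {"cid": "bafy...chunk1", "size": 262144},
        {"cid": "bafy...chunk2", "size": 262144},
        {"cid": "bafy...chunk3", "size": 1024},
    ],
}

def file_size(node):
    """Derive the file size by summing the chunk sizes in `data`,
    rather than storing a separate `size` attribute on the node."""
    return sum(entry["size"] for entry in node["data"])

print(file_size(file_node))  # 525312
```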
Can the directory size as the sum of all the directory entry sizes be included as well?
In v1 we can't calculate directory sizes without traversing all children of the node, as it may be a HAMT shard, so that is out of the question. But we can't create the directory unless we know which files are in it, so we do have the directory size at creation time. It seems weird to throw that information away.
I would not include directory sizes.
@Stebalien could you expand on why not?
@achingbrain that’s actually how it works now :) https://github.com/ipfs/unixfs-v2/blob/master/SPEC.md#ipld-dir
The `size` of a directory is the sum of all the `size` properties in `data`, so that includes the size of files and sub-directories.
However, this is the cumulative size of file “data” and not the size of the blocks. We got rid of that information because it doesn’t really work well in this new model where the block boundaries are transparent.
Also, as @warpfork reminded me today, we need to call out in the spec that while implementations of unixfsv2 MUST encode this accurately, readers of this data should consider the property advisory, since there is no way to guarantee it is accurate without parsing the entire graph.
Hooray!
V1 `DAGLink` sizes have been similarly untrustworthy since forever.
closing for archival
The old unixfs has two sizes: the file size, and the total size of the protocol-wrapped objects (the physical size). The same sizes were used for directory entries, except perhaps not for sharded directories (see #7).
The question is: are both sizes still useful to include? Based on some discussion on #2 I think maybe we should simplify things and just have the file size. Is the physical size even used anywhere?
In addition, the file size isn't really useful for directories. A better size to include would be a count of the number of entries. This count would also allow seeking in sharded directories (see #6).
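To illustrate how an entry count could enable seeking (a sketch only; the shard layout and `count` field are hypothetical, not from the spec): if each child shard link records how many entries sit beneath it, you can locate the Nth entry by skipping whole shards without reading their contents.

```python
# Hypothetical sharded directory: each child shard link carries a
# `count` of the entries beneath it.
shards = [
    {"cid": "bafy...s0", "count": 10},
    {"cid": "bafy...s1", "count": 25},
    {"cid": "bafy...s2", "count": 7},
]

def seek(shards, n):
    """Return (shard index, offset within that shard) for entry n,
    using only the per-shard counts -- no shard contents are read."""
    for i, shard in enumerate(shards):
        if n < shard["count"]:
            return i, n
        n -= shard["count"]
    raise IndexError("entry index out of range")

print(seek(shards, 12))  # (1, 2): entry 12 is the 3rd entry of shard 1
```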
Thoughts?