Closed schomatis closed 6 years ago
> So `Directory` is only the "plain implementation" type of a directory, it's not *the* directory type (a very similar confusion arises with the `File` and `Raw` types)
Yes. Sharded directories came later so they got a different type.
> For example, looking for common characteristics of the plain and the HAMT (which I mostly do not understand) implementations I'm observing that its children are still referenced as DAG links (HAMT contains a map for a quick access to them but the links remain at the DAG layer).
So, all links must go in the `Links` section of the `PBNode`. This is just a restriction of the PBNode IPLD format (and the primary motivation to move away from it).
However, sharded directories span multiple nodes. Therefore, for better performance, we cache these shards when we load them. That's why we have the children map.
> However, sharded directories span multiple nodes.
Oh, could you point me to that part of the code? I thought a shard was encoded in a single `Data` field of a DAG node.
So, while working on https://github.com/ipfs/go-ipfs/pull/5160 I realized that MFS is the single consumer of the UnixFS directory abstraction; the definition of the interface is basically what the MFS layer needs from UnixFS.
@Stebalien
> So, all links must go in the `Links` section of the `PBNode`. This is just a restriction of the PBNode IPLD format
I'm not sure what kind of restriction that is. I could just save the links in the `Data` field; as long as the UnixFS methods called by MFS know how to operate on it, I don't need to use the links of the DAG layer. I should, of course, but as the API is just the set of functions defined in the `dirbuilder.go` file, I don't see any requirement (and there is also no spec) that actually mandates it. That's what I mean with the question of this issue: it seems to me that a UnixFS directory is whatever UnixFS wants it to be as long as it delivers to MFS, and that mental model is hard to transmit to a new user. That's why I want to have a concrete list of "things a UnixFS directory should comply with" (besides a set of methods), to be able to provide more information to the new user.
> Oh, could you point me to that part of the code? I thought a shard was encoded in a single `Data` field of a DAG node.
Take a look at `getChild` and `loadChild` (on `Shard`).
> I'm not sure what kind of restriction that is. I could just save the links in the `Data` field; as long as the UnixFS methods called by MFS know how to operate on it, I don't need to use the links of the DAG layer. I should, of course, but as the API is just the set of functions defined in the `dirbuilder.go` file, I don't see any requirement (and there is also no spec) that actually mandates it.
You could. However, tools like `pin`, `refs`, etc. wouldn't work properly.
> That's what I mean with the question of this issue: it seems to me that a UnixFS directory is whatever UnixFS wants it to be as long as it delivers to MFS, and that mental model is hard to transmit to a new user. That's why I want to have a concrete list of "things a UnixFS directory should comply with" (besides a set of methods), to be able to provide more information to the new user.
So, there are multiple types of unixfs directories. At the end of the day, they should all expose a standard interface (really, iterate and lookup). However, we're always going to have multiple very different representations. Currently, we have:

- plain directories, where every child is simply a named DAG link on a single node, and
- HAMT-sharded directories, where the entries are spread across a trie of shard nodes.
We actually have the same split in files. We have files that are raw bytes and sharded files.
So, for context, HAMT stands for hash array mapped trie. To insert a key/value (name/file) into a HAMT, you first hash the key. Then, roughly, you consume the hash a few bits at a time: each chunk selects a slot in the current node's array; if the slot is empty you insert the entry there, and if it already holds a child node you descend into it and repeat with the next chunk of the hash.
However, instead of storing sparsely populated arrays as suggested by this algorithm, we use a bitfield. To look up a value in the array, you:

1. check whether the bit at the slot's index is set in the bitfield (if not, the entry is absent), and
2. count the set bits below that index (a popcount) to find the entry's position in the densely packed array.
Extra :+1: for the detailed HAMT explanation.
This issue has been addressed in a couple of PRs now (the last one being https://github.com/ipfs/go-unixfs/pull/16), so we can close it.
It would be great to add this information on how the UnixFS HAMT works to the UnixFS spec, which currently doesn't have any info on how HAMTShards are actually encoded and supposed to be used. This thread is the best explanation I found when googling for a spec of the UnixFS HAMT.
The `Directory` type of a UnixFS object (contrary to its name) doesn't actually indicate a directory, or at least it's not the only possible type of a directory: the `HAMTShard` type also indicates a directory, leading to confusing comparisons like: https://github.com/ipfs/go-ipfs/blob/ecf7d157a6d5e525b122079367f4b6c2ba25e951/mfs/dir.go#L159-L161
So `Directory` is only the "plain implementation" type of a directory, it's not *the* directory type (a very similar confusion arises with the `File` and `Raw` types). What are the requisites of a UnixFS directory? It seems that there is no interface that would define it; at this point the closest that can be found is the `Directory` from the `unixfs.io` package, https://github.com/ipfs/go-ipfs/blob/ecf7d157a6d5e525b122079367f4b6c2ba25e951/unixfs/io/dirbuilder.go#L28-L36
but this structure emerged just to accommodate the HAMT implementation (https://github.com/ipfs/go-ipfs/pull/3042; it evolved from a previous `directoryBuilder` structure), not as a clear and documented definition of a directory.

The biggest consequence of all this (IMO) is that a new reader needs to go through the `*hamt.Shard` structure pointer to understand what's happening when content is added to a directory (with the major danger of diving into the really complex code of the `hamt` package).

So the main objective is to provide the user a clear explanation of what can be expected from a directory, possibly presenting a documented interface, while also using it to hide as much as possible the HAMT directory variant, to avoid confusing code lines when the user is following the execution path of how files are added to a directory in the MFS hierarchy:

https://github.com/ipfs/go-ipfs/blob/ecf7d157a6d5e525b122079367f4b6c2ba25e951/unixfs/io/dirbuilder.go#L101-L107
For example, looking for common characteristics of the plain and the HAMT (which I mostly do not understand) implementations, I'm observing that their children are still referenced as DAG links (HAMT contains a map for quick access to them, but the links remain at the DAG layer); by contrast, whereas the plain implementation does not hold information in the `Data` field, the HAMT stores the bitfield there.
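This split is visible directly in the UnixFS protobuf schema: both directory variants carry their type tag inside the node's `Data`, but only the HAMT variant also uses the inner `Data` bytes field (for the bitfield) together with the hash and fanout parameters. A hedged sketch of the relevant message (field numbers and names as I understand them; check `unixfs.proto` in the repo for the authoritative version):

```proto
message Data {
  enum DataType {
    Raw = 0;
    Directory = 1;   // plain directory: children are just named DAG links
    File = 2;
    Metadata = 3;
    Symlink = 4;
    HAMTShard = 5;   // one node of a sharded directory
  }
  required DataType Type = 1;
  optional bytes Data = 2;       // for HAMTShard: the bitfield
  optional uint64 filesize = 3;
  repeated uint64 blocksizes = 4;
  optional uint64 hashType = 5;  // HAMT hash function
  optional uint64 fanout = 6;    // HAMT fanout
}
```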