Closed JustinDrake closed 1 year ago
We are missing that spec. However, we have documentation spread across several pieces of Unixfs, namely:
Let me know if that helps :)
Thanks @diasdavid.
Below are some specific questions I have:
The filesize of a link can easily be faked, right? What is the point of links having a filesize if that filesize cannot be trusted?
Excellent questions! Thank you, @JustinDrake :)
The filesize of a link can easily be faked, right? What is the point of links having a filesize if that filesize cannot be trusted?
Absolutely. It's a convenience for things like stats; if it is application-critical, it should be verified externally.
Can a file with a super long file name be sharded? (I understand there's sharding for large directories and large files.)
Not part of the spec.
What are metadata links?
Not currently used. Designed for things like permissions.
It seems there are two ways of declaring a block "raw". The first is with the unixfs raw type. The second is with the raw multicodec as part of the CIDv1. Which takes precedence? How are conflicts resolved?
Both exist in different realms. Unixfs raw type is the unixfs protobuf with a type raw serialized and inserted into a dag-pb protobuf.
IPLD raw type is really just any array of bytes
In the case of a link, what does the data field hold?
What is the case of 'link'? A good way to understand the data struct is to add some files and directories and explore them using the ipfs object or ipfs dag API.
Is DagCBOR implemented yet? Is there an IPFS flag to make use of it by default?
ipld-dag-cbor is implemented: https://github.com/ipld/js-ipld-dag-cbor
Unixfs uses ipld-dag-pb. There is currently no plan to move it to ipld-dag-cbor.
unixfs requires names to be unique. What happens if they are not?
Per directory level. If there is a folder with 2 files using the same name, then that is an error. To make that happen you would have to manipulate the graphs directly.
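As a rough illustration of the per-directory uniqueness rule, here is a minimal Python sketch that flags duplicate names among the links of a single directory node. The dictionary layout and the `duplicate_names` helper are assumptions made for illustration and are not part of any IPFS API; the `Qm...` strings are placeholders, not real CIDs.

```python
from collections import Counter


def duplicate_names(links):
    """Return the link names that occur more than once in one directory node."""
    counts = Counter(link["name"] for link in links)
    return sorted(name for name, n in counts.items() if n > 1)


# Hypothetical directory node; the "Qm..." values are placeholders only.
links = [
    {"name": "a.txt", "size": 10, "link": "Qm...1"},
    {"name": "b.txt", "size": 20, "link": "Qm...2"},
    {"name": "a.txt", "size": 30, "link": "Qm...3"},
]

print(duplicate_names(links))
```

The check is scoped to one node's link list, which mirrors the "per directory level" wording: the same name may appear again in a different directory.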
Thanks 👍 Some follow-up questions:
data field. Let's take an example with zdj7WYjg5Gek1VmesaAFnT7nzi15xhAYMt1yxBxDyQSNgG1gy. The dag/get endpoint returns CAE= for the data. The object/get endpoint returns \u0008\u0001. Why are the returned data fields different? What is the significance of CAE= and \u0008\u0001?
CAE= is "\u0008\u0001" base64 encoded (otherwise known as [0x8, 0x1]). However, I'm not sure why we're using two different encoding schemes (IMO, both should return "\u0008\u0001" but there's probably a reason?).
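The equivalence of the two representations can be checked with a short Python sketch. It also reads the two bytes as a protobuf field. The `DATA_TYPES` table is my recollection of the enum in unixfs.proto and should be treated as an assumption to verify against the actual definition.

```python
import base64

raw = base64.b64decode("CAE=")                       # value printed by dag/get
escaped = "\u0008\u0001".encode("latin-1")           # value printed by object/get

# Both endpoints describe the same two bytes: 0x08 0x01.
assert raw == escaped == bytes([0x08, 0x01])

# Read the bytes as a single protobuf field: the first byte is the tag.
tag, value = raw[0], raw[1]
field_number = tag >> 3        # upper bits: field number
wire_type = tag & 0x07         # lower three bits: wire type (0 = varint)

# Assumption: enum values as recalled from unixfs.proto.
DATA_TYPES = {0: "Raw", 1: "Directory", 2: "File",
              3: "Metadata", 4: "Symlink", 5: "HAMTShard"}

print(field_number, wire_type, DATA_TYPES.get(value, "unknown"))
```

Under that assumption, the data field here is a serialized unixfs message whose only content is a type marker.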
In CIDv1, if the CID says it's a byte array, it will be interpreted as a byte array. In CIDv0, nodes are always interpreted as dag nodes. There are no "raw" CIDv0 nodes; there are just CIDv0 nodes with only a data field.
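To make the CIDv0/CIDv1 distinction concrete, here is a minimal sketch that inspects the binary form of a CID and reports its version and codec. The `cid_codec` helper is hypothetical and handles only the cases discussed here; the CIDs are built from a locally computed sha2-256 digest rather than taken from a real node.

```python
import hashlib


def read_varint(buf, pos=0):
    """Decode an unsigned varint; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# Multicodec codes mentioned in this thread.
CODECS = {0x55: "raw", 0x70: "dag-pb", 0x71: "dag-cbor"}


def cid_codec(cid_bytes):
    """Return (version, codec name) for a binary CID (illustrative only)."""
    # CIDv0 is a bare sha2-256 multihash (0x12 0x20 + 32-byte digest)
    # and carries no codec, so it is always treated as dag-pb.
    if len(cid_bytes) == 34 and cid_bytes[:2] == b"\x12\x20":
        return 0, "dag-pb"
    version, pos = read_varint(cid_bytes)
    codec, _ = read_varint(cid_bytes, pos)
    return version, CODECS.get(codec, hex(codec))


multihash = b"\x12\x20" + hashlib.sha256(b"hello").digest()

print(cid_codec(multihash))                 # CIDv0: no codec field at all
print(cid_codec(b"\x01\x55" + multihash))   # CIDv1 with the raw codec
print(cid_codec(b"\x01\x70" + multihash))   # CIDv1 with the dag-pb codec
```

The sketch skips multibase decoding (the leading `z` or `Qm...` text forms), so it operates on raw CID bytes only.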
Protobufs generally don't use explicit versioning (as far as I'm aware). Instead, you just add more optional fields that only mean something to newer versions of the software. If you need to introduce a backwards-incompatible change, you can add a new datatype: that's how sharded directories (HAMTShard) were introduced.
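The "add optional fields" approach works because a protobuf reader can skip fields it does not recognize. Below is a minimal sketch of that behaviour, using a hand-written parser that supports only varint and length-delimited fields. The `parse_known` helper and field number 15 are hypothetical and do not correspond to anything in unixfs.

```python
def read_varint(buf, pos):
    """Decode an unsigned varint; return (value, next_position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_known(buf, known_fields):
    """Parse a message, keeping only the field numbers listed in known_fields."""
    pos, out = 0, {}
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire = tag >> 3, tag & 0x07
        if wire == 0:                                  # varint
            value, pos = read_varint(buf, pos)
        elif wire == 2:                                # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire} not handled in this sketch")
        if field in known_fields:                      # unknown fields are skipped
            out[known_fields[field]] = value
    return out


# Field 1 = 1, followed by a hypothetical newer field 15 = 42.
message = b"\x08\x01" + b"\x78\x2a"

old_reader = parse_known(message, {1: "Type"})
new_reader = parse_known(message, {1: "Type", 15: "hypothetical_new_field"})
print(old_reader, new_reader)
```

An older reader still decodes the message and simply never sees the new field, which is why no version number is needed for additive changes.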
@Stebalien Cheers :) My unixfs understanding is starting to crystalise 👍
add -r --cid-version 1 can I confirm that the CIDs can only be of type DagProtobuf (0x70) or Raw (0x55)?
(we = protocol labs, not necessarily me)
So, DagCBOR is the canonical IPLD format that can encode every IPLD object (arbitrary {"foo": "bar", "baz": Qm...}). DagProtobuf can only encode IPLD objects of the form { "data": bytes..., "links": [ {"name": ..., "size": ..., "link": Qm... } ] }. So, the real question is: why DagProtobuf?
The answer is that IPFS (and these DagProtobuf objects) came before IPLD. However, while building IPFS, we realized that DagProtobuf was hard to work with. To structure data, you have to serialize it and embed the structured data in the data field as an array of bytes. Worse, this structured data can't actually link to other objects because links must go in the links section (so the data section needs to reference links in the links section to actually link to other objects). So, we made IPLD to make storing structured data easier. Now, why did we keep DagProtobuf? The answer is simply backwards compatibility.
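The difference in ergonomics can be sketched with plain Python dictionaries. This is an illustration of the two object shapes only: JSON stands in for the real serialization, the `link_index` convention and the `resolve_pb` helper are invented for the example, and the `Qm...` strings are placeholder CIDs.

```python
import json

# Placeholder CIDs, not real ones.
CID_A = "Qm...A"
CID_B = "Qm...B"

# DagProtobuf-style node: structured data is serialized into opaque bytes,
# and it can reach other objects only indirectly, through the links section.
payload = {"title": "example",
           "cover": {"link_index": 0},
           "body": {"link_index": 1}}
pb_style = {
    "data": json.dumps(payload).encode(),
    "links": [
        {"name": "cover", "size": 100, "link": CID_A},   # sizes are made up
        {"name": "body", "size": 200, "link": CID_B},
    ],
}


def resolve_pb(node, key):
    """Follow a reference from the embedded data into the links section."""
    inner = json.loads(node["data"])
    return node["links"][inner[key]["link_index"]]["link"]


# DagCBOR-style node: links sit inline, next to ordinary values.
cbor_style = {"title": "example", "cover": CID_A, "body": CID_B}

print(resolve_pb(pb_style, "cover"), cbor_style["cover"])
```

Both shapes carry the same information, but the first needs an application-specific indirection to express a link inside structured data.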
So, what is the use-case for DagCBOR? Building other applications (and, potentially, extending IPFS). It's what we would have used to build IPFS if we could start over.
Continued in https://github.com/ipfs/specs/issues/316
The unixfs spec is completely missing https://github.com/ipfs/specs/tree/master/unixfs
Can I find rough notes somewhere specifying unixfs?