ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Docs are inaccurate for `dag get` #5363

Open Mr0grog opened 6 years ago

Mr0grog commented 6 years ago

Our docs for the dag get command are not entirely accurate and invite some misconceptions: https://github.com/ipfs/go-ipfs/blob/3296412d19d5a7b565ece21f5d7c275c41bcf246/core/commands/dag/dag.go#L186-L192

In particular, it doesn’t really “get a DAG node” so much as it gets the value at an IPLD path. For example:

> curl 'http://localhost:5001/api/v0/dag/get?arg=zdpuAnhvECCigEM92DCZVwFguAHTSg1yJcPLyPdsJsVTYRuXB/title'
# "A simple example of an IPLD node"

I guess you could sort of say that’s technically a node — kind of — but I think most people would interpret the node in the above example to be the data at:

/ipfs/zdpuAnhvECCigEM92DCZVwFguAHTSg1yJcPLyPdsJsVTYRuXB

…not the string at:

/ipfs/zdpuAnhvECCigEM92DCZVwFguAHTSg1yJcPLyPdsJsVTYRuXB/title

What’s a better way to explain this?

See also this discussion: https://github.com/ipld/ipld/issues/44#issuecomment-411873611. /cc @mikeal @diasdavid @whyrusleeping

Once we figure out the right language, this also needs updating in interface-ipfs-core, in js-ipfs’s help text, and in the docs.ipfs.io site (which is generated from the code here).

whyrusleeping commented 6 years ago

@Mr0grog in a way, everything is a node. It just depends on the layer at which you view a graph. For example, in the unixfs view, a 'directory' is a single conceptual object, comprised of potentially many smaller objects at the dag layer.

I'm not against picking a better term than dag node, but just wanted to give some background as to why we call anything we can resolve a 'node'.

Mr0grog commented 6 years ago

> It just depends on the layer at which you view a graph.

Ok, so this is why I said “you could sort of say that’s technically a node — kind of.” I completely get what you’re saying here.

BUT I also want to make sure we’re on the same page: I fully agree on a sharded directory being multiple nodes. But what I was trying to get at here was specifically the situation where ipfs dag get returns a thing that is not a node on a merkle DAG: it’s an attribute of a hash-addressed node, not a hash-addressed node itself (sure, this could still be a node on some arbitrary conceptual graph, but it’s not a hash-addressed item in IPFS). Maybe another example would be:

> ipfs dag get /ipfs/<cid of a git commit>/author/email

Does that make sense? Are we talking about the same thing?

schomatis commented 6 years ago

@Mr0grog My take on this (but I'm no authority on the subject):

> But what I was trying to get at here was specifically the situation where ipfs dag get returns a thing that is not a node on a merkle DAG: it’s an attribute of a hash-addressed node,

What you're addressing by hash is a block of data (a layer below the DAG) which happens to be formatted as a DAG node. To take the example we were discussing in another issue, it's the equivalent of me giving you a file system path for a file formatted as PNG, i.e., an image: you, the renderer, know that it's an image; I, the file system, only know that it's a bit stream. I only know its address and how to retrieve it; what you do with it is not my problem.

> not a hash-addressed node itself (sure, this could still be a node on some arbitrary conceptual graph, but it’s not a hash-addressed item in IPFS).

Everything is a hash-addressed item in the sense that you can ask the block service for the block of data with that CID (and that layer is responsible for checking its own datastore or requesting it from the network). For example:

# I add my `go-ipfs` dir.
ipfs add . -r
# [...]
# added QmTWmw7Jcf8xjBAZSgsUp95g5FtQxFsGmU1zJs9yBeb2UV go-ipfs/core/commands/dag/dag.go
# [...]
# added QmeitChRGMiBm1TZv79VZTG25Jgxm2xFFxSw9EbBYpDvpi go-ipfs

# I retrieve the root DAG node that represents (at the UnixFS layer) the `go-ipfs`
# directory. (This can be a single node or the root of many nodes, e.g., sharded dir.)
ipfs dag get QmeitChRGMiBm1TZv79VZTG25Jgxm2xFFxSw9EbBYpDvpi # go-ipfs
# {"data":"CAE=","links":[{"Name":"CHANGELOG.md","Size":137102,"Cid":{"/":"QmSUnnmthZinz51Exu2ToKRftS9rzmDmUCEDD4L554zN1N"}},{"Name":"CODEOWNERS","Size":154,"Cid":{" [...]

# I can now get the `dag.go` file by addressing the root `go-ipfs` DAG node (through
# the hash of the block that stores it) and then using the DAG layer logic to traverse
# the DAG nodes down to the `dag.go` node (which may or may not contain the entire file;
# the DAG layer doesn't know this, only UnixFS does).
ipfs dag get QmeitChRGMiBm1TZv79VZTG25Jgxm2xFFxSw9EbBYpDvpi/core/commands/dag/dag.go
# {"data":"CAISlzhwYWNrYWdlIGRhZ2NtZAoKaW1wb3J0ICgKCSJieXRlcyIKCSJmbXQiCgkiaW8iCgkibWF0aCIKCSJzdH [...]

# If I know the hash of the block that stores that very same `dag.go` node (should
# have used a different filename, sorry), e.g., `ipfs add` told me, I can just tell
# the block service to get me that and feed it to the DAG layer.
ipfs dag get QmTWmw7Jcf8xjBAZSgsUp95g5FtQxFsGmU1zJs9yBeb2UV # dag.go
# {"data":"CAISlzhwYWNrYWdlIGRhZ2NtZAoKaW1wb3J0ICgKCSJieXRlcyIKCSJmbXQiCgkiaW8iCgkibWF0aCIKCSJzdHJpbmd [...]
# (Same data.)
# 
# The DAG layer doesn't know the path of `dag.go`, it doesn't even know that that node
# is a file (you couldn't seek to the middle of the file DAG), the slashes are just
# hints to the DAG layer that the node has a link (child) with that name.
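The named-link traversal described in the comments above can be sketched roughly as follows. This is only an illustration under assumptions, not how go-ipfs implements it: the `BLOCKS` store, the placeholder `cid-...` keys, and the `get_block` and `resolve_named_links` helpers are all hypothetical, and the node shape is a simplified stand-in for dag-pb.

```python
# Hypothetical in-memory block store. Keys stand in for CIDs and values for
# decoded dag-pb-style nodes; none of these names come from go-ipfs.
BLOCKS = {
    "cid-root": {"data": "dir", "links": [{"Name": "core", "Cid": "cid-core"}]},
    "cid-core": {"data": "dir", "links": [{"Name": "dag.go", "Cid": "cid-daggo"}]},
    "cid-daggo": {"data": "package dagcmd", "links": []},
}


def get_block(cid):
    """Stand-in for the block service: fetch a decoded node by its hash."""
    return BLOCKS[cid]


def resolve_named_links(path):
    """Treat each path segment after the root CID as the name of a link to follow."""
    root, *segments = path.strip("/").split("/")
    cid = root
    for name in segments:
        node = get_block(cid)
        matches = [link["Cid"] for link in node["links"] if link["Name"] == name]
        if not matches:
            raise KeyError(f"no link named {name!r} under {cid}")
        cid = matches[0]
    return cid, get_block(cid)


# Walking from the root by name and asking for the leaf hash directly
# end up at the same block.
via_path = resolve_named_links("cid-root/core/dag.go")
direct = resolve_named_links("cid-daggo")
print(via_path == direct)
```

In this sketch the resolver never learns that `dag.go` is a file; it only matches segment names against link names, which is the point made in the comments above.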

~> @Mr0grog in a way, everything is a node. It just depends on the layer at which you view a graph. For example, in the unixfs view, a 'directory' is a single conceptual object, comprised of potentially many smaller objects at the dag layer.~

~Yes, this is something that we've discussed in other issues, I think it's really important to clearly define a language for that. My personal issue with the "everything is a node" terminology, coming from a communications background where every package (PDU) has a different name in each layer (which reminds the reader we're in a different "zoom" on the model, that is, things work differently here than at other layers) is that it's a rather ambiguous term, especially for a new reader.~

~(More boring personal opinions coming.) If we want to hold on to the ubiquitous node then we should be very careful to always use the correct qualifying term, e.g., DAG or UnixFS, otherwise a very confusing model is being set up for the reader. This is especially so in the DAG-UnixFS-MFS interaction (and I imagine there are similar situations at different layers) where I actually can interact from UnixFS/MFS layers with a DAG node to get all the information I want from the file (the same way I may only need the first IP datagram to get the information I need from the TCP communication) but I'm actually interpreting it from the higher layer perspective, the fact that all that information is being encapsulated in the first DAG node shouldn't be the focus (although at some parts of the code base it is).~

(edit: off-topic)

Mr0grog commented 6 years ago

> Everything is a hash-addressed item in the sense that you can ask the block service for the block of data with that CID (and that layer is responsible for checking its own datastore or requesting it from the network).

I think we’re talking past each other, @schomatis. In your example, I’m talking about a command like:

> ipfs dag get /ipfs/QmeitChRGMiBm1TZv79VZTG25Jgxm2xFFxSw9EbBYpDvpi/links/0/Name
# "CHANGELOG.md"

There is no address like /ipfs/<some cid> that the above address resolves to in this case. It’s not a piece of hash-addressed data (on IPFS, at least); it’s only a subset of the data in the DAG node. It’s not a block; it’s just a piece of a block.

That said, the above example doesn’t work because the way path resolution happens for UnixFS/dag-pb is unique compared to other resolvers — that’s why I used a CBOR example. If you get the whole “node” at http://localhost:5001/api/v0/dag/get?arg=zdpuAnhvECCigEM92DCZVwFguAHTSg1yJcPLyPdsJsVTYRuXB, you’ll have a JSON representation of the CBOR data like:

{
  "child": {
    "/": "zdpuAvQ329wk1WAKondXzp9LcLFrxHrzRnyPivMfC4THfHYj7"
  },
  "title": "A simple example of an IPLD node"
}

But if you go to http://localhost:5001/api/v0/dag/get?arg=zdpuAnhvECCigEM92DCZVwFguAHTSg1yJcPLyPdsJsVTYRuXB/title, you’ll just get the title attribute, which doesn’t involve any links, and is not a hash-addressed item on IPFS:

"A simple example of an IPLD node"

^ This is what I mean when I say technically it’s a node on some graph, but it’s not a node on the hash-addressed graph rooted at /ipfs/zdpuAnhvECCigEM92DCZVwFguAHTSg1yJcPLyPdsJsVTYRuXB — we did not traverse a link that pointed directly at the value "A simple example of an IPLD node". I don’t think most people would consider "A simple example of an IPLD node" to be a “dag node” in IPFS, even though the docs say this command returns “dag nodes.”
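To make the distinction concrete, here is a minimal sketch of path resolution over decoded nodes that also reports whether the result is the root of a hash-addressed block or just a value inside one. Everything here is an assumption for illustration: the `BLOCKS` store, the placeholder `cid-...` keys, the contents of the child node, and the `is_link` and `resolve` helpers are hypothetical, and following a trailing link is a choice made in this sketch rather than a statement about what go-ipfs does.

```python
# Hypothetical decoded blocks keyed by placeholder CIDs (not real hashes).
# The parent mirrors the JSON shown above; the child's contents are made up.
BLOCKS = {
    "cid-parent": {
        "child": {"/": "cid-child"},
        "title": "A simple example of an IPLD node",
    },
    "cid-child": {"title": "hypothetical child"},
}


def is_link(value):
    """In the JSON view, a link is a map whose only key is "/"."""
    return isinstance(value, dict) and set(value) == {"/"}


def resolve(path):
    """Return (value, cid_of_containing_block, is_block_root) for a path."""
    root, *segments = path.strip("/").split("/")
    cid, value, at_root = root, BLOCKS[root], True
    for seg in segments:
        # Step into the current value: list index or map key.
        value = value[int(seg)] if isinstance(value, list) else value[seg]
        at_root = False
        if is_link(value):
            # Crossing a link lands on a new hash-addressed block.
            cid = value["/"]
            value = BLOCKS[cid]
            at_root = True
    return value, cid, at_root


title = resolve("cid-parent/title")
child = resolve("cid-parent/child")
print(title)
print(child)
```

Here `cid-parent/title` yields a string that lives inside the `cid-parent` block and has no hash of its own, while `cid-parent/child` crosses a link and yields a whole block.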

Mr0grog commented 6 years ago

> My personal issue with the "everything is a node" terminology

I think there’s probably lots to be said here, but I think it’s probably off-topic for what I’m trying to address in this issue ;)

schomatis commented 6 years ago

Oh, I see, thanks for the clarification @Mr0grog, sorry for the noise.

Mr0grog commented 6 years ago

@schomatis no worries, hopefully it helped me restate and clarify for others.

@whyrusleeping, does what I’m trying to get at here make more sense now?

ec1oud commented 3 years ago

> 'ipfs dag get' fetches a dag node from ipfs and prints it out in the specified format.

OK, so how do I specify the format then? I want CBOR as one of the available formats, not only JSON.