ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

`ipfs dag stat` improvements #3955

Open matthewrobertbell opened 7 years ago

matthewrobertbell commented 7 years ago

Version information:

go-ipfs version: 0.4.9-
Repo version: 5
System version: amd64/darwin
Golang version: go1.8.1

Type:

Enhancement

Severity:

Low

Description:

ipfs object stat doesn't seem to work on dag objects:

➜  ~ echo '{"hello": "world"}' | ipfs dag put
zdpuAtX7ZibcWdSKQwiDCkPjWwRvtcKCPku9H7LhgA4qJW4Wk
➜  ~ ipfs object stat zdpuAtX7ZibcWdSKQwiDCkPjWwRvtcKCPku9H7LhgA4qJW4Wk
NumLinks: 0
BlockSize: 0
LinksSize: 0
DataSize: 0
CumulativeSize: 0

I was interested in getting the individual block size and the total DAG size. Is a dag stat command planned?

Thanks :)

whyrusleeping commented 7 years ago

Ooooh, yeah. ipfs dag stat should probably exist. We could at a minimum include the size, number of links, maybe number of paths in the object, and print out some type information too (this is a cbor node, or this is a bitcoin block, etc)

matthewrobertbell commented 7 years ago

Would it be possible to also include what type the object is (hash/list/integer/string etc)?

whyrusleeping commented 7 years ago

@mattseh hrm... you mean the cbor type? That might be possible, but it would be awkward to apply that to other 'dag' types

matthewrobertbell commented 7 years ago

I mean knowing the difference between these:

~ echo '["a", "b"]' | ipfs dag put
zdpuAnPcrJDq4zxgHbrT1QZxjastWCs3U8bewnfPboBVwxEE8

~ echo '{"a": 1, "b": 2}' | ipfs dag put
zdpuAnBAFDeLpjMFUN8C5DbHkAcLtNw9FvrPbBHgwbX1hfj1C

So list vs hash. I don't have a solid use case for it, but in some cases it might be more elegant than having to pull the object and then call type() (in Python) on the returned JSON.
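
For reference, the client-side check described above (fetch the object, then inspect its top-level type) could look roughly like this in Go. This is only a sketch: the function name `topLevelKind` and the kind labels are made up for illustration, and it works on plain JSON text rather than on anything returned by an ipfs command.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// topLevelKind reports the JSON kind of the outermost value in raw.
// The kind names ("map", "list", ...) are illustrative labels chosen
// for this sketch, not output from any ipfs command.
func topLevelKind(raw []byte) (string, error) {
	var v interface{}
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", err
	}
	switch v.(type) {
	case map[string]interface{}:
		return "map", nil
	case []interface{}:
		return "list", nil
	case string:
		return "string", nil
	case float64:
		return "number", nil
	case bool:
		return "bool", nil
	case nil:
		return "null", nil
	default:
		return "unknown", nil
	}
}

func main() {
	// The two documents from the comment above.
	for _, doc := range []string{`["a", "b"]`, `{"a": 1, "b": 2}`} {
		kind, err := topLevelKind([]byte(doc))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", doc, kind)
	}
}
```

A stat-style command that reported this kind directly would save the round trip of fetching and decoding the whole object just to learn its shape.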

kevina commented 7 years ago

@whyrusleeping this sounds like a fairly easy thing for me to do. Let me know how deep you want to go (for example, should I implement @mattseh's suggestion?). Should we also have a mode that prints as much as possible based on the multi-hash alone, so the node doesn't have to be downloaded?

whyrusleeping commented 7 years ago

@mattseh hrm... I'm not so sure about that. Let's think through it a bit more.

@kevina Yeah, I would add most of the things above, but I'm not sure about the JSON type information bit.

kevina commented 7 years ago

@whyrusleeping, so what exactly will be the difference between object stat and dag stat? The description for object stat says "Get stats for the DAG node named by <key>" and in fact the description of ipfs dag says "This subcommand is currently an experimental feature, but it is intended to deprecate and replace the existing 'ipfs object' command moving forward".

It sounds like object stat should be enhanced to give more information. Or am I missing something?

whyrusleeping commented 7 years ago

@kevina we don't want to add new functionality to the object subcommands. We are working on deprecating them in favor of the dag subcommands. The primary reason is that making the object commands support all the dag stuff would require changing the APIs. So for now, making ipfs dag stat do basically the same stuff as ipfs object stat is fine.

kevina commented 7 years ago

@whyrusleeping my bad, I misinterpreted the description of ipfs dag.

whyrusleeping commented 7 years ago

@kevina still interested in working on this one?

kevina commented 7 years ago

@whyrusleeping Sure, do you want me to make it a priority?

kevina commented 7 years ago

So the reason "object stat" is not returning anything meaningful for a "cbor" object is that it calls the Stat method of the Node interface which is unimplemented in the cbornode package. However, there is this note:

type Node interface {
    ...
    // TODO: not sure if stat deserves to stay
    Stat() (*NodeStat, error)
    ...
}

Do we want the new command dag stat to continue to use the Stat method, or do we want to do something else?
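
To make the failure mode concrete, here is a small Go sketch of how a stubbed Stat method produces the all-zero output seen at the top of this issue. The types are cut-down stand-ins, and the stub body (an empty struct with a nil error) is an assumption that matches the observed zeros, not a quote of the cbornode source.

```go
package main

import "fmt"

// NodeStat mirrors the fields printed by `ipfs object stat` in this thread.
type NodeStat struct {
	NumLinks       int
	BlockSize      int
	LinksSize      int
	DataSize       int
	CumulativeSize int
}

// Node is a cut-down stand-in for the interface quoted above.
type Node interface {
	RawData() []byte
	Stat() (*NodeStat, error)
}

// cborLikeNode is a hypothetical node type whose Stat is only a stub.
type cborLikeNode struct{ raw []byte }

func (n *cborLikeNode) RawData() []byte { return n.raw }

// Stat is "unimplemented" here: it returns an empty struct and no error
// (an assumption for this sketch), so callers cannot tell that the
// numbers are meaningless.
func (n *cborLikeNode) Stat() (*NodeStat, error) { return &NodeStat{}, nil }

func main() {
	var n Node = &cborLikeNode{raw: []byte(`{"hello": "world"}`)}
	st, err := n.Stat()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("NumLinks: %d\nBlockSize: %d\nLinksSize: %d\nDataSize: %d\nCumulativeSize: %d\n",
		st.NumLinks, st.BlockSize, st.LinksSize, st.DataSize, st.CumulativeSize)
	// The raw bytes are still available, so a size could be derived from them.
	fmt.Printf("raw bytes held by the node: %d\n", len(n.RawData()))
}
```

Because the stub reports success, the caller prints zeros instead of an error, which is why the output looks valid but carries no information.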

Also (on a possibly unrelated note), there is a new utility, cid-fmt, in the go-cid package which can be used to get basic information on a CID, such as the fact that the CID is a cbor object. For example:

$ cid-fmt prefix zdpuAnPcrJDq4zxgHbrT1QZxjastWCs3U8bewnfPboBVwxEE8
cidv1-cbor-sha2-256-32
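
The prefix string printed above can be split into its parts without fetching the node. The Go sketch below does that under a stated assumption about the layout: the first field is the CID version, the second the codec, the last the digest length, and everything in between the hash name. `cidPrefix` and `parsePrefix` are hypothetical names, not part of go-cid.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cidPrefix holds the fields of a string such as "cidv1-cbor-sha2-256-32".
type cidPrefix struct {
	Version string // e.g. "cidv1"
	Codec   string // e.g. "cbor"
	Hash    string // e.g. "sha2-256"
	HashLen int    // digest length, e.g. 32
}

// parsePrefix splits on "-" and assumes the first field is the version,
// the second the codec, and the last the digest length. Whatever lies in
// between is treated as the hash name, which may itself contain hyphens.
// This layout is an assumption based on the single example in the thread.
func parsePrefix(s string) (cidPrefix, error) {
	parts := strings.Split(s, "-")
	if len(parts) < 4 {
		return cidPrefix{}, fmt.Errorf("expected at least 4 fields, got %d", len(parts))
	}
	n, err := strconv.Atoi(parts[len(parts)-1])
	if err != nil {
		return cidPrefix{}, fmt.Errorf("bad digest length: %v", err)
	}
	return cidPrefix{
		Version: parts[0],
		Codec:   parts[1],
		Hash:    strings.Join(parts[2:len(parts)-1], "-"),
		HashLen: n,
	}, nil
}

func main() {
	p, err := parsePrefix("cidv1-cbor-sha2-256-32")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("version=%s codec=%s hash=%s len=%d\n", p.Version, p.Codec, p.Hash, p.HashLen)
}
```

A codec name that itself contains a hyphen would break this simple split, so it should be read as an illustration of what the prefix encodes rather than a robust parser.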

whyrusleeping commented 7 years ago

Hrm... we're planning on removing the Stat method from that interface entirely. I think @Stebalien will be best suited to figure out the best way forward here.

Also, on the cid-fmt tool, I wrote a similar thing a while back here: https://github.com/whyrusleeping/elcid We should probably combine our efforts :)

Stebalien commented 7 years ago

Yet another case where I wrote a response and then closed the window...

I agree with @whyrusleeping that we should remove Stat from the interface. For UnixFS, we'll probably want to add a replacement FSNode interface (which can provide this method along with other FSNode specific methods). For the general DAG, I'm not sure what info stat should return that's not already returned by ipfs block stat (although ipfs block stat should probably return the decoded CID).
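
One way to picture that split is a generic node interface without Stat, plus a narrower interface that only filesystem-style nodes implement, discovered through a type assertion. The Go sketch below is purely illustrative: `FSStat`, the shape of `FSNode`, and the helper `describe` are hypothetical and do not reflect any agreed design.

```go
package main

import "fmt"

// Node is a generic DAG node with no Stat method.
type Node interface {
	RawData() []byte
}

// FSStat and FSNode are hypothetical names for the UnixFS-specific part.
type FSStat struct {
	NumLinks int
	DataSize int
}

// FSNode extends Node with stats that only make sense for file-like nodes.
type FSNode interface {
	Node
	Stat() (*FSStat, error)
}

// rawNode implements only the generic interface.
type rawNode struct{ data []byte }

func (n rawNode) RawData() []byte { return n.data }

// fileNode implements both Node and FSNode.
type fileNode struct {
	data  []byte
	links int
}

func (n fileNode) RawData() []byte { return n.data }
func (n fileNode) Stat() (*FSStat, error) {
	return &FSStat{NumLinks: n.links, DataSize: len(n.data)}, nil
}

// describe asks for stats only from nodes that opt in via FSNode.
func describe(n Node) string {
	if fs, ok := n.(FSNode); ok {
		st, err := fs.Stat()
		if err != nil {
			return "stat error: " + err.Error()
		}
		return fmt.Sprintf("fs node: links=%d data=%d", st.NumLinks, st.DataSize)
	}
	return fmt.Sprintf("generic node: %d raw bytes", len(n.RawData()))
}

func main() {
	fmt.Println(describe(rawNode{data: []byte("abc")}))
	fmt.Println(describe(fileNode{data: []byte("hello"), links: 2}))
}
```

With this shape, node types that have no meaningful stats simply do not implement the method, instead of returning placeholder values.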

Things we could consider including.

  1. Known inbound link count (number of nodes we have that link to this node).
  2. Number of outbound links.
  3. Number of children we have.
  4. Pinned status (direct, recursive, parents that keep it pinned etc).
  5. Whether or not we have the full recursive DAG rooted at this node.
  6. Total size of the part of the DAG rooted at this node that we have.

However, these are mostly gc/repo/pin related things, not properties of the node itself.
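
As a rough sketch of items 2, 3, 5 and 6 from the list above, the Go program below computes them against a toy in-memory store. The `block` type, the map-based store, and `statLocal` are all hypothetical; inbound link counts (item 1) and pin status (item 4) are left out because they need indexes this toy store does not have.

```go
package main

import "fmt"

// block is a toy stand-in for a stored node: its size and its child keys.
type block struct {
	size  int
	links []string
}

// localStats summarises what the local store knows about one root.
type localStats struct {
	OutboundLinks int  // item 2: links listed in the root
	ChildrenHeld  int  // item 3: root links whose target is stored locally
	Complete      bool // item 5: every reachable block is stored locally
	LocalSize     int  // item 6: total bytes of reachable blocks we hold
}

// statLocal returns false if the root itself is not in the store.
func statLocal(store map[string]block, root string) (localStats, bool) {
	rb, ok := store[root]
	if !ok {
		return localStats{}, false
	}
	st := localStats{OutboundLinks: len(rb.links), Complete: true}
	for _, l := range rb.links {
		if _, have := store[l]; have {
			st.ChildrenHeld++
		}
	}
	// Depth-first walk; the seen set makes sure shared blocks count once.
	seen := map[string]bool{}
	stack := []string{root}
	for len(stack) > 0 {
		k := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[k] {
			continue
		}
		seen[k] = true
		b, have := store[k]
		if !have {
			st.Complete = false // a reachable block is missing locally
			continue
		}
		st.LocalSize += b.size
		stack = append(stack, b.links...)
	}
	return st, true
}

func main() {
	store := map[string]block{
		"root": {size: 10, links: []string{"a", "b", "c"}},
		"a":    {size: 5, links: []string{"d"}},
		"b":    {size: 7},
		"d":    {size: 3},
		// "c" is deliberately missing from the local store.
	}
	st, ok := statLocal(store, "root")
	fmt.Println(ok, st.OutboundLinks, st.ChildrenHeld, st.Complete, st.LocalSize)
}
```

Everything except the outbound link count requires walking the store, which illustrates why these read more like repo or gc statistics than properties of the node itself.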

kevina commented 7 years ago

@Stebalien most of that information is currently not stored anywhere so will need to be computed, which is currently a very expensive operation. It may be useful for a stat command to output this information, but it should not be the default output.

Also, what is the difference between (2) and (3)? By (3), do you mean the number of children we currently have on the node?

Stebalien commented 7 years ago

> most of that information is currently not stored anywhere so will need to be computed, which is currently a very expensive operation.

I know, I'm just throwing out suggestions for information that might be useful (assuming we cached it somewhere).

> Also, what is the difference between (2) and (3)? By (3), do you mean the number of children we currently have on the node?

Yes.


Basically, I couldn't think of any direct IPLD-specific stats that might be useful and aren't already supplied by ipfs block stat (other than extended CID information, but that can be added to the block stat command), so I threw out some ideas for useful data.

Really, we could just make ipfs object stat an alias for ipfs block stat.

aschmahmann commented 3 years ago

Note that ipfs dag stat was added in #7553 and supports calculating the full (deduplicated) size of the DAG and (deduplicated) number of blocks in it. Sure it could be expensive to calculate, but we can always add optimizations + caching of the calculations in the future.
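
To show what "deduplicated" means here, the Go sketch below walks a toy DAG and counts each distinct block once, even when several parents link to it. It is not the code from #7553; the `block` type, the map-based store, and `dagStat` are hypothetical stand-ins.

```go
package main

import "fmt"

// block is a toy stand-in for a stored node: its size and its child keys.
type block struct {
	size  int
	links []string
}

// dagStat walks the DAG from root and counts each distinct block once.
// It returns an error if a linked block is not present in the store.
func dagStat(store map[string]block, root string) (numBlocks, totalSize int, err error) {
	seen := map[string]bool{}
	var walk func(k string) error
	walk = func(k string) error {
		if seen[k] {
			return nil // already counted via another parent
		}
		seen[k] = true
		b, ok := store[k]
		if !ok {
			return fmt.Errorf("block %q not found", k)
		}
		numBlocks++
		totalSize += b.size
		for _, l := range b.links {
			if err := walk(l); err != nil {
				return err
			}
		}
		return nil
	}
	err = walk(root)
	return numBlocks, totalSize, err
}

func main() {
	// "shared" is linked from both "a" and "b" but must be counted once.
	store := map[string]block{
		"root":   {size: 10, links: []string{"a", "b"}},
		"a":      {size: 5, links: []string{"shared"}},
		"b":      {size: 7, links: []string{"shared"}},
		"shared": {size: 100},
	}
	n, size, err := dagStat(store, "root")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("blocks=%d size=%d\n", n, size)
}
```

Without the seen set, the shared block would be counted twice and the reported size would be 222 instead of 122. The full traversal is also what makes the calculation expensive on large DAGs.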

Not closing this at the moment as many of the suggestions from https://github.com/ipfs/go-ipfs/issues/3955#issuecomment-328635834 are still valid. They could probably be split into a new issue and then this one closed.