ipfs / kubo

An IPFS implementation in Go
https://docs.ipfs.tech/how-to/command-line-quick-start/

Terminology: MFS vs UnixFS vs Files API #5051

Open schomatis opened 6 years ago

schomatis commented 6 years ago

I'm failing to find a clear distinction between those terms (the clearest explanation I've found is in https://github.com/ipfs/js-ipfs/issues/60).

Intuitively, from what I've seen in issues and from reading the code: the unixfs package implements the files format to use in IPLD nodes; the mfs package organizes those nodes/files in a hierarchical system similar to the Unix file system; and the Files API exposes that functionality to the user through the ipfs files command set. (For me, the Files API is pretty much just the core/commands/files.go file in the commands package. Is there a Core API for this?)

I would like to arrive at more formal definitions of those terms, backed by clear boundaries in the code base (and a description of how those components interact).

whyrusleeping commented 6 years ago

Yeah, your descriptions are correct. UnixFS is a format. MFS is the virtual filesystem tree, and the Files API is an interface that gives you filesystem operations over UnixFS files/directories backed by MFS.

Mr0grog commented 6 years ago

the unixfs package implements the files format to use in IPLD nodes

One small nit: the current UnixFS format is not compatible with IPLD (there are discussions about designing a v2 that is IPLD-compatible in ipfs/unixfs-v2). Either way, though, it is the format for the nodes of a merkle DAG representing a file on IPFS 👍

schomatis commented 6 years ago

@Mr0grog Thanks for the clarification and the reference. That may explain why I was finding that most functions in charge of formatting the UnixFS object inside the nodes were referring to dag.ProtoNode instead of ipld.Node (https://github.com/ipfs/go-ipfs/issues/5059#issuecomment-393970405).

whyrusleeping commented 6 years ago

yeah, the current unixfs nodes are a subset of ipld, (the unixfs dag nodes are still ipld objects) but most of the data in them is not accessible in standard ipld contexts. (i can't path traverse these nodes to get filesize, for example)
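
The path-traversal point can be illustrated with a small Go sketch. Everything here is an assumption for illustration: `resolve` is a hypothetical helper, not an IPLD library call, and the two nodes are made-up maps. The idea is that a generic resolver can only walk fields it can see, so a size stored inside an opaque byte payload is out of reach.

```go
package main

import (
	"errors"
	"fmt"
)

// resolve walks a path through nested maps, the way a generic path
// resolver could walk a node whose fields are visible to it.
// Hypothetical helper, not part of any IPLD library.
func resolve(node map[string]any, path ...string) (any, error) {
	var cur any = node
	for _, seg := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, errors.New("cannot descend into " + seg + ": value is opaque")
		}
		next, ok := m[seg]
		if !ok {
			return nil, errors.New("no field " + seg)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	// Hypothetical node whose metadata is part of the visible structure.
	structured := map[string]any{
		"meta": map[string]any{"filesize": 1024},
	}
	// Hypothetical UnixFS-like node: metadata is packed inside opaque bytes
	// (placeholder bytes, standing in for an encoded payload).
	opaque := map[string]any{
		"Links": []any{},
		"Data":  []byte{0x01, 0x02, 0x03},
	}

	fmt.Println(resolve(structured, "meta", "filesize"))
	fmt.Println(resolve(opaque, "Data", "filesize"))
}
```

The second lookup fails because the resolver sees `Data` only as bytes; getting the size would require decoding that payload with knowledge of its format.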

Mr0grog commented 6 years ago

the current unixfs nodes are a subset of ipld, (the unixfs dag nodes are still ipld objects)

Hmmm, I think we’ve reached the limit of my understanding, but I thought IPLD was specifically a CBOR-encoded blob where links are identified by values like {"/": "<any valid CID>"}, while UnixFS nodes are specifically a protobuf-encoded blob (with Links and Data fields, where Data is another protobuf-encoded blob matching this definition).
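
A rough Go sketch of the two shapes being contrasted, under stated assumptions: JSON from the standard library stands in for CBOR purely so the example runs without dependencies, and `mapLink`, `pbLink` and `pbNode` are hypothetical types invented here, not the actual definitions. `QmExampleCID` is a placeholder string, not a valid CID.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mapLink models the {"/": "<CID>"} link convention described above.
type mapLink struct {
	CID string `json:"/"`
}

// pbLink and pbNode are hypothetical stand-ins for the protobuf shape:
// a list of links plus an opaque Data payload.
type pbLink struct {
	Name string
	CID  string
}

type pbNode struct {
	Links []pbLink
	Data  []byte // itself an encoded blob in the real format
}

// encodeMapLink renders a link in the map style (JSON used in place of CBOR).
func encodeMapLink(cid string) (string, error) {
	b, err := json.Marshal(mapLink{CID: cid})
	return string(b), err
}

// linkTargets lists the CIDs a pbNode points to, read from its Links field.
func linkTargets(n pbNode) []string {
	out := make([]string, 0, len(n.Links))
	for _, l := range n.Links {
		out = append(out, l.CID)
	}
	return out
}

func main() {
	s, err := encodeMapLink("QmExampleCID")
	fmt.Println(s, err)

	n := pbNode{
		Links: []pbLink{{Name: "chunk0", CID: "QmExampleCID"}},
		Data:  []byte{0x01},
	}
	fmt.Println(linkTargets(n))
}
```

In the first shape a link is just a specially keyed value anywhere in the data; in the second, links live in a dedicated field and everything else sits in `Data`.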

Am I misunderstanding, or are you just using IPLD more loosely to indicate a node in a DAG where the links are hashes (i.e. any kind of merkle DAG, not necessarily a MerkleDAG spec merkle DAG)?

whyrusleeping commented 6 years ago

but I thought IPLD was specifically a CBOR-encoded blob where links are identified by values like ...

Yeah, this is the part that i find confusing too (i'm looking at you @jbenet). IPLD technically refers to really any content addressable data with links to other content addressable data. CBOR-IPLD is our default implementation that works as you describe. Under that definition of ipld, unixfs is a subset. see: https://ipld.io/

Mr0grog commented 6 years ago

Oh, wow, that makes it really hard to talk about clearly. It also explains a lot of the confusion I also went through trying to understand it myself. Thanks for clearing that up (insofar as it can be clear).

schomatis commented 6 years ago

Oh, I'm just reading this. I think this discussion partly answers my question in https://github.com/ipfs/go-ipfs/issues/5058#issuecomment-394204926. Still, I need some more time to process this information; I'm too confused right now to write any coherent follow-up question.