n0-computer / iroh.computer

iroh website
https://iroh.computer

docs request: relationship between docs & collections #60

Open ramfox opened 1 year ago

ramfox commented 1 year ago

Spawned from a discussion about the relationship between docs & collections:

ramfox

I have some questions about the import/export PR. Most of them right now are about the relationship between what our current blob-related code expects (add a file and wrap that blob in a collection) and what the document expects (just a blob).

It feels like we have two paradigms right now, the sync/document paradigm and the transfer/collection paradigm. We use transfer inside sync, but do we ever want to use collections inside a document? When we import a file, currently the document expects the entry to be a blob of bytes, but right now the only API we have to get a file into the store is BlobsClient::add_from_path, which will wrap the blob in a collection to preserve the filename. It's trivial to wrap this in a function that goes and gets the hash of the actual raw blob, but it made me think about the relationship between documents and collections, in general.

It also made me wonder whether, on a command like doc import, we even want to keep this "wrap in a collection when we add to the store" behaviour.

Can you mix it up? Can a document entry sometimes point to a blob and sometimes point to a collection? If both, how are we supposed to know what it contains?

Is there any advantage to using a collection as the primary "datatype" of a document entry, in general?
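
For illustration, the "trivial wrapper" mentioned above could look roughly like the sketch below. It is a toy model, not iroh code: `ToyStore`, `add_wrapped`, and `raw_blob_hash` are hypothetical names, `add_wrapped` only imitates the wrap-in-a-collection behaviour described for `BlobsClient::add_from_path`, and the hashes are plain string labels rather than real content hashes.

```rust
use std::collections::HashMap;

/// Stand-in for a content hash. Assumption: a plain label, not a real hash.
type Hash = String;

/// Toy store (hypothetical): raw blobs plus collections that map
/// file names to blob hashes.
#[derive(Default)]
struct ToyStore {
    blobs: HashMap<Hash, Vec<u8>>,
    collections: HashMap<Hash, Vec<(String, Hash)>>,
}

impl ToyStore {
    /// Imitates "add a file and wrap it in a collection".
    /// Returns the hash of the wrapping collection, not of the raw blob.
    fn add_wrapped(&mut self, name: &str, bytes: &[u8]) -> Hash {
        let blob_hash = format!("blob-{}", self.blobs.len());
        self.blobs.insert(blob_hash.clone(), bytes.to_vec());

        let collection_hash = format!("collection-{}", self.collections.len());
        self.collections
            .insert(collection_hash.clone(), vec![(name.to_string(), blob_hash)]);
        collection_hash
    }

    /// The wrapper: resolve a single-file collection to its raw blob hash.
    /// Returns None if the hash is not a known collection, or if the
    /// collection does not hold exactly one entry.
    fn raw_blob_hash(&self, collection_hash: &str) -> Option<&Hash> {
        match self.collections.get(collection_hash)?.as_slice() {
            [(_, blob_hash)] => Some(blob_hash),
            _ => None,
        }
    }
}
```

In this model a document entry would store the value returned by `raw_blob_hash`, so the document itself never refers to a collection.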

b5

This has been discussed a bunch on the margins. At the moment, we don't encourage collections as values in documents, and don't have any APIs that allow you to leverage a collection in a document. This is on purpose: why complicate things until we have a reason to?

I'd argue that in this context documents are "collections++": they're a way to group blobs together. Because they serve the same purpose, I don't think we should mix them. Instead, I think what's missing is a direct "put raw blob" API. iiuc frando is already working on adding it: https://github.com/n0-computer/iroh/pull/1518

When we import a file into a document, my expectation is those raw blobs are added directly as entries to the document using --prefix + the file path as the key. I don't think any collections should be created by doc import.
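
A minimal sketch of that key scheme, assuming keys are UTF-8 strings built from the prefix plus the file's path relative to the import root, joined with `/`. The helper `entry_key` is hypothetical and not part of iroh; how `doc import` actually derives keys may differ.

```rust
use std::path::{Component, Path};

/// Hypothetical helper (not an iroh API): build a document key from a
/// prefix and a file path, relative to the directory being imported.
fn entry_key(prefix: &str, root: &Path, file: &Path) -> Option<String> {
    // Only files under the import root get a key.
    let relative = file.strip_prefix(root).ok()?;

    // Join the path components with '/' so keys look the same on every platform.
    let mut parts = Vec::new();
    for component in relative.components() {
        match component {
            Component::Normal(part) => parts.push(part.to_str()?),
            // Reject "..", "." and root components rather than guess at a key.
            _ => return None,
        }
    }
    if parts.is_empty() {
        return None; // `file` was the root itself
    }
    Some(format!("{prefix}{}", parts.join("/")))
}
```

With this sketch, importing `/tmp/import/docs/a.md` from the root `/tmp/import` with the prefix `site/` would give the key `site/docs/a.md`, and the entry's value would be the raw blob, with no collection involved.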

Your sense that we have two different paradigms is correct, and that's on purpose. We've kept collections because they can do one thing that documents cannot: they can give you a fixed hash of the collection contents. There are very real use cases for this.
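
To make the "fixed hash of the collection contents" point concrete, here is a toy sketch. It assumes a collection is an ordered list of (name, blob) pairs and uses FNV-1a purely as a small, deterministic stand-in hash. It is not the hash iroh uses and it is not collision-resistant; `fnv1a` and `collection_hash` are illustrative names only.

```rust
/// FNV-1a (64-bit). Assumption: a stand-in for a real content hash,
/// chosen only because it is short and deterministic.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Toy collection identifier: derived purely from the ordered
/// (name, blob bytes) pairs, so the same contents always give the same value.
fn collection_hash(entries: &[(&str, &[u8])]) -> u64 {
    let mut encoded = Vec::new();
    for (name, bytes) in entries {
        // Length-prefix the name and commit to the blob's own hash so that
        // entry boundaries are unambiguous in the encoded form.
        encoded.extend_from_slice(&(name.len() as u64).to_le_bytes());
        encoded.extend_from_slice(name.as_bytes());
        encoded.extend_from_slice(&fnv1a(bytes).to_le_bytes());
    }
    fnv1a(&encoded)
}
```

Two collections with identical entries produce the same identifier, and changing any name or blob produces a different one, so handing someone that one value pins down the whole set of files.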