Closed matheus23 closed 2 years ago
There's only so many ways to slice this problem, but just to get aligned:
The strategy is to add one node per level that can be forwarded in the skip ratchet. The "main" cryptree that you walk down is a single frozen version with pointers (CIDs) to individual children and their AES key. This includes everything that you'd expect from a file system: headers, metadata fields etc. Just not any bare namefilters or back links.
I'm going to represent trees rotated 90-degrees to make drawing in the time dimension easier. These are just my notes from the airplane.
To gain back versioning, there's a "temporal layer" above this, which contains the bare namefilters, and skip ratchet information. Essentially everything you need to move forward in time. Of course to do writes, you would need access to this layer, because you need to bump the skip ratchet in order to get to the next version.
Because skip ratchets can also grant limited control to ranges, it's possible to stack these in up to n-1 levels (where n is the number of components in the skip ratchet). This is probably more work than it's worth, especially on lazy reconciliation, but may be worth exploring.
This temporal control layer contains the following:
@matheus23 Thoughts? Did I miss anything?
There's a problem with this approach: We duplicate the "userland", i.e. the pointers into child nodes: Once for the temporal structure and a second time in the snapshot structure.
I'd propose just concatenating the two types of nodes and putting them into the same block. If we do that, we can remove the redundancy of the child links. So the encoding would be something like:
cbor([
  encrypt(derive_key(ratchet), cbor({
    bare_namefilter,
    ratchet, // revision
  })),
  encrypt(hash(derive_key(ratchet)), cbor({
    metadata,
    inumber,
    userland: {
      // e.g.
      "Photos": {
        snapshot_key: photos_key,
        revision_key: encrypt(derive_key(ratchet), photos_revision_key),
        link: namefilter
      }
    }
  }))
])
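To make the key relationship in that encoding concrete, here is a minimal Python sketch. It assumes SHA-256 for `hash` and a hash-based stand-in for `derive_key`; both are illustrative choices, not something this thread specifies, and `snapshot_key_from` is a hypothetical helper name. The point is only that the snapshot key is computed one-way from the revision key, so whoever holds the ratchet can open both sections, while someone holding just the snapshot key cannot open the revision section.

```python
import hashlib


def derive_key(ratchet: bytes) -> bytes:
    """Hypothetical stand-in for derive_key(ratchet): hash the ratchet state into a 32-byte key."""
    return hashlib.sha256(b"revision-key:" + ratchet).digest()


def snapshot_key_from(revision_key: bytes) -> bytes:
    """Mirrors hash(derive_key(ratchet)) from the encoding, assuming hash = SHA-256."""
    return hashlib.sha256(revision_key).digest()


ratchet = b"example ratchet state"  # placeholder bytes, not a real skip ratchet
revision_key = derive_key(ratchet)  # would encrypt { bare_namefilter, ratchet }
snapshot_key = snapshot_key_from(revision_key)  # would encrypt { metadata, inumber, userland }
```

Going from `snapshot_key` back to `revision_key` would require inverting the hash, which is what would keep snapshot readers out of the revision section.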
There's some wiggle room in terms of design: we could also nest the encryption, so instead of putting two encrypted blocks side-by-side, we could put one block inside of the other:
encrypt(hash(derive_key(ratchet)), cbor({
  metadata,
  inumber,
  userland: {
    // same as before
  },
  revision: encrypt(derive_key(ratchet), cbor({
    bare_namefilter,
    ratchet
  }))
}))
I'm not sure what's better. There's minimal parallelism you gain from the first version (you can decrypt two ciphertexts in parallel), but at the same time, separating them means you leak info about which part you're modifying, if you change one of them. That may help cross-reference files? It seems safer to nest the encryption blocks for sure. I'm pretty sure this gets rid of the danger that co-locating these two ciphertexts brings.
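A rough Python sketch of that leak, under loud assumptions: `toy_encrypt` is a throwaway SHA-256 keystream cipher standing in for AES, the keys and plaintexts are placeholders, and the writer is assumed to reuse an untouched ciphertext verbatim rather than re-encrypting it. If both parts were re-encrypted on every write, the difference would disappear.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher (SHA-256 keystream, random nonce). NOT AES; illustration only."""
    nonce = os.urandom(16)
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


header_key, content_key = b"h" * 32, b"c" * 32  # placeholder keys

# Side by side: only the header changes, the content ciphertext is reused as-is.
content_ct = toy_encrypt(content_key, b"directory entries")
old_block = (toy_encrypt(header_key, b"ratchet v1"), content_ct)
new_block = (toy_encrypt(header_key, b"ratchet v2"), content_ct)
changed_parts = [i for i in range(2) if old_block[i] != new_block[i]]


# Nested: the header ciphertext sits inside the content plaintext,
# so any change forces the whole outer blob to be re-encrypted.
def nested(header: bytes) -> bytes:
    inner = toy_encrypt(header_key, header)
    return toy_encrypt(content_key, b"directory entries" + inner)


old_nested, new_nested = nested(b"ratchet v1"), nested(b"ratchet v2")
```

In the side-by-side case an observer comparing the two versions sees that only the first element changed (`changed_parts` lists index 0 only). In the nested case they see two unrelated blobs of equal length.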
There's definitely also the question of how much this really brings us. We're not duplicating the "userland" (I don't like that name, maybe something like "directory entry table"?) anymore, but anytime we need to decrypt something there may be some inconsistency anyway (i.e. the key being incorrect), so we're getting rid of some of that possible inconsistency, but only a tiny bit of the overall.
encrypt(derive_key(ratchet), cbor({
bare_namefilter,
ratchet, // revision
})),
encrypt(hash(derive_key(ratchet)), cbor({
Yup this makes sense to me 👍 💯
putting them into the same block
Any specific reason why literally in the same block? Won't it get duplicated in cases where there's concurrent writes? The namefilter is "merely" a pointer into a K/V store. It will already have many leafs between the expanded namefilter, concurrent writes, etc.
inumber,
userland:
Any rationale for putting the inumber here instead of in the temporal header? You shouldn't need the inumber unless you're building a new version, right?
putting them into the same block
Any specific reason why literally in the same block? Won't it get duplicated in cases where there's concurrent writes? The namefilter is "merely" a pointer into a K/V store. It will already have many leafs between the expanded namefilter, concurrent writes, etc.
Hm, I'm not sure what you mean by "literally the same block". I just think they should be addressed using the same namefilter.
Thinking about your sentence "duplicated in cases where there's concurrent writes". I think I get what you're saying now. We could split the block into its header and the actual contents. I.e. something more along the lines of the first encoding in my post. :thinking: I'm liking the idea of not having to duplicate redundant information!
Any rationale for putting the inumber here instead of in the temporal header? You shouldn't need the inumber unless you're building a new version, right?
Yep! :smile:
Hm, I'm not sure what you mean by "literally the same block". I just think they should be addressed using the same namefilter.
@matheus23 You wrote "putting them into the same block", which I interpreted as "the same IPLD block".
🤔 It's possible that we need a way of distinguishing these things. How about "block" is an IPLD block, and "private file" is everything under the namefilter key, and "private file header" and "private file userland" are the header & body respectively. I'm open to different terms!
Yep! 😄 [...]
// TODO(matheus23): Inside or outside the revision section?
LOL do you mean that this is still to be decided, or...?
@matheus23 You wrote "putting them into the same block", which I interpreted as "the same IPLD block".
:thinking: It's possible that we need a way of distinguishing these things. How about "block" is an IPLD block, and "private file" is everything under the namefilter key, and "private file header" and "private file userland" are the header & body respectively. I'm open to different terms!
What's the reasoning behind the term "userland"? I never quite understood/liked it. Can we call it something like the "content section"? We'd use the term "private directory content section" + "private directory header"? Or maybe just shorter "private directory content"? Hmm maybe that clashes with content as a concept.
LOL do you mean that this is still to be decided, or...?
Yeah this was something I wanted to talk to you about, but that TODO is now done :P I'll remove it.
What's the reasoning behind the term "userland"?
@matheus23 It's the established term for the user-accessible region of a system. I believe the term originally comes from Unix. The non-user accessible / system managed part is called "kernel" or "kernel space" (I've never heard "kernel-land" for some reason, always "kernel space" or just "kernel").
This looks great! I had reconstructed this last time, but @matheus23 graciously walked me through it again, and it all makes sense 🎉 I'm going to add more clarifying text, but looks good to me!
Current state: When you share access to a private directory, the recipient gets the ratchet and is able to read all future versions, too.
Goal: You can either share "snapshot access" to a private directory, making it possible to only read everything the current directory contains & snapshots of all entries, or you can share "forward access" by providing the whole ratchet, so the recipient can read the current version and all future ones.
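A minimal Python sketch of the two access modes, with heavy simplification: a plain SHA-256 hash chain stands in for the skip ratchet, and `advance`, `revision_key`, and `snapshot_key` are hypothetical helpers that only mirror the `derive_key` / `hash(derive_key(...))` relationship discussed above.

```python
import hashlib


def advance(ratchet: bytes) -> bytes:
    """One step forward in time. A plain hash chain, NOT the real skip ratchet."""
    return hashlib.sha256(b"ratchet-step:" + ratchet).digest()


def revision_key(ratchet: bytes) -> bytes:
    """Hypothetical derive_key(ratchet)."""
    return hashlib.sha256(b"revision-key:" + ratchet).digest()


def snapshot_key(ratchet: bytes) -> bytes:
    """hash(derive_key(ratchet)), assuming hash = SHA-256."""
    return hashlib.sha256(revision_key(ratchet)).digest()


current_ratchet = hashlib.sha256(b"seed").digest()  # placeholder ratchet state

# Forward access: the recipient holds the ratchet and can step it themselves.
future_snapshot_keys = []
state = current_ratchet
for _ in range(3):
    state = advance(state)
    future_snapshot_keys.append(snapshot_key(state))

# Snapshot access: the recipient holds only this one key. Reaching a later
# version would require the ratchet state, which the key does not reveal.
shared_snapshot_key = snapshot_key(current_ratchet)
```

With forward access the recipient computes every later key on their own. With snapshot access they can decrypt the current version and nothing derived from later ratchet states.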