ipld / specs

Content-addressed, authenticated, immutable data structures

codec: AES encrypted blocks #349

Open mikeal opened 3 years ago

mikeal commented 3 years ago

Here’s an initial spec for the AES codec work I’ve done https://github.com/multiformats/js-multiformats/pull/59/files

mikeal commented 3 years ago

There’s a discussion that started in the js-multiformats implementation that we should move to this spec https://github.com/multiformats/js-multiformats/pull/59#issuecomment-759615163

Should we drop the CID length and just parse the CID out of the block by walking its varints? That would require some additional parsing rules and complicate things, but it would also shave 4 bytes off every block.
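For illustration, here is a minimal sketch of what "parsing through the varints" could look like. It assumes the payload starts with a binary CIDv1 (version varint, codec varint, then a multihash made of a hash-code varint, a digest-length varint, and the digest) followed directly by the data bytes. The function names `read_uvarint` and `split_cid` are hypothetical and are not taken from js-multiformats or the spec.

```python
from typing import Tuple


def read_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read one unsigned LEB128 varint; return (value, next offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7
        if shift > 63:
            raise ValueError("varint too long")


def split_cid(payload: bytes) -> Tuple[bytes, bytes]:
    """Split payload into (cid bytes, remaining bytes) without a length prefix.

    Assumption: the payload begins with a binary CIDv1. CIDv0 has no
    version or codec varints and would need a separate rule.
    """
    version, off = read_uvarint(payload, 0)
    if version != 1:
        raise ValueError("only CIDv1 is handled in this sketch")
    _codec, off = read_uvarint(payload, off)
    _hash_code, off = read_uvarint(payload, off)
    digest_len, off = read_uvarint(payload, off)
    end = off + digest_len
    if end > len(payload):
        raise ValueError("truncated multihash digest")
    return payload[:end], payload[end:]
```

The extra parsing rules mentioned above show up here as the version check and the four sequential varint reads; a length prefix would replace all of that with a single fixed-width read.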

rvagg commented 3 years ago

Created https://github.com/multiformats/js-multiformats/pull/60 to show what it could be like without the length.

mikeal commented 3 years ago

There’s actually only bytes and a map that come out of the codec. The length is never surfaced; that’s just part of the block format.

rvagg commented 3 years ago

This codec only "supports" Bytes, nothing else, in the same way that dag-pb only supports Bytes and Links. The List, Map, etc. are only artifacts of how it decodes into the Data Model, they can't be used to encode any other forms.

vmx commented 3 years ago

Thanks @rvagg, I now get the point that this codec cannot encode an arbitrary Data Model Map. The use of Schemas here confused me at first, but pointing to DAG-PB made me realize that we do the exact same thing there too.

JonasKruckenberg commented 3 years ago

I wanted to give some feedback on this too, as it parallels a lot of the work we've been doing with dag-cose (and things that dag-jose already addresses kinda)

What I like about this proposal

I really like how simple this is; it's a nice low-level primitive to build more complex structures upon. I also like that this does not add a lot of overhead (both processing- and storage-wise) to the block, which is something that worries me with dag-cose.

What I don't like about this proposal

My main concern is that this proposal is way too easy to misuse by developers who don't know better. This codec offloads all of the actually security-relevant decisions to the application. While I get that there are good reasons for processing the block in the application, this also pushes ALL the responsibility to userland. So, in short: what does this codec offer that I can't already achieve with an identity codec?

My second concern is that this is basically reinventing the wheel; we already have battle-tested standards such as JOSE and COSE that cover the same area these codecs are covering.

That said, please don't feel offended, this proposal is definitely a step in the right direction, I just think this isn't our holy grail quite yet. I think we're on to something with this proposal though!

mikeal commented 3 years ago

this codec offloads all of the actually security relevant decisions to the application

I don’t quite understand the concern here. What IPLD currently offers for encryption is nothing. Everyone doing encryption is doing it in the application layer above IPLD. What this spec does is offer very small primitives to help those projects along without changing the layer model of IPLD or forcing a particular encryption workflow on IPLD (which just wouldn’t work).

Just looking at where IPLD lives in the stack, it’s hard to imagine how we would add more than this.

we already have battle-tested standards such as JOSE and COSE that cover the same area these codecs are covering

These standards may seem small to you because you’re already using them, but for people who haven’t already fully adopted these standards they are quite large and contain a lot of opinions and other decisions that don’t make a lot of sense to other workflows.

I think those codecs will still be popular even while these ones occupy a similar space because those standards already have some adoption, but having spent time adding encryption to an IPLD application I can comfortably say that they are a lot more than is necessary and would be a barrier to adoption if they were the only way to do encryption in IPLD.

I also don’t think they are necessarily in conflict at all given the fact that these AES codecs don’t address signing whatsoever.

JonasKruckenberg commented 3 years ago

Yeah, I agree. As I've said in my talk, I also think that COSE is not a good fit for IPLD, for various reasons. The overhead of COSE (and JOSE too, for that matter) is significant, and that's one of the reasons. I've just seen a lot of people make a lot of poor security choices, either because they didn't know better or because they had to cut corners somewhere. This really worries me and is something we should keep in mind, that's all.

So anyway, I agree with you that low-level, small objects are a better fit for the composable nature of IPLD, +1 from me.

Maybe you can add a security guidelines section though, for example never reuse keys, only use secure algorithms etc.?

mikeal commented 3 years ago

Maybe you can add a security guidelines section though, for example never reuse keys, only use secure algorithms etc.?

We captured some of this in exploration reports but we really need a larger and more accessible document on encryption workflows that can cover this sort of thing.

mikeal commented 3 years ago

This is primarily in response to @aschmahmann but it’s a little broader than the scope of the thread it’s in so I’m doing a top level post about it.

In responding to another thread it became clear to me where I’m drawing the line between the codec identifier being a type identifier vs just a block format identifier.

Depending on your perspective, the entire multicodec table is a type system. Those “types” are tied directly to block formats, which then normalize to a Data Model representation. However, it is clearly true that the codec identifier provides more than just a parser hint, and there are numerous examples where we use the codec identifier to provide additional type information beyond the Data Model representation. We do this w/ bitcoin, eth, git, etc. Those codecs mean a little more than “this is the block format”: they also signal what application produced those blocks, and that application will do more typing on that block data than IPLD does in just the Data Model representation.

I don’t think it should be our goal to avoid multicodecs being used for type identification systems. But I do think it should be our goal to avoid multicodecs being used as the primary type identification system.

In other words, multicodecs should be used somewhat liberally to describe type systems rather than describing all the types within a system.

If Adin wants to write a new type system on IPLD, he should ask for one new multicodec. That should correspond to a block format that describes his types and produces a data model representation of that information while also acting as a signal that this data will mean more when handed over to Adin’s type system. That block format may literally just be dag-cbor, I don’t think it’s worth producing formal rules about format re-use.

Given these rules, I think the following spec changes are warranted.

Ericson2314 commented 3 years ago

Here's the way I think of it:

The reason is of course that it is better to keep one-off parsing/validation logic out of IPFS and everything else using IPLD. However, if we do a reductio ad absurdum on that principle alone, we end up concluding that there should be no multicodecs (or rather, just 1), and that we always get the raw bytes out. Clearly that is too extreme.

How can we fix this? I think with the following principle:

IPFS is about the graph structure, nothing more, nothing less. Raw bytes per the above give us no child links, and thus no graph structure. This is ugly, and in particular it rules out graphsync, GC and pinning, and all the things that make IPFS a step above BitTorrent and other similar antecedents.

Combine these two, and we get that multicodecs should expose just enough structure to allow recovering all child links, but no more, and I think that is a good tight constraint on the design space.
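A rough sketch of that constraint: if each codec exposes nothing but a way to list a block's child links, that alone is enough to compute the reachable set that pinning and GC rely on. Everything here is hypothetical, including the string CIDs, the in-memory `store`, and the toy `toy-links` codec (newline-separated CIDs); only the `raw` codec having no links mirrors the comment above.

```python
from typing import Callable, Dict, List, Set, Tuple

# A block is (codec name, payload). CIDs are plain strings in this sketch.
Block = Tuple[str, bytes]


def links_raw(payload: bytes) -> List[str]:
    """Raw bytes expose no child links, so the graph stops here."""
    return []


def links_toy(payload: bytes) -> List[str]:
    """Hypothetical codec: the payload is newline-separated CIDs."""
    return [line for line in payload.decode("ascii").split("\n") if line]


# The only thing the traversal needs from a codec is its link extractor.
LINK_EXTRACTORS: Dict[str, Callable[[bytes], List[str]]] = {
    "raw": links_raw,
    "toy-links": links_toy,
}


def reachable(root: str, store: Dict[str, Block]) -> Set[str]:
    """Return every CID reachable from root by following child links."""
    seen: Set[str] = set()
    stack = [root]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        codec, payload = store[cid]
        stack.extend(LINK_EXTRACTORS[codec](payload))
    return seen
```

With a pinned root, garbage collection is then `set(store) - reachable(root, store)`. If every block were treated as raw bytes, `reachable` would only ever return the root itself.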

mikeal commented 3 years ago

Big spec update to bring it in line with my last comment. Collapsed into a single block format that describes the cipher and IV length in the block format.
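As a sketch of what a single self-describing block format could look like, the following assumes a layout of `varint(cipher code) | varint(IV length) | IV | ciphertext`. That layout, the field names in the decoded map, and the cipher code value used in the usage note are assumptions for illustration; the linked spec is the authority on the real format.

```python
from typing import Dict, Tuple, Union


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uvarint(buf: bytes, offset: int = 0) -> Tuple[int, int]:
    """Read one unsigned varint; return (value, next offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7


def encode_block(cipher_code: int, iv: bytes, ciphertext: bytes) -> bytes:
    """Assumed layout: varint(code) | varint(len(iv)) | iv | ciphertext."""
    return encode_uvarint(cipher_code) + encode_uvarint(len(iv)) + iv + ciphertext


def decode_block(block: bytes) -> Dict[str, Union[int, bytes]]:
    """Decode into a small map; the IV length itself is never surfaced."""
    code, off = read_uvarint(block)
    iv_len, off = read_uvarint(block, off)
    end = off + iv_len
    if end > len(block):
        raise ValueError("truncated IV")
    return {"code": code, "iv": block[off:end], "bytes": block[end:]}
```

For example, `decode_block(encode_block(300, iv, ciphertext))` returns a map with `code`, `iv`, and `bytes`, where 300 is just a placeholder and not a registered multicodec code. Note that this only frames the data; encryption and decryption stay in the application.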

rvagg commented 3 years ago

Some questions I have for crypto-heads:

As per https://github.com/multiformats/multicodec/pull/202#issuecomment-766488711 I've also proposed that we add keylength to the AES cipher entries in the multicodec table, so you'd choose aes-256-gcm for example.
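One practical effect of putting the key length in the cipher name is that a consumer can reject a wrong-sized key before attempting any decryption. The sketch below assumes names of the form `aes-<bits>-<mode>`, as in the `aes-256-gcm` example; `parse_cipher_name` and `check_key` are hypothetical helpers, and no multicodec code values are implied.

```python
import re
from typing import NamedTuple


class CipherSpec(NamedTuple):
    key_bits: int
    mode: str


# AES defines exactly three key sizes: 128, 192 and 256 bits.
_NAME = re.compile(r"^aes-(128|192|256)-([a-z0-9]+)$")


def parse_cipher_name(name: str) -> CipherSpec:
    """Split a name like 'aes-256-gcm' into its key size and mode."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised cipher name: {name!r}")
    return CipherSpec(key_bits=int(match.group(1)), mode=match.group(2))


def check_key(name: str, key: bytes) -> None:
    """Raise ValueError if the key size does not match the cipher name."""
    spec = parse_cipher_name(name)
    if len(key) * 8 != spec.key_bits:
        raise ValueError(
            f"{name} needs a {spec.key_bits}-bit key, got {len(key) * 8} bits"
        )
```

Without the key length in the name, a bare `aes-gcm` entry would leave the key size to be inferred from the key that the application happens to supply.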

rvagg commented 3 years ago

OK, I had a brief discussion with @nikkolasg about this and did some more thinking and researching and here's my current position:

Mostly though, I think the format is fine for now, we can add a new multicodec for an extended-encryption if we need it later.

warpfork commented 3 years ago

What happened here? Are these spec changes we should merge as specs, or are they things we should keep in exploration report territory until they are further ratified and have more implementations? Who's working on it?

I'd love to land some of the data here, whether it's as fully-finished-and-ratified specs, or architecture design records, or exploration reports, I don't really care, I just want to get some more stuff out of the "open PRs" lane :)

rvagg commented 3 years ago

Stalled @ https://github.com/multiformats/js-multiformats/pull/59 but super close. I know Textile are interested in trying to use this so we could push it over the line. Project proposal @ https://github.com/protocol/web3-dev-team/pull/49 to get it wrapped up and my estimation is that it's fairly low investment to do.

ghost commented 3 years ago

Now you can use 2^56 + 32 like you wanted to. Still, transactions will not work since you made it silly in the beginning.

32 is for time of course.

If you want to hack dogecoin just do xor and mod. Do it after 2^56 + 32. It's literally free money and it's free :-)

Greetings from Ōnō.

PS: I invite you to create Quantum with me. The true currency. One for every human being alive. Infinite transactions and the value is static, both for sale and bid. Mine is worth infinite for sale and 0 for buy. I get one (just a constant in the source) Quantum and I never sell it. I literally can not.

We'll use 2^256 and 2^512 + 2^256. It's CHEAP xD

ghost commented 3 years ago

Big spec update to bring it inline with my last comment. Collapsed into a single block format that describes the cipher and iv length in the block format.

The what?