codec: AES encrypted blocks

mikeal commented 3 years ago

Here’s an initial spec for the AES codec work I’ve done https://github.com/multiformats/js-multiformats/pull/59/files

mikeal commented 3 years ago

There’s a discussion that started in the js-multiformats implementation that we should move to this spec https://github.com/multiformats/js-multiformats/pull/59#issuecomment-759615163

Should we drop the CID length and just parse the CID out of the block by parsing through the varints? It would require some additional parsing rules and complicate things, but it would also shave 4 bytes off of every block.
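To make the trade-off concrete, here is a minimal sketch of what "parsing through the varints" could look like. It assumes a binary CID sits at a known offset in the decrypted bytes and is followed directly by the payload. The function names `read_varint` and `cid_length` are made up for illustration and are not taken from js-multiformats. The CIDv0 branch is one of the "additional parsing rules" that a length prefix would avoid.

```python
def read_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned varint (LEB128); return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:  # high bit clear: last byte of this varint
            return value, offset
        shift += 7


def cid_length(buf: bytes, offset: int = 0) -> int:
    """Return the byte length of the binary CID starting at `offset`."""
    # CIDv0 is a bare sha2-256 multihash: 0x12 0x20 followed by 32 digest bytes.
    if buf[offset:offset + 2] == b"\x12\x20":
        if offset + 34 > len(buf):
            raise ValueError("truncated CIDv0")
        return 34

    start = offset
    version, offset = read_varint(buf, offset)
    if version != 1:
        raise ValueError(f"unsupported CID version: {version}")
    _codec, offset = read_varint(buf, offset)       # content codec
    _hash_code, offset = read_varint(buf, offset)   # multihash function code
    digest_len, offset = read_varint(buf, offset)   # multihash digest length
    end = offset + digest_len
    if end > len(buf):
        raise ValueError("truncated multihash digest")
    return end - start
```

With this, a decoder can slice `buf[offset:offset + cid_length(buf, offset)]` as the CID and treat the remainder as the payload, at the cost of walking four varints instead of reading one fixed-width length.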

rvagg commented 3 years ago

Created https://github.com/multiformats/js-multiformats/pull/60 to show what it could be like without the length.

mikeal commented 3 years ago

There’s actually only bytes and a map that come out of the codec. The length is never surfaced; that’s just part of the block format.

rvagg commented 3 years ago

This codec only "supports" Bytes, nothing else, in the same way that dag-pb only supports Bytes and Links. The List, Map, etc. are only artifacts of how it decodes into the Data Model, they can't be used to encode any other forms.

vmx commented 3 years ago

Thanks @rvagg, I now get the point that this codec cannot encode any arbitrary Data Model Map. The use of Schemas here confused me at first, but pointing to DAG-PB made me realize that we do the exact same thing there too.

JonasKruckenberg commented 3 years ago

I wanted to give some feedback on this too, as it parallels a lot of the work we've been doing with dag-cose (and things that dag-jose already addresses kinda)

What I like about this proposal

I really like how simple this is; it's a nice low-level primitive to build more complex structures upon. I also like that this does not add a lot of overhead (both processing- and storage-wise) to the block, which is something that worries me with dag-cose.

What I don't like about this proposal

My main concern is that this proposal is way too easy to misuse by developers who don't know better. This codec offloads all of the actually security-relevant decisions to the application. While I get that there are good reasons for processing the block at the application, this also pushes ALL the responsibility to userland. So in short, what does this codec offer that I can't already achieve with an identity codec?

My second concern is that this is basically reinventing the wheel: we already have battle-tested standards such as JOSE and COSE that cover the same area these codecs are covering.

That said, please don't feel offended, this proposal is definitely a step in the right direction, I just think this isn't our holy grail quite yet. I think we're on to something with this proposal though!

mikeal commented 3 years ago

this codec offloads all of the actually security relevant decisions to the application

I don’t quite understand the concern here. What IPLD currently offers for encryption is nothing. Everyone doing encryption is doing it in the application layer above IPLD. What this spec does is offer very small primitives to help those projects along without changing the layer model of IPLD or forcing a particular encryption workflow on IPLD (which just wouldn’t work).

Just looking at where IPLD lives in the stack, it’s hard to imagine how we would add more than this.

we already have battle tested standards such as JOSE and COSE that cover the same area these codec are covering

These standards may seem small to you because you’re already using them, but for people who haven’t already fully adopted these standards they are quite large and contain a lot of opinions and other decisions that don’t make a lot of sense to other workflows.

I think those codecs will still be popular even while these ones occupy a similar space because those standards already have some adoption, but having spent time adding encryption to an IPLD application I can comfortably say that they are a lot more than is necessary and would be a barrier to adoption if they were the only way to do encryption in IPLD.

I also don’t think they are necessarily in conflict at all given the fact that these AES codecs don’t address signing whatsoever.

JonasKruckenberg commented 3 years ago

Yeah, I agree. As I've said in my talk, I also think that COSE is not a good fit for IPLD, for various reasons. The overhead of COSE (and JOSE too, for that matter) is significant, and that's one of the reasons. I've just seen a lot of people make a lot of poor security choices, either because they didn't know better or because they had to cut corners somewhere. This is something that really worries me and that we should keep in mind, that's all.

So anyway, I agree with you that low-level, small objects are a better fit for the composable nature of IPLD, +1 from me.

Maybe you can add a security guidelines section though, for example never reuse keys, only use secure algorithms etc.?

mikeal commented 3 years ago

Maybe you can add a security guidelines section though, for example never reuse keys, only use secure algorithms etc.?

We captured some of this in exploration reports but we really need a larger and more accessible document on encryption workflows that can cover this sort of thing.

mikeal commented 3 years ago

This is primarily in response to @aschmahmann but it’s a little broader than the scope of the thread it’s in so I’m doing a top level post about it.

In responding to another thread it became clear to me where I’m drawing the line between the codec identifier being a type identifier vs just a block format identifier.

Depending on your perspective the entire multicodec table is a type system. Those “types” are tied directly to block formats which then normalize to a Data Model representation. However, it is clearly true that the codec identifier is providing more than just a parser hint and there are numerous examples where we use the codec identifier to provide additional type information beyond the data model representation. We do this w/ bitcoin, eth, git, etc. Those codecs mean a little more than “this is the block format”, they also signal what application produced those blocks and that application will do additional typing on that block data than IPLD will do in just the Data Model representation.

I don’t think it should be our goal to avoid multicodecs being used for type identification systems. But I do think it should be our goal to avoid multicodecs being used as the primary type identification system.

In other words, multicodecs should be used somewhat liberally to describe type systems rather than describing all the types within a system.

If Adin wants to write a new type system on IPLD, he should ask for one new multicodec. That should correspond to a block format that describes his types and produces a data model representation of that information while also acting as a signal that this data will mean more when handed over to Adin’s type system. That block format may literally just be dag-cbor, I don’t think it’s worth producing formal rules about format re-use.

Given these rules, I think the following spec changes are warranted.

A unified “Encrypted Block” specification for encoding blocks wrapped in encryption that include an initialization vector.
This new “Encrypted Block” format would contain within it a multicodec identifier for aes-gcm. AES flavors are common enough that they should be in the multicodec table anyway, and these identifiers can be re-used elsewhere in other cryptography systems for identifying the cipher. The spec would no longer be defining the meaning of the aes multicodecs if/when they are used as the codec in a CID. In other words, they won’t point specifically at this block format.

Ericson2314 commented 3 years ago

Here's the way I think of it:

If it's possible to have an "untyped" representation that uses fewer multicodecs, this is preferred.

The reason is of course that it is better to keep one-off parsing/validation logic out of IPFS and everything else using IPLD. However, if we do a reductio ad absurdum on that principle alone, we end up concluding that there should be no multicodecs (or rather, just 1), and we always get the raw bytes out. Clearly that is too extreme.

How can we fix this? I think with the following principle:

The multicodec should include enough information to recover all child links.

IPFS is about the graph structure, nothing more, nothing less. Raw bytes per the above give us no child links, and thus no graph structure. This is ugly, and in particular it rules out graphsync, GC and pinning, and all the things that make IPFS a step above BitTorrent and other similar antecedents.

Combine these two, and we get that multicodecs should expose just enough structure to allow recovering all child links, but no more, and I think that is a good tight constraint on the design space.

mikeal commented 3 years ago

Big spec update to bring it in line with my last comment. Collapsed into a single block format that describes the cipher and IV length in the block format.
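As a rough illustration of a block that carries its own cipher identifier and IV length, the sketch below assumes a layout of `varint(cipher code) | varint(IV length) | IV | ciphertext`. The field order, the function names, and the cipher code used in the usage note are assumptions made for this example, not the layout or table values from the spec itself.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128)."""
    if n < 0:
        raise ValueError("varints are unsigned")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode an unsigned varint; return (value, next_offset)."""
    value = 0
    shift = 0
    while True:
        if offset >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[offset]
        offset += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, offset
        shift += 7


def encode_encrypted_block(cipher_code: int, iv: bytes, ciphertext: bytes) -> bytes:
    """Assumed layout: varint(cipher) | varint(len(iv)) | iv | ciphertext."""
    return encode_varint(cipher_code) + encode_varint(len(iv)) + iv + ciphertext


def decode_encrypted_block(block: bytes) -> dict:
    """Split a block in the assumed layout back into its three fields."""
    cipher_code, offset = decode_varint(block)
    iv_len, offset = decode_varint(block, offset)
    if offset + iv_len > len(block):
        raise ValueError("block shorter than declared IV length")
    iv = block[offset:offset + iv_len]
    return {"code": cipher_code, "iv": iv, "bytes": block[offset + iv_len:]}
```

For example, `decode_encrypted_block(encode_encrypted_block(0x1234, iv, ct))` returns a map with `code`, `iv`, and `bytes` keys, where `0x1234` is a placeholder rather than a real multicodec entry. The codec itself never touches key material; encryption and decryption stay in the application.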

rvagg commented 3 years ago

Some questions I have for crypto-heads:

Should we exclude CBC from this proposal entirely - I think I could imagine an IPLD-based communication system where you could construct a padding oracle attack, but maybe that’s not reasonable
Adding in CCM is nice because it shows the initial extensibility of the system, but would anyone have reason to choose CCM and therefore should we just exclude it initially to encourage people to use GCM?
ChaCha20 seems like a good addition for an initial suite, is that reasonable?
Do we have any reason to consider authentication tags, AAD, or even AEAD in this situation? We’re not trying to build TLS with this, but some of it feels like it’s coming close to needing additional features, especially depending on how someone might deploy such a system and what they use it for.

As per https://github.com/multiformats/multicodec/pull/202#issuecomment-766488711 I've also proposed that we add keylength to the AES cipher entries in the multicodec table, so you'd choose aes-256-gcm for example.

rvagg commented 3 years ago

OK, I had a brief discussion with @nikkolasg about this and did some more thinking and researching and here's my current position:

Let's leave off the auth tag, additional data and AEAD concerns for now, this format should be enough to capture the basic case - we get authentication, to some degree, with it just being encapsulated in an IPLD block with a content address. But there's going to be use-cases where this is applied where things start to break down and we might want to add an additional format that adds some of these authentication features. This may also be a documentation problem for us - don't expect this to solve all your problems and work for all your use-cases, know the limits of what this offers!
Let's remove CBC. It may not be a problem in general for how we mostly expect this to work, but since it's known to be flawed in the typical environments it's used in, it's got a stain that follows it around, and we can just step over that by not even touching it.
CCM is .. ok .. I just don't think there's a point - if you're choosing AES then you're choosing GCM at the moment (maybe we'll be talking about AEX in a few years, but that's not yet).
Adding in ChaCha20 would be good, it's typically coupled with AES in cipher suites because it's faster where you don't have hardware implementations and is treated as a "we have this other entirely different thing in our suite in case AES is found to be insecure one day and we're not scrambling for a replacement through our whole deployed stack".
XChaCha20 is worth considering as an addition, or instead of ChaCha20. It does bigger nonces and is supposed to be more secure across larger numbers of messages with the same key for that reason.
We should dictate that implementations support both the AES and ChaCha20-based ciphers for the reason stated above.
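To put rough numbers on the nonce-size point, here is a back-of-the-envelope sketch using the standard birthday approximation. It assumes nonces are drawn uniformly at random under a single key, and compares a 96-bit nonce (the size used by IETF ChaCha20 and the usual AES-GCM IV) with the 192-bit nonce of XChaCha20. The function name is made up for this example.

```python
import math


def nonce_collision_probability(num_messages: int, nonce_bits: int) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_messages * (num_messages - 1) / 2 ** (nonce_bits + 1)
    # -expm1(-x) computes 1 - exp(-x) without losing precision for tiny x.
    return -math.expm1(-exponent)


messages = 2 ** 32  # about 4.3 billion messages under one key

p_96 = nonce_collision_probability(messages, 96)
p_192 = nonce_collision_probability(messages, 192)

print(f"96-bit nonce:  {p_96:.3e}")
print(f"192-bit nonce: {p_192:.3e}")
```

At 2^32 messages the 96-bit case comes out around 2^-33, while the 192-bit case is around 2^-129, which is why random nonces are generally considered safe with XChaCha20 at message counts where a 96-bit nonce would call for a counter or more frequent rekeying.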

Mostly though, I think the format is fine for now, we can add a new multicodec for an extended-encryption if we need it later.

warpfork commented 3 years ago

What happened here? Are these spec changes we should merge as specs, or are they things we should keep in exploration report territory until further ratified and have more implementations? Who's working on it?

I'd love to land some of the data here, whether it's as fully-finished-and-ratified specs, or architecture design records, or exploration reports, I don't really care, I just want to get some more stuff out of the "open PRs" lane :)

rvagg commented 3 years ago

Stalled @ https://github.com/multiformats/js-multiformats/pull/59 but super close. I know Textile are interested in trying to use this so we could push it over the line. Project proposal @ https://github.com/protocol/web3-dev-team/pull/49 to get it wrapped up and my estimation is that it's fairly low investment to do.

ghost commented 3 years ago

Now you can use 2^56 + 32 like you wanted to. Still, transactions will not work since you made it silly in the beginning.

32 is for time of course.

If you want to hack dogecoin just do xor and mod. Do it after 2^56 + 32. It's literally free money and it's free :-)

Greetings from Ōnō.

PS: I invite you to create Quantum with me. The true currency. One for every human being alive. Infinite transactions and the value is static, both for sale and bid. Mine is worth infinite for sale and 0 for buy. I get one (just a constant in the source) Quantum and I never sell it. I literally can not.

We'll use 2^256 and 2^512 + 2^256. It's CHEAP xD

ghost commented 3 years ago

Big spec update to bring it inline with my last comment. Collapsed into a single block format that describes the cipher and iv length in the block format.

The what?

ipld / specs

codec: AES encrypted blocks #349
