ipfs / specs

Technical specifications for the IPFS protocol stack
https://specs.ipfs.tech

Linked-Data Key (Previously Multikey) #58

Open Kubuxu opened 8 years ago

Kubuxu commented 8 years ago

I didn't see any specs for multikey, so here are my notes on what I'd love to see in it and how it could look.

We need a way to represent key types, but also how those keys are stored; for example, they might be password protected. As keys might be getting much bigger (QC and hash-based signature crypto), we also need a way to express keys bigger than 256 bytes. There are three options for that:

(maybe something even bigger). Each of them has its pros and cons.

As for the format, I would see it as:

[key protection schema+key type][crypto type][size][protected key]

The first byte would carry, in its 3 lower bits, whether the key is public, private or secret (3 bits = 8 values, the rest left for future use), and in its 5 higher bits how the key is protected, for example: no protection, scrypt+AES256, scrypt+salsa20, pure AES and so on.

The next byte would indicate the crypto scheme of the key itself; its meaning would depend on the key type.

In the case of a symmetric key it might be AES128, AES256 or salsa20. In the case of private and public keys it might be, for example, RSA1024, RSA2048, ed25519, curve25519 or ECDSA.
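As a rough illustration of the first-byte layout described above (3 lower bits for the key kind, 5 higher bits for the protection scheme), here is a minimal Python sketch. The numeric values in `KeyKind` and `Protection` are hypothetical; the thread does not assign any codes, and the size field is left out because its encoding is still undecided.

```python
from enum import IntEnum
from typing import Tuple


class KeyKind(IntEnum):
    # Hypothetical numbering; the proposal only says 3 bits are reserved.
    PUBLIC = 0
    PRIVATE = 1
    SECRET = 2  # symmetric key


class Protection(IntEnum):
    # Hypothetical numbering; the proposal only lists example schemes.
    NONE = 0
    SCRYPT_AES256 = 1
    SCRYPT_SALSA20 = 2
    AES = 3


def pack_first_byte(kind: int, protection: int) -> int:
    """Pack key kind (low 3 bits) and protection scheme (high 5 bits)."""
    if not 0 <= kind < 8:
        raise ValueError("key kind must fit in 3 bits")
    if not 0 <= protection < 32:
        raise ValueError("protection scheme must fit in 5 bits")
    return (protection << 3) | kind


def unpack_first_byte(byte: int) -> Tuple[int, int]:
    """Return (kind, protection) from a packed first byte."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte")
    return byte & 0b111, byte >> 3
```

For instance, `pack_first_byte(KeyKind.PRIVATE, Protection.SCRYPT_AES256)` yields `9` under this hypothetical numbering, and `unpack_first_byte(9)` recovers `(1, 1)`.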

Questions:

Edit: The point of the key protection schema is to allow sending, for example, password-protected private keys. When only public keys are exchanged, the no-protection schema should be expected.

daviddias commented 8 years ago

Yes, thank you for pushing this 🙌 Currently, most of the context of multikey lives on https://github.com/jbenet/random-ideas/issues/31 and probably will live as one of the components of https://github.com/ipfs/specs/tree/master/keychain.

jbenet commented 8 years ago

I'm ever closer to speccing this out -- need it for the keychain, yes.

A use case is motivating all of this:

Working my way up :)

almereyda commented 8 years ago

I'm finally sure this is the right place to drop this, even though I never seem to be able to find the fs:// discussion again.

While doing broad and rough research into the ICN topic, I stumbled upon http://tools.ietf.org/html/draft-farrell-ni-00 via http://dirk-kutscher.info/publications/uris-for-named-information/. There don't seem to be any implementations of this, but multikey follows similar patterns, hence the reference.

jbenet commented 8 years ago

@almereyda nice find! just open another issue in this repo about it. if ni:// is widely deployed (i havent seen it) we could see about supporting it too. (i think you mean it looks like multihash -- https://github.com/jbenet/multihash)

Kubuxu commented 8 years ago

Since I wrote the issue, a method of merging ed25519 and curve25519 keys has been established. This means we can use just one (probably ed25519) and transform the public key when we want to use encryption (curve25519).

What motivates me the most about that is the possibility of switching off RSA for communication. Using ed25519 would also allow for DHT record signing, as it is much quicker than RSA and the signature is only 64 bytes.

Kubuxu commented 8 years ago

ianopolous highlighted in IRC that separating encryption and signing keys is important, both for RSA and for post-quantum crypto systems.

Also, regarding post-quantum systems, we should account for their quite big key sizes.

ianopolous commented 8 years ago

Yep, multiple megabytes for keys is necessary in some of the PQC schemes. But so long as you can also include the multihash of a (public) key instead of the key itself you should be fine.

ianopolous commented 7 years ago

It would be great to progress this. A first step could be to agree on a format for public keys only. I'm hoping it will be ipld/cbor based to ease writing en/decoders.

For reference, the format we use in Peergos so far is a cbor list with two elements, the first is a cbor int which specifies the type (and needs an accompanying lookup table like multihash), and the second is the cbor byte[] of the key contents. This ends up as a single byte for the list and its length, a single byte for the type, and two bytes for the length of the byte array and the fact it is a byte[], then the bytes themselves. So a four byte overhead on a 32 byte key like Ed25519.
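To make the four-byte overhead concrete, here is a small hand-rolled sketch of that layout (a two-element CBOR list holding an int type and a byte string), following the CBOR head encoding from RFC 8949. It is not the Peergos implementation; the function names and the type code `1` are made up for illustration.

```python
def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR head: major type (0-7) plus its unsigned argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    if value < 1 << 8:
        return bytes([(major << 5) | 24, value])
    if value < 1 << 16:
        return bytes([(major << 5) | 25]) + value.to_bytes(2, "big")
    if value < 1 << 32:
        return bytes([(major << 5) | 26]) + value.to_bytes(4, "big")
    return bytes([(major << 5) | 27]) + value.to_bytes(8, "big")


def encode_public_key(key_type: int, key: bytes) -> bytes:
    """Encode [key_type, key] as a CBOR list of an unsigned int and a byte string."""
    return (
        cbor_head(4, 2)            # major type 4: array of 2 elements
        + cbor_head(0, key_type)   # major type 0: unsigned int (type code)
        + cbor_head(2, len(key))   # major type 2: byte string length
        + key
    )


# Hypothetical type code 1 for a 32-byte key such as Ed25519.
encoded = encode_public_key(1, bytes(32))
overhead = len(encoded) - 32
```

With a type code below 24 and a 32-byte key, the header bytes are `82 01 58 20`, so `overhead` is 4, which agrees with the figure quoted above. Larger type codes or keys of 256 bytes and more would add further header bytes.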

whyrusleeping commented 7 years ago

@JustinDrake might be interested in this too

daviddias commented 7 years ago

Getting back to this issue (thanks for the ping @whyrusleeping ;)). A couple of questions come to mind:

Other questions that arise:

ianopolous commented 7 years ago

Hi @diasdavid, great to see this progressing. :-) In what follows I'm only referring to public keys.

I think the main question to decide first is whether you want the format to be cbor. Using cbor means that you can put the object directly into ipfs and naturally reference and pin it by the resulting cid. We already do this in Peergos for both signing and boxing public keys which are merkle linked from a root object for each user.

If you agree that it should be cbor, then the initial int in my suggestion is exactly a multicodec (remember a cbor int is actually a varint), functionally equivalent to the lookup table in multihash and future proof. Then decoders specialise based on this int. The second part, which is the actual key bytes, doesn't need to be understood by a parser because cbor takes care of that by encoding the length with a varint.

The only thing to be careful of is that large keys can't be stored as a single flat array (I'm thinking multi-megabyte post quantum keys), because of the object size limit, so will need to be chunked and linked accordingly. So maybe the type prefix could also include whether the key material is a single raw flat array or a merkle link.

daviddias commented 6 years ago

Just had a brainstorm with @ianopolous on steps to move this forward.

Some notes from the discussion https://cryptpad.fr/code/#/1/edit/h57w6Cgcu72ZD8puM60teA/Tp-OpCZxIFk5KBr7nTJkCKMw/

daviddias commented 6 years ago

Alright, we have more :)

We went through more examples and we concluded that the format we are looking for is more a "Linked-Data Key" than a "Multi-Key" as Keys are not described by their links (remember, Keys are stored in IPLD graphs)

image

daviddias commented 6 years ago

One more!

Will we enable some kind of ALPN for things like secio? The case I'm thinking is when I get the hash of the RSA PubKey, but I want to dial to that node using ECC.

This can be simply resolved by having all the nodes understand how RSA/ECC works so that they know how to make the crypto challenges (SECIO) without needing to have the same type of keys.

This will require some fun on SECIO, Identify and libp2p-switch

Stebalien commented 6 years ago

Overall, a BIG :+1: from me.


I'd still call that "multikey" as the key describes its type (and should probably have a field that says that it's a key). But that's getting into semantics...

Keys should be full unixfs files so that they can be transferred using a regular exchange, getting IPFS ready to handle keys really large (GB or TB even).

I don't see why they have to be unixfs files for this. Exchanges exchange blocks (and, eventually, DAGs); they generally aren't unixfs specific. However, I've been pushing to have the next iteration of unixfs to be able to treat IPLD DAGs as files, if that helps (so we can store IPLD DAGs in unixfs without losing information about its structure).

no key with in CID

That proposal was for performance and we can still do that (although we should definitely stop using the DHT, regardless of what we do). That is, we can use <multibase><cidv0><dag-cbor><mh-identity><length><the embedded key>.

edit: cidv0 should be cidv1
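A minimal sketch of that layout, with the cidv1 correction applied, might look like the following. It assumes the usual multiformats codes (CID version `0x01`, dag-cbor `0x71`, identity multihash `0x00`) and base32 multibase with the `b` prefix; the function names are invented here, and the payload is assumed to be an already dag-cbor-encoded key object.

```python
import base64

CIDV1 = 0x01
DAG_CBOR = 0x71
IDENTITY = 0x00


def varint(n: int) -> bytes:
    """Unsigned LEB128 varint, as used by multiformats."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def read_varint(buf: bytes, pos: int):
    """Decode a varint at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def embed_key_in_cid(encoded_key: bytes) -> str:
    """Build <multibase><cidv1><dag-cbor><mh-identity><length><key>."""
    raw = (varint(CIDV1) + varint(DAG_CBOR) + varint(IDENTITY)
           + varint(len(encoded_key)) + encoded_key)
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


def extract_key(cid: str) -> bytes:
    """Recover the embedded key bytes from a CID built by embed_key_in_cid."""
    if not cid.startswith("b"):
        raise ValueError("expected base32 multibase prefix 'b'")
    body = cid[1:].upper()
    raw = base64.b32decode(body + "=" * (-len(body) % 8))
    pos = 0
    for expected in (CIDV1, DAG_CBOR, IDENTITY):
        value, pos = read_varint(raw, pos)
        if value != expected:
            raise ValueError("not a CIDv1 / dag-cbor / identity CID")
    length, pos = read_varint(raw, pos)
    if len(raw) - pos != length:
        raise ValueError("length prefix does not match payload")
    return raw[pos:]
```

Because the identity multihash stores the payload verbatim, `extract_key(embed_key_in_cid(k))` returns `k` without any network lookup, which is the performance point being made.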

ghost commented 6 years ago

That is, we can use <multibase><cidv0><dag-cbor><mh-identity><length><the embedded key>.

Wouldn't that go against "no CIDv0 unless it addresses an actual dag-pb node"?

Stebalien commented 6 years ago

Sorry, I meant <cidv1>.

dignifiedquire commented 6 years ago

@diasdavid curious about why you and @ianopolous suggested the keys being unixfs files, this seems like unnecessary overhead to me.

daviddias commented 6 years ago

@dignifiedquire we need to account for keys that will be GB or even TB in size https://github.com/ipfs/specs/issues/58#issuecomment-230002269

dignifiedquire commented 6 years ago

That only means we need a concept of general sharding. Requiring unixfs objects seems like the wrong approach to me, as it mixes two abstractions.

ianopolous commented 6 years ago

Does unixfs imply using protobufs? If so, then I would prefer a simpler cbor(ipld) based structure.

daviddias commented 6 years ago

It does not imply using protobufs, it only implies that an ipfs cat on the CID of the key should work.

There is a current endeavor of creating a new generation of unixfs using the new IPLD https://github.com/ipfs/ipld-unixfs


Edit(Kubuxu): fixed link to ipld-unixfs

msporny commented 5 years ago

Hey, I'm the current editor for the IETF Multihash spec: https://tools.ietf.org/html/draft-multiformats-multihash-00

I'm also updating Veres One to use multiformats for the cryptonym identifiers. Specifically, we do this for ed25519 keys:

// ed25519-pub 0xed01 + 32 pubkey-bytes
0xed01cccb336bf5e0f7b2fe0d7cfe0ccce7e2d9c59de5607a1bc1fce233a3b0caa11d
Example: did:v1:nym:z279wWXz4nugfh2XATAnFQkqaoSg97AWyNbsvdpr8hujamKJ

... and this for RSA keys (note the rsa-pub-fingerprint 0x5a value has not been requested yet):

// spki-der-fingerprint 0x5a + sha2-256 0x12 + 32 byte value 0x20
0x5a12209c82d16b3826b2616f11b23077a2949dcded03d774c90d7e241e071b57d9fea1
Example: did:v1:nym:z2czTJ1VEECSvESEamgp88mBLpqyJvyKvEE4YNamMoY1JWK29sKv
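For readers unfamiliar with the layout, the ed25519 case above can be sketched roughly as follows: the two-byte prefix `0xed01` is the varint form of the `ed25519-pub` multicodec code `0xed`, and the leading `z` is the multibase prefix for base58btc. The helper names are invented for this sketch, and it makes no claim that its output equals the example identifier shown above.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Varint encoding of the ed25519-pub multicodec code 0xed.
ED25519_PUB_PREFIX = bytes([0xED, 0x01])


def base58btc(data: bytes) -> str:
    """Encode bytes with the Bitcoin base58 alphabet."""
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + out


def ed25519_cryptonym(pubkey: bytes) -> str:
    """Build did:v1:nym:<'z' + base58btc(0xed01 + 32 pubkey bytes)>."""
    if len(pubkey) != 32:
        raise ValueError("ed25519 public keys are 32 bytes")
    return "did:v1:nym:z" + base58btc(ED25519_PUB_PREFIX + pubkey)


# Public key bytes taken from the hex string quoted above (after the 0xed01 prefix).
pubkey = bytes.fromhex(
    "cccb336bf5e0f7b2fe0d7cfe0ccce7e2d9c59de5607a1bc1fce233a3b0caa11d"
)
identifier = ed25519_cryptonym(pubkey)
```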

Is this the approach other folks are taking for cryptonyms (identifiers based on cryptographic material)?

Stebalien commented 5 years ago

Peer IDs are our current cryptographic identifiers. We currently just take this protobuf and then hash it with multihash.

We've also recently added a rule that all serialized keys shorter than 42 bytes should be "hashed" with the "identity" multihash, so that the key can be extracted from the ID itself. This handles the ed25519 case and ensures that we always generate the same identifier from the same key.
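The rule can be sketched in a few lines of Python. This is only an illustration of the comment as worded (inline when shorter than 42 bytes, sha2-256 otherwise), not the libp2p implementation; the function name is invented, and the input is assumed to be the already serialized protobuf key.

```python
import hashlib

IDENTITY = 0x00      # multihash code for "identity"
SHA2_256 = 0x12      # multihash code for sha2-256
INLINE_LIMIT = 42    # threshold as stated in the comment above


def peer_id_multihash(serialized_key: bytes) -> bytes:
    """Return the multihash bytes used as a peer ID for a serialized key."""
    if len(serialized_key) < INLINE_LIMIT:
        # Identity multihash: <code><length><the key itself>.
        # The length fits in one varint byte because it is below 128.
        return bytes([IDENTITY, len(serialized_key)]) + serialized_key
    digest = hashlib.sha256(serialized_key).digest()
    return bytes([SHA2_256, len(digest)]) + digest
```

A short serialized key (for example, 36 bytes for ed25519) comes back embedded after the `00 24` header, while a longer RSA key is reduced to a 34-byte `12 20 ...` multihash.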

We'd like to switch to CIDs/IPLD. In this world, the peer ID would just be a normal CID (e.g., cidv1-cbor-sha2-256-digest) and the key would likely be an IPLD object containing the key material and type (at a minimum).


Note: Unless the entire key is stored in the ID, there's little use in storing things like the rsa public key fingerprint in the ID. A sha2 hash should be sufficient.

pawal commented 5 years ago

@msporny In which wg is the draft discussed?

msporny commented 5 years ago

@pawal -- It hasn't been assigned to a WG yet. I want to get a few more revs of the spec done with possibly a test suite + 3 implementations passing the test suite before trying to push it into a WG.

msporny commented 5 years ago

@Stebalien said:

We'd like to switch to CIDs/IPLD. In this world, the peer ID would just be a normal CID (e.g., cidv1-cbor-sha2-256-digest) and the key would likely be an IPLD object containing the key material and type (at a minimum).

How are CIDs currently encoded? Here's what we are proposing:

https://github.com/w3c-dvcg/lds-ed25519-2018/issues/3

Does that look like it might be aligned with where CIDs want to go in IPFS? If not, why not?

Note: Unless the entire key is stored in the ID, there's little use in storing things like the rsa public key fingerprint in the ID. A sha2 hash should be sufficient.

The RSA SPKI public key fingerprint is a sha2-256/256 hash. There is an example of an RSA SPKI-based fingerprint used as a cryptographic CID:

https://github.com/w3c-dvcg/lds-ed25519-2018/issues/3

What do you think of that proposal? I'd like to get at least the Sovrin, Veres One, and IPFS communities aligned on cryptographic identifiers so I can propose a multibase+multicodec+multihash spec for cryptographic identifiers at IETF.

pawal commented 5 years ago

@msporny What wg do you have in mind?

A lot of things are going to change in the spec when going for standards track and going through a wg, so aiming for three implementations is not really necessary work, just more work for the implementations when things change.

msporny commented 5 years ago

What wg do you have in mind?

Perhaps SEC area and CFRG to start, get some input there. Then maybe move into ace. I expect that the area director will have an opinion on where this work goes... it's not clear that it fits neatly into an existing group.

A lot of things are going to change in the spec

hmm, I thought a good chunk of the multi* specs were pretty stable at this point. Are they not? Perhaps I'm misunderstanding where this work is at present. I thought there were multiple implementations of multibase, multicodec, and multihash already?

Or are you talking about the "how to encode CIDs" bit?

pawal commented 5 years ago

@msporny I meant that if you're aiming for standards track, getting the document through a wg will change a lot of details in the draft. Depending on the input of the wg.

ChristopherA commented 4 years ago

I'm needing to encode keys in base45 https://github.com/multiformats/multibase/issues/64 encoded master seeds (BIP45) for QR encoding, in particular 32 and 64 byte master seeds, shamir secret shards (40 bytes), derived private and public keys (32 bytes plus path), ecdh signatures (72 bytes), Schnorr signatures (32 bytes or 64 with public key), etc.

All of these for QR encoded air gap scenarios.

Any progress on multikey?

-- Christopher Allen

gobengo commented 1 year ago

getting the document through a wg will change a lot of details in the draft. Depending on the input of the wg. @pawal

now is a good time to shape the input of the wg. Here is a charter proposal.

There is also a new mailing list hosted by IETF to discuss multiformats. https://mailarchive.ietf.org/arch/browse/multiformats/