libp2p / specs

Technical specifications for the libp2p networking stack
https://libp2p.io
1.56k stars 273 forks source link

RFC: Signed Address Records #217

Closed yusefnapora closed 3 years ago

yusefnapora commented 4 years ago

Hey all, this introduces an RFC for a self-certified address record that could be published to a DHT or gossiped about in a pubsub topic without worrying about it being altered in-flight.

This is motivated by the discussion in https://github.com/libp2p/libp2p/issues/47 and https://github.com/libp2p/go-libp2p/issues/436.

I wrote up the record as an IPLD schema, but we could go with another format if we'd rather not have a dependency on IPLD.

I also wasn't sure whether we might want to issue records about peers other than ourselves - this draft does not, but we could have records with distinct subject and issuer peers.

There are some TODOs at the bottom that are better tracked here:

@bigs @raulk @mgoelzer @Stebalien @vyzo

ghost commented 4 years ago

cc @whyrusleeping

yusefnapora commented 4 years ago

What do people think about IPLD? I mostly just thought the schema notation is pretty cool, but I think if we want to use it for real we'd need a block store (at least an in-memory one), which doesn't feel like it's really a libp2p-layer concern. Maybe I'm overthinking that though.

Other options that I can think of are protobuf, which plays nice with most other libp2p things, or CBOR. Raw CBOR is as annoying to work with as raw JSON, but it does have a canonical encoding that is deterministic, which protobuf lacks. I'm not sure that a deterministic encoding is strictly necessary, and as far as I can tell, IPLD doesn't enforce one either. Anyway, protobuf is likely the simplest to use, so that seems like a decent way to go.

yusefnapora commented 4 years ago

I should note that I haven't tried actually using IPLD yet :) I'm going to play with it a bit and see how it works in practice. If anyone has experience with it, how is it?

yusefnapora commented 4 years ago

I've been thinking about this some more and just wanted to write up where my head is at before I start revising.

Ultimately I think IPLD is overkill for this, and we should just use protobufs. Earlier I was leaning away from this because there's no guarantees of deterministic encoding for protobufs, which was an issue with the peer id spec. However, as long as we sign, transmit and store serialized protobufs, we can still validate the record before deserializing it, and we don't really have to worry about whether our encoder has the same behavior as the one that produced the record.

I think I want to tease this apart into two RFCs:

  1. An all-purpose "signed envelope" record containing a peer id, byte string payload, and a signature of the payload. May optionally include public key (which must be consistent with peer id). Will require a "signature service" that can sign and validate.
  2. An address record with routability & confidence scores. This will get serialized to a byte string, and the encoded record will be wrapped in a signed envelope. The peer store will accept these directly, or inside a signed envelope. If inside an envelope, peer store will reject if signature is invalid. If signature is valid, mark the record as certified & surface that via peer store APIs.

This is basically the same idea as the original draft, except instead of an AddressEnvelope struct that stores a specific structured data type, the envelope only stores opaque binary blobs.

e.g.:

message SignedEnvelope {
  bytes peerId = 1;
  bytes contents = 2;
  bytes signature = 3;

  optional PublicKey pubkey = 4;
}

message AddressInfo {
  bytes addr = 1;
  Routability routability = 2;
  Confidence confidence = 3;
}

message AddressState {
  bytes subject = 1;
  repeated AddressInfo addresses = 2;
  // etc...
}

Storing serialized protobufs as a byte string inside another protobuf feels weird, but it seems necessary, given the lack of a deterministic encoder.

Does this make sense?

Stebalien commented 4 years ago
  1. Instead of requiring the peer ID and optionally having a key, how about we just always have the key and then always derive the peer ID? That way it's impossible to just assume that the key matches the peer ID?

  2. Yeah, a byte string within a protobuf seems like the right tactic if we're using protobufs.

Also note: we need to tag the data with some kind of "purpose" string so we can't reuse these records as messages in other protocols with signed messages. We do this in pubsub by prepending libp2p-pubsub: before signing/verifying the signature (but not when serializing) but I wonder if we should have a better solution. Maybe prepend a length-prefixed string? Null terminated string?

yusefnapora commented 4 years ago

how about we just always have the key and then always derive the peer ID

That is better 👍

Good question about the "domain separation" string. Maybe we could have a fixed string, e.g. "libp2p-signed-message:" but also allow the user to add an arbitrary "purpose" string that gets appended to the fixed string. This could be included in plain text as a field in the SignedEnvelope message, which might be a nice way to let users indicate how to process the contents of the message.

Something like

{
  publicKey: { 
    //...
  },
  contents: Buffer(),
  purpose: "AddressState",
  signature: Buffer()
}

And you prepend "libp2p-signed-message:" ++ purpose to the contents to get the signature ("libp2p-signed-message:AddressState" in the example).

Stebalien commented 4 years ago

This could be included in plain text as a field in the SignedEnvelope message, which might be a nice way to let users indicate how to process the contents of the message.

That will take extra space in the message itself which may discourage high entropy names.

I actually like forcing users to explicitly specify the purpose/domain. If we include it in the message, users can be lazy and ignore it. If we don't, users will have to call some function like:

func Open(domain string, envelope []byte) (message []byte, author PublicKey, err error) { ... }

And you prepend "libp2p-signed-message:"

I'm not sure if libp2p-signed-message is all that important as long as users specify purpose strings. However, I guess it can't hurt.

Maybe prepend a length-prefixed string? Null terminated string?

BTW, my concern here is that foo is a prefix of foobar which would allow an attacker to add bar to the signed data and pass it off as a foo message. This isn't an issue if we require this prefix to end in : but it may be better to just length-prefix it.

yusefnapora commented 4 years ago

Ah, I see where you're coming from with the out-of-band domain string, that makes sense.

Using only user-controlled data for the domain string makes me nervous, although I'm not sure I can really come up with a plausible attack. Still, there's not really any cost to using the fixed string plus the user-provided one, so might be worth doing just in case.

ghost commented 4 years ago

From my experience with security auditors, letting arbitrary or user-defined input control the interpretation of a message is going to raise a yellow flag for them even if it's not exploitable. (A plain text field in the protobuf does not seem as likely to raise those concerns.) Perhaps something to consider since we've fixed a lot of other stuff that was not technically broken (but is a nonstandard approach) to satisfy auditor concerns.

Stebalien commented 4 years ago

This is less of an arbitrary user-defined input than the bytes being signed as the purpose string is fixed at compile time.

My only concerns are:

  1. Ideally, we should always combine the purpose with the signed data the same way.
  2. We need to structure the signed data such that we can split $purpose$signedData into the $purpose and $$ignedData without external information. Otherwise, as noted above, messages for protocol foobar could be passed to protocol foo.

purpose:[data] makes (1) easy as we already do this in pubsub, TLS, and QUIC (once we cut a release). However, that does mean that : can't appear in the purpose.

yusefnapora commented 4 years ago

Good point - the "user" that controls the domain string is another libp2p subsystem, and there's no external input at runtime.

I guess I lean towards using a : separator and prohibiting : in the purpose string, since it's consistent with what we're doing elsewhere.

raulk commented 4 years ago

@yusefnapora aims to have a final draft of incorporating comments by today, in order to be reviewed and merged on Monday, to have a minimal implementation ready by the end of the week in Go.

Stebalien commented 4 years ago

Another extension I'd like to see is a "service bloom filter" that allows peers to advertise their protocols compactly and efficiently (without enumerating them, therefore with a degree of privacy preservation),

Hm, not really. If these bloom filters are at all useful, peers will be able to ask "does peer X likely speak protocol Y". I'd expect an attacker to ask "which peers speak protocol Y" (e.g., trying to find all the IPFS, Filecoin, etc. peers).

However, shrinking the size of the protocol list is likely worth it (but multicodecs!).

Stebalien commented 4 years ago

We should think about "record extensions", where confidence and routability would be two examples of future extensions.

YES!

Stebalien commented 4 years ago

Hoist the sequence number from the peer routing record to the envelope.

Why? Are we planning on duplicating these? I thought the idea was to make the envelope general-purpose.

Add a type field: not sure if this should be a string, an enumeration, a multicodec.

Multicodec prefixed byte string? That allows:

This is also consistent with ENS records.

yusefnapora commented 4 years ago

@Stebalien about moving the sequence number to the envelope, we were thinking it might be broadly useful for other kinds of payload, but as I was writing it up, I realized that each system that uses envelopes would need to have their own rules about how to interpret the sequence number (e.g. whether to invalidate & replace old records, merge them, etc). So it may be best to just leave it in the routing record and keep the envelope as simple as we can make it.

For the type field, I think your suggestion makes sense and gives us a lot of flexibility. 👍

Since we're talking about prefixing multiple fields to the content to sign instead of just the domain string, I'm now thinking that length-prefixing the fields is better than using a : separator. Especially if we want the type field to contain a CID, since a valid CID might randomly contain the UTF-8 code point for :.

yusefnapora commented 4 years ago

Just wanted to update this, since I've been a bit quiet this week. I've been working on an initial implementation in Go that I'm hoping to make PRs for today.

I diverged from what's written up here in one place. It seemed a bit silly to use varints for length-prefixing when building up the buffer to sign, since we don't need to save space on the wire or anything. So I just used uint64, which seems like way more than enough to fit any payload we might want to use envelopes for. I'll update this RFC to reflect that in a second.

Also, how does everyone feel about the name "routing records"? In the implementation I ended up calling them RoutingStateRecords to emphasize that they contain transient state, but I don't know if that's really any better. I'm also wondering if an even more generic name like "peer records" makes sense, since we're thinking about putting things like the protocol bloom filter in here, which isn't strictly a routing concern. Maybe "availability records"? IDK 🚲 🏠

Anyway, my plan for today is to fix up a few things in the go implementation and push those branches. Then I was thinking I'd merge this RFC in and start new PRs to turn the RFCs into Working Draft specs and move the conversation over to those PRs.

jacobheun commented 3 years ago

Now that this has been implemented in Go and JS I intend to merge this later today unless there is a reasonable objection. Ideally we would couple the approved RFC in a merge with its spec per https://github.com/libp2p/specs/issues/198, but I'd prefer this PR not sit open, completed, until someone can dedicate time to completing it.

aarshkshah1992 commented 3 years ago

@jacobheun Should we go ahead and merge this ?