penumbra-zone / penumbra

Penumbra is a fully private proof-of-stake network and decentralized exchange for the Cosmos ecosystem.
https://penumbra.zone
Apache License 2.0

Transport security and client authentication for View and Custody services #1556

Closed redshiftzero closed 8 months ago

redshiftzero commented 1 year ago

Currently, we don't have a meaningful transport security or client authentication story for the view or custody services. This was fine when we were using the view service in-process, or as a local daemon on the same machine for testing, but is not fine as we get closer to the end goal of having the view and custody services act as a "personal RPC".

There are two aspects to this problem, transport security and client authentication:

In the current iteration of the view protocol, we do not attempt to solve transport security at all, and grant access based just on having the client supply the AccountId (a hash of the full viewing key). This isn't a good solution on either front.

Transport Security

Since we're using gRPC, we should solve transport security using TLS. The only significant design question is how to authenticate the view server; there are two options.

Public certificates: one option is to use a DNS name to obtain a publicly verifiable certificate issued by a certificate authority.

Pinned certificates: another option is to have the view server create its own certificate, and for the client to pin it. Probably the most convenient way to handle this would be similar to the way that Tendermint Node IDs work: we'd hash the view server's certificate to VIEW_SERVER_ID, specify the address the client connects to as VIEW_SERVER_ID@123.234.12.34, and have the client validate the server's certificate against the pinned VIEW_SERVER_ID.
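As a concrete sketch of the pinned-ID scheme (Go for illustration; the choice of SHA-256 over the DER certificate bytes and the hex encoding are assumptions, since the text only says "hash the view server's certificate"):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// viewServerID derives a pinned identifier from the server's certificate.
// Hashing the DER-encoded certificate with SHA-256 is an assumption; the
// issue only specifies "hash the view server's certificate".
func viewServerID(certDER []byte) string {
	sum := sha256.Sum256(certDER)
	return hex.EncodeToString(sum[:])
}

// parsePinnedAddr splits a VIEW_SERVER_ID@host:port dial string into the
// pinned ID and the network address, mirroring Tendermint-style node IDs.
func parsePinnedAddr(addr string) (id, hostPort string, err error) {
	parts := strings.SplitN(addr, "@", 2)
	if len(parts) != 2 {
		return "", "", fmt.Errorf("expected VIEW_SERVER_ID@host:port, got %q", addr)
	}
	return parts[0], parts[1], nil
}

func main() {
	cert := []byte("example DER bytes")
	id := viewServerID(cert)
	addr := id + "@123.234.12.34:443"
	gotID, hostPort, err := parsePinnedAddr(addr)
	fmt.Println(err == nil, gotID == id, hostPort)
}
```

The client would then dial `hostPort` over TLS and accept the connection only if the presented certificate hashes to `gotID`.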

In either case, we would require the use of TLS to connect to the view service, and remove the current capability for non-encrypted connections.

Public certificates have the advantage that they can be used from web contexts, where transport security is handled and controlled by the browser. Pinned certificates have the advantage that they don't require a public DNS name, and can be used on local networks.

On the client side, public certificates can be validated in a standard way, while pinned certificates would require hooking into the certificate validation logic of the TLS stack. On the server side, public certificates require management of the entire certificate lifecycle, while pinned certificates just require generating a one-off certificate.

Client Authentication (View Service)

A basic question is whether authentication should be done at the transport level or at the request level. We use TLS certificates to authenticate the view server to the view client, so one possibility would be to use TLS client certs to authenticate the view client at the transport level, and create a bidirectionally authenticated channel. However, this requires that we have control over the transport layer. Since we care about web contexts, where we don't get to control the transport layer, and we only want to have one authentication mechanism, we need to have authentication at the request level.

One idea we'd floated in the past was to try to have a client authenticate to the view service by demonstrating knowledge of some part of the Penumbra key hierarchy (e.g., the FVK, or the AccountId, or whatever). This isn't a good idea, though, because it means that access to the view service can never be revoked, and means that the client has to have that long-term key material. Instead, we'd like it to be possible to use the view service without having any of the long-term viewing keys.

Another design aspect is that, even though the current implementation of the view server only views a single account, the protocol should allow one view server to support multiple accounts. To do this, the requests need to somehow be able to identify the account they're requesting data about. Currently, this is done by including the AccountId (a hash of the FVK) in each request, but this is not a workable auth mechanism for the reason mentioned above.

I'd propose a bearer-token system with the following shape:

package penumbra.view.v1alpha1;

message ViewAuthToken {
  bytes inner = 1;
}

message ViewAuthRequest {
  core.crypto.v1alpha1.FullViewingKey fvk = 1;
}

message ViewAuthResponse {
  ViewAuthToken token = 1;
}

service ViewAuthService {
  rpc ViewAuth(ViewAuthRequest) returns (ViewAuthResponse);
}

// ... in messages for the main view protocol ...

message NotesRequest {
  // Authorizes the request.
  ViewAuthToken token = 1;
  // If set, return spent notes as well as unspent notes.
  bool include_spent = 2;
  // ...
}

The high-level design points here are:

- The ViewAuthRequest demonstrates viewing authority by sending the full viewing key. This also means that the ViewAuthRequest works as a registration mechanism that tells the view server about viewing keys it should scan.
- As a basic implementation for pviewd, we could handle ViewAuthRequest by generating 32 random bytes as a token, saving the token to the database, and then checking incoming tokens against the authorized value.
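That basic flow could be sketched as follows (Go for illustration; `tokenStore`, the in-memory map, and the account-ID strings are stand-ins for pviewd's actual database):

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// tokenStore stands in for pviewd's database; it maps hex-encoded tokens
// to the account they authorize. All names here are illustrative.
type tokenStore map[string]string

// issue generates a fresh 32-byte bearer token for the given account and
// records it as authorized.
func (s tokenStore) issue(accountID string) (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	tok := hex.EncodeToString(buf)
	s[tok] = accountID
	return tok, nil
}

// check validates an incoming token, comparing against each stored token
// in constant time, and returns the account it authorizes, if any.
func (s tokenStore) check(token string) (string, bool) {
	for stored, account := range s {
		if subtle.ConstantTimeCompare([]byte(stored), []byte(token)) == 1 {
			return account, true
		}
	}
	return "", false
}

func main() {
	store := tokenStore{}
	tok, _ := store.issue("account-0")
	acct, ok := store.check(tok)
	fmt.Println(ok, acct)
	_, ok = store.check("not-a-token")
	fmt.Println(ok)
}
```

Revocation then falls out for free: deleting a stored token invalidates it without touching any long-term key material, which is exactly the property the FVK-as-credential approach lacks.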

Client Authentication (Custody Service)

The custody protocol allows users to request authorization of a TransactionPlan, and allows the custody service to inspect the complete TransactionPlan to decide whether or not to return the authorization data.

In this context, then, client authentication is about the capability to request authorization, not the capability to perform authorization (which always remains inside the custody service).

There are some cases where this capability is meaningful, and other cases where it isn't. For instance, a custody service provided by a server that operates a hot wallet for trading and applies policy to automatically determine whether a transaction should be approved would benefit from only allowing select entities to make requests. But a custody service backed by a hardware wallet or other mechanism where a human approves the transaction wouldn't.

To handle this, I'd propose augmenting the AuthorizationRequest message to include two additional fields: request_key, holding a 32-byte Ed25519 public key, and request_sig, holding a 64-byte Ed25519 signature over the supplied TransactionPlan. Because these fields are optional, they can be missing or ignored in cases where request authorization is not relevant, but they give a standardized way to include authorization where it is relevant. Management of the request keys themselves would be left to the specific implementation of the custody service. Using Ed25519 means we have broader tooling compatibility (since we're just doing plain signatures on binary data, we don't have any special constraints on the construction), and only supporting one good signature scheme makes the ecosystem simpler.

hdevalence commented 1 year ago

(Replaced the stub issue with a design writeup)

zbuc commented 1 year ago

Re: pinned vs public certificates, both use cases seem useful to support. Maybe we start with the pinned use-case, since it requires more customization of the TLS handling and can be used for both public and private access.

Pinned certificates seem useful to support for cases where no public DNS record exists (if there is a public DNS record we could use RFC2136 to provide the ACME challenges for the subdomain pointing to the internal host, but this is additional complexity -- requiring a local DNS server, TSIG configuration w/ certbot, and upstream DNS configuration).

The difficulty with pinned certificates is rotation; in the case where they're both running on the same host it's easy, but in the (unlikely) scenario we were to operate a view server at https://testnet.penumbra.zone/ how would we publish changes to the pinned certificate when it became necessary? Maybe a CRL on a domain with a public certificate? Maybe the nature of the view service makes public access less desirable?

Due to the comparative operational simplicity of publicly accessible certs, maybe they should be the default whenever available, and pinned certificates can fill in the gaps for private networks without public DNS.

zbuc commented 1 year ago

> Since we care about web contexts, where we don't get to control the transport layer, and we only want to have one authentication mechanism, we need to have authentication at the request level.

Theoretically most browsers do support mTLS but yeah, seems like a rich source of misconfiguration-related frustration if we were to go that route. I'm also not sure if it is easy to configure client certs in mobile browsers.

zbuc commented 1 year ago

> The ViewAuthRequest demonstrates viewing authority by sending the full viewing key. This also means that the ViewAuthRequest works as a registration mechanism that tells the view server about viewing keys it should scan.

Is there phishing potential here, where someone could be convinced to configure their pcli to point to a view server owned by an attacker, who would then be able to view the user's FVK? I'm thinking of a scenario where someone maliciously responds to support requests, or creates one of those "Getting Started With Penumbra" blog posts or similar and a new user doesn't understand the implications of the command line arguments they're copy-pasting.

It seems like we'd want to use the FVK to sign a value, or at least salt + hash it. This has a couple open questions: how does the view server initially register the user's FVK? How do we prevent forwarding attacks where the signed value/hash is sent from the malicious view server to the legitimate one?

I think a challenge/response protocol or something like the OAuth spec's state parameter may be useful here.

zbuc commented 1 year ago

> Because these fields are optional, they can be missing or ignored in cases where request authorization is not relevant, but they give a standardized way to include authorization where it is relevant. Management of the request keys themselves would be left to the specific implementation of the custody service.

To make sure I understand fully: this would be accomplished by having a configuration in the custody service to enable/disable the requirement for request authorization fields?

hdevalence commented 1 year ago

> Re: pinned vs public certificates, both use cases seem useful to support. Maybe we start with the pinned use-case, since it requires more customization of the TLS handling and can be used for both public and private access.

On the other hand, if public certs require less integration with the TLS stack, it might be easier to support those first?

> Pinned certificates seem useful to support for cases where no public DNS record exists (if there is a public DNS record we could use RFC2136 to provide the ACME challenges for the subdomain pointing to the internal host, but this is additional complexity -- requiring a local DNS server, TSIG configuration w/ certbot, and upstream DNS configuration).

> The difficulty with pinned certificates is rotation; in the case where they're both running on the same host it's easy, but in the (unlikely) scenario we were to operate a view server at https://testnet.penumbra.zone/ how would we publish changes to the pinned certificate when it became necessary? Maybe a CRL on a domain with a public certificate? Maybe the nature of the view service makes public access less desirable?

I'm not sure that rotation is super important here; my thought is that it's similar to the situation with Tendermint's P2P keys, where they're pinned once and stay active for the lifetime of the deployment. My assumption is that pinned certs would only be used without DNS records, in which case we're more likely to be in a small-scale, private deployment where cert rotation is less important.

> Due to the comparative operational simplicity of publicly accessible certs, maybe they should be the default whenever available, and pinned certificates can fill in the gaps for private networks without public DNS.

Yeah, I think that this should be the default, and I think we could even plan to implement public certs first and leave the pinned certs for later.

hdevalence commented 1 year ago

> The ViewAuthRequest demonstrates viewing authority by sending the full viewing key. This also means that the ViewAuthRequest works as a registration mechanism that tells the view server about viewing keys it should scan.

> Is there phishing potential here, where someone could be convinced to configure their pcli to point to a view server owned by an attacker, who would then be able to view the user's FVK? I'm thinking of a scenario where someone maliciously responds to support requests, or creates one of those "Getting Started With Penumbra" blog posts or similar and a new user doesn't understand the implications of the command line arguments they're copy-pasting.

There's phishing potential, but I think the phishing potential is inherent to the problem the view service solves. If we design a different credential-request mechanism, we still have to have a registration mechanism, and we're back in the same place, where someone can be convinced to register with a view service someone else runs. So I don't think that having an alternate token request mechanism really helps with phishing risk.

> It seems like we'd want to use the FVK to sign a value, or at least salt + hash it. This has a couple open questions: how does the view server initially register the user's FVK? How do we prevent forwarding attacks where the signed value/hash is sent from the malicious view server to the legitimate one?

Hmm, I think the idea would be that the identity of the view server is assured via TLS, so it reduces to whether or not the view server is actually trusted, which is the inherent phishing problem above.

zbuc commented 1 year ago

> Hmm, I think the idea would be that the identity of the view server is assured via TLS, so it reduces to whether or not the view server is actually trusted, which is the inherent phishing problem above.

Since designing ourselves out of this via protocol design seems difficult, maybe we can address this in the client by adding a prompt on first registration explaining that the view server will have access to their viewing key and the user should confirm they trust the view server?

hdevalence commented 1 year ago

After discussion in the design meeting, we decided:

  1. We'll support only two kinds of endpoints: public DNS names with fully auto-managed certificates, and pinned certificates similar to Tendermint P2P connections (FINGERPRINT@IP_ADDR:PORT).
  2. For all "personal" services (i.e., everything except pd), we always require TLS, even over internal networks. This way, if someone misconfigures their VPN or something, they don't leak data. For pd, we'd make an exception to ease load-balanced public deployments that terminate TLS externally.
  3. We'll change the view protocol to add an opaque auth token. A view service implementation is not required to use the suggested auth mechanism that combines registration with authorization.
  4. We'll change the custody protocol to add Ed25519 signing. A custody service implementation is not required to use it.

hdevalence commented 1 year ago

gRPC already has a bearer token mechanism; we should use it instead of stuffing extra data in request messages: https://github.com/hyperium/tonic/blob/master/examples/src/authentication/server.rs
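For reference, gRPC conventionally carries bearer tokens in the `authorization` metadata header as `Bearer <token>`, which is what the linked tonic example checks. A framework-free sketch of the server-side check (in Go; in a real server this would run in an interceptor with access to request metadata):

```go
package main

import (
	"fmt"
	"strings"
)

// bearerToken extracts the token from an "authorization" metadata value
// of the form "Bearer <token>". It rejects any other auth scheme.
func bearerToken(authorization string) (string, bool) {
	const prefix = "Bearer "
	if !strings.HasPrefix(authorization, prefix) {
		return "", false
	}
	return strings.TrimPrefix(authorization, prefix), true
}

func main() {
	tok, ok := bearerToken("Bearer deadbeef")
	fmt.Println(ok, tok)
	_, ok = bearerToken("Basic deadbeef")
	fmt.Println(ok)
}
```

Keeping the token in metadata rather than in each request message means the ViewAuthToken field proposed above would disappear from NotesRequest and friends.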

hdevalence commented 8 months ago

The original design turned out to be a bad one. We should not plan to do this, until we have a concrete use case. For now the story is "don't expose to the internet".