stackevo / endpoint-draft

What is an endpoint?

Draft goals #5

Open · martinthomson opened this issue 7 years ago

martinthomson commented 7 years ago

I'm having a lot of trouble with the current formulation of this draft. It's taken me a little while to pin this down to the point that I could write something constructive down, and I'm not even sure that I'm there yet. But I'm going to try anyway.

In part, the disparate views we've collected are coloured by the nature of the question, so it's worth stepping back and talking about goals.

A Reaffirmation of the End-to-End Principle

The introduction talks about the end-to-end principle, which I think is one particular target that this document might aim to say something about. The key realization I get from the statement of the end-to-end principle applies at the level of software that does real stuff. The end-to-end principle says that if that software cares about functioning correctly, then it has to take responsibility for that functioning.

I saw this when working on high availability services in the cloud and the same was true when building boxes for network operators to deploy. Most of those were services or systems deployed in support of some larger application. Inevitably, our fault analysis revealed error conditions that we simply couldn't correct. Whether it be a hardware fault, a server crash, or packet loss, some errors can't be recovered without contextual knowledge of the goals of the application. Ultimately the responsibility for dealing with the error lies with the application as a whole.

A protocol is in many ways a middle-man in the sense we're talking about. A protocol doesn't do anything, it exists only in the service of other activities.

Of course, a protocol can do a great deal to support these end-to-end goals. For instance, an application that values reliability might benefit greatly from choosing TCP over UDP for its interactions. The extent to which a protocol is able to provide tools that are relevant to application needs determines the effectiveness of that protocol.
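To make that concrete, here's a minimal sketch (the host, port, and payload are invented) of who carries the reliability burden in each case: with TCP the transport retransmits for you, while with UDP the retry policy lives in the application, which is exactly where the end-to-end argument says the final responsibility sits anyway.

```python
import socket

HOST, PORT, PAYLOAD = "service.example.net", 9999, b"ping"  # hypothetical service

def request_tcp() -> bytes:
    # The transport supplies the reliability tool: the kernel handles loss,
    # reordering, and retransmission; the application just reads the reply.
    with socket.create_connection((HOST, PORT), timeout=5) as s:
        s.sendall(PAYLOAD)
        return s.recv(4096)

def request_udp(retries: int = 3, timeout: float = 1.0) -> bytes:
    # No reliability from the transport: the application decides how many
    # times to retry, how long to wait, and what to do when it gives up.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        for _ in range(retries):
            s.sendto(PAYLOAD, (HOST, PORT))
            try:
                reply, _addr = s.recvfrom(4096)
                return reply
            except socket.timeout:
                continue
    raise TimeoutError("no reply; only the application knows what to do next")
```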

An example of how the tools break down is in the discussion of replay in HTTP when using TLS early data. We recently discovered that the web in no small way relies on clients retrying certain requests. This messes badly with the assumed view that certain requests are made at most once. See draft-nottingham-httpbis-retry for a description of this.
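To make the failure mode concrete, here is a rough sketch of the kind of server-side guard this line of work pointed toward (RFC 8470 later standardized it as the Early-Data header field and the 425 status code). The Request type and dispatch helper here are hypothetical stand-ins for a real framework; the point is only that requests whose replay would be harmful don't get answered from early data.

```python
from dataclasses import dataclass, field

@dataclass
class Request:               # hypothetical stand-in for a framework's request type
    method: str
    headers: dict = field(default_factory=dict)
    in_early_data: bool = False

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}   # replaying these should be harmless

def dispatch(request: Request):
    return 200, {}, b"OK"    # hypothetical normal processing

def handle(request: Request):
    # The request counts as "early data" until the TLS handshake that proves it
    # wasn't replayed has completed. A server terminating TLS learns this from
    # its TLS stack; a backend behind an intermediary sees "Early-Data: 1".
    in_early_data = request.in_early_data or request.headers.get("Early-Data") == "1"
    if in_early_data and request.method not in SAFE_METHODS:
        return 425, {}, b"Too Early"   # ask the client to retry after the handshake
    return dispatch(request)

# e.g. handle(Request("POST", headers={"Early-Data": "1"})) -> (425, {}, b'Too Early')
```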

One virtue in the simplicity of the core Internet protocols is that they concentrate on providing simple, well-understood tools and leave many of the end-to-end concerns to higher layers. And to varying degrees, we've managed well enough.

You might like to compare Internet protocols to modern computing: an IP packet is cheap and disposable, just like the hardware in a data centre. In both cases, cheapness also means unreliability and so people build systems that take that volatility into account with redundancy and error handling.

In the data centre, that produced distributed databases and the crazy multi-tier system architectures we see all over (CAP Theorem reigns!). In IP, unreliability produced TCP and ultimately TLS. It turns out that integrity and confidentiality are in some ways more important for robustness than they are for things like privacy; you can talk to people about why they believe that major web sites deploy HTTPS and it's not always clear that privacy is the foremost reason. An encrypted transport (like QUIC) seems like a pretty natural conclusion when you look at things this way.

There's a worthy statement to make in there, but it's very different to where I think the document is currently headed.

The Document as Written

The alternative viewpoint is in exploring how the idea of what an endpoint is shifts based on context. This is a narrower framing of the goal and one that is probably more achievable. Of course, it's less satisfying.

From that, I think that the statements about layers are super-relevant, but also virtually information-free. Every two-party protocol has two peers; every communication has an intended recipient. Anyone who has spent time with wireless networking has likely seen a diagram like this:

[Image: Typical 3GPP stack]

The idea here is that the peer for a given protocol layer is not the same all the way up the stack. If you look at Diameter or SIP, these notions are built into the protocols themselves. That said, I suspect there are very few people on the planet who can describe how the various parts of SIP target different peers. In Diameter, the notion is relatively clean, but I can't profess to understand it fully.
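A small way to see the same thing without a 3GPP diagram (a sketch; the proxy address is made up and error handling is omitted): when a client reaches an origin through an HTTP CONNECT proxy, its TCP peer is the proxy while its TLS peer is the origin.

```python
import socket
import ssl

PROXY = ("proxy.example.net", 3128)   # hypothetical forward proxy
ORIGIN = "www.example.com"            # the peer the application actually cares about

# Layer 4: the TCP peer is the proxy.
tcp = socket.create_connection(PROXY, timeout=10)
tcp_peer = tcp.getpeername()
tcp.sendall(f"CONNECT {ORIGIN}:443 HTTP/1.1\r\nHost: {ORIGIN}:443\r\n\r\n".encode())
tcp.recv(4096)  # naively assume a "200 Connection Established" response

# One layer up: the TLS peer is the origin; the proxy only shuffles bytes.
ctx = ssl.create_default_context()
tls = ctx.wrap_socket(tcp, server_hostname=ORIGIN)

print("TCP peer:", tcp_peer)                       # the proxy's address
print("TLS peer:", tls.getpeercert()["subject"])   # the origin's certificate
```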

Distributed Endpoints

The points Ted makes about distribution of endpoints are important to capture; they're one of the key ways in which our perspective on this issue has evolved over time.

A protocol endpoint might manifest as a set of cooperating machines. I remember building server architectures that relied on forwarding of messages and exchanging state between multiple nodes, but the ultimate effect was that the cluster acted as a coherent single entity.

That abstraction does occasionally leak, and not always in a bad way. Even within the one protocol, individual protocol elements might target different entities.

This is where we also consider the role of a proxy, gateways to other systems, and the like. That might be an HTTP proxy (bleargh), a WebRTC gateway to legacy SIP, a SIP-ISDN gateway, a CoAP-HTTP gateway, or the numerous other examples that don't immediately spring to mind. In all these cases, the endpoint at some (or all) protocol layers is naturally the gateway or proxy. But for the purposes of the real application, these are merely speed bumps.

Embracing the Leaky Abstraction

The notion that an endpoint might be distributed is something that isn't necessarily part of protocols we design, but it's a notion we can take advantage of in various ways. By building protocols that recognize the facts of deployment, we can better enable deployment strategies that achieve greater scale.

I've some experience here with HTTP. The web is an example of where distribution was available from the beginning. The ability to include content from different origins in a page makes distribution quite visible. Recently, multiple techniques have been designed to better allow for this sort of distribution, including Alternative Services (RFC 7838) and the ORIGIN frame.
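As a concrete instance (host names invented, syntax per RFC 7838): the origin advertises an alternative service, and a willing client directs later requests to that alternative while still treating them as requests to the original origin. A minimal sketch of reading such an advertisement:

```python
# One alternative service advertisement, as a server might send it.
advertisement = 'Alt-Svc: h2="alt.example.com:443"; ma=3600'

def parse_alt_svc(value: str) -> dict:
    # Minimal parse of a single alt-value (real parsing handles lists,
    # "clear", and quoting more carefully).
    entry, *params = value.split(";")
    protocol, authority = entry.strip().split("=", 1)
    parsed = {"protocol": protocol, "authority": authority.strip('"')}
    for p in params:
        key, _, val = p.strip().partition("=")
        parsed[key] = val
    return parsed

print(parse_alt_svc(advertisement.removeprefix("Alt-Svc: ")))
# {'protocol': 'h2', 'authority': 'alt.example.com:443', 'ma': '3600'}
```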

DNS would probably melt without caching intermediaries. You talk to a recursive resolver as though it were the entire DNS, but there's no attempt to hide the fact that the messages you send are from or for the authoritative servers.
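That transparency is easy to see in practice. A sketch using the dnspython library (the resolver and authoritative addresses below are placeholders): the answer the recursive resolver hands back is the same data the zone's authoritative server publishes, and asking the authoritative server directly is distinguishable mostly by details like the AA flag.

```python
import dns.flags
import dns.message
import dns.query
import dns.resolver

NAME = "example.com"
RECURSIVE = "192.0.2.53"      # placeholder: your recursive resolver
AUTHORITATIVE = "192.0.2.1"   # placeholder: an authoritative server for the zone

# Ask the recursive resolver, as a stub normally does; it answers from cache
# or walks the delegation tree on our behalf.
resolver = dns.resolver.Resolver(configure=False)
resolver.nameservers = [RECURSIVE]
cached = resolver.resolve(NAME, "A")

# Ask the authoritative server directly; nothing hides where the data lives.
direct = dns.query.udp(dns.message.make_query(NAME, "A"), AUTHORITATIVE, timeout=3)

print(sorted(rr.to_text() for rr in cached))
print(bool(direct.flags & dns.flags.AA))   # True: this answer is authoritative
```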

Security Aspects: Subversion of Intent

An important part of this formulation is subversion of expectations. Joe's comment is critical here, and - I think - closer to the mark than Lee's comments about the distinction between different types of endpoints (internet, security, application) and how they might not be co-resident.

Absent technical mechanisms (read: crypto), it's possible that the peers that you have might not be the peers that you expect. In some cases, we might have become acclimatized to the idea of intermediation (see NAT); in others it is basically indispensable (see DNS, Tor), but each use has a cost we need to acknowledge.

See earlier points about the deployment of security mechanisms in defense of intent.

Trimming Scope

There are a few aspects of the captured inputs that I think could be trimmed out.

identity - I don't think that Eliot's suggestion about endpoint being coupled to identity is core to the thesis. It's definitely true that we identify an endpoint in interactions, but these identities are subjective and situational in the same way that endpoints are. That endpoints are identified is almost a truism. A more concrete use of identity might be relevant when we talk about how security mechanisms are applied to avoid subversion, as above.

state - As with identity, I think that this is merely a truism. Protocols are stateful.

hardie commented 7 years ago

It has taken me a good amount of time to process Martin's comment, and I'm not sure I am there yet. But a couple of comments in-line, mostly about goals.

A Reaffirmation of the End-to-End Principle

The introduction talks about the end-to-end principle, which I think is one particular target that this document might aim to say something about. The key realization I get from the statement of the end-to-end principle applies at the level of software that does real stuff. The end-to-end principle says that if that software cares about functioning correctly, then it has to take responsibility for that functioning.

I think a little historical perspective is worthwhile here. It's important to remember that the Internet's end-to-end principle was formulated in response to a very different model, which wasn't just a circuit-switched model, but a circuit-switched model in which the network coordination of load was an absolutely critical part of functionality. Internetworking in the coordinated-load model was desperately hard, and it resulted in heavy use of admission control at the borders between networks.

The Internet was built both in response to that and on top of it. Essentially all of the functional networks of the early days rode on top of nailed-up circuits derived from the other model. The interconnections were always on, and the capacity management of internetworking shifted from constituent networks to constituent nodes (and, after a brief foray into developing congestion control models, it even worked).

So the end-to-end model wasn't just about making sure that the software at the endpoints was in charge of the correct functioning of the things it cared about; it was, at a very fundamental level, about getting better utilization from the network capacity itself--especially the internetworking capacity.

There are pieces of our model, in other words, that don't serve the goal Martin identifies, because they serve the capacity management goal instead.

(There's a huge digression here on why routing administrative boundaries aren't exposed to end nodes that I have elided, but we should talk about it over a beverage some time. I think that whole aspect of the Internet infrastructure needs re-thinking.)

I saw this when working on high availability services in the cloud and the same was true when building boxes for network operators to deploy. Most of those were services or systems deployed in support of some larger application. Inevitably, our fault analysis revealed error conditions that we simply couldn't correct. Whether it be a hardware fault, a server crash, or packet loss, some errors can't be recovered without contextual knowledge of the goals of the application. Ultimately the responsibility for dealing with the error lies with the application as a whole.

I agree with this, and I think there are some optimizations it points to that we do only haltingly. For DNS answers in a round-robin, for example, some hosts will cut servers that don't respond from the list; happy eyeballs also provides a way to check and then store information about failures as well as preferences.
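A sketch of that client-side behaviour (simplified well past what Happy Eyeballs in RFC 8305 actually specifies, and with an invented in-process failure cache): resolve every address, skip ones that failed recently, and fall through the rest with short per-attempt timeouts.

```python
import socket
import time

FAILED: dict = {}        # sockaddr -> time of the last failure
FAILURE_TTL = 60.0       # seconds to avoid an address that recently failed

def connect(host: str, port: int, attempt_timeout: float = 0.5) -> socket.socket:
    for family, type_, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        if time.monotonic() - FAILED.get(sockaddr, float("-inf")) < FAILURE_TTL:
            continue                       # recently failed; try the next address
        s = socket.socket(family, type_, proto)
        s.settimeout(attempt_timeout)
        try:
            s.connect(sockaddr)
            s.settimeout(None)
            return s                       # first address that answers wins
        except OSError:
            FAILED[sockaddr] = time.monotonic()
            s.close()
    raise OSError(f"all addresses for {host}:{port} failed or were skipped")
```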

A protocol is in many ways a middle-man in the sense we're talking about. A protocol doesn't do anything, it exists only in the service of other activities.

I don't agree with this formulation. If you want to see this in distributed application terms, I'd say that a network protocol is an inter-process communication method that happens to cross over multiple networks rather than a backplane.

Of course, a protocol can do a great deal to support these end-to-end goals. For instance, an application that values reliability might benefit greatly from choosing TCP over UDP for its interactions. The extent to which a protocol is able to provide tools that are relevant to application needs determines the effectiveness of that protocol.

An example of how the tools break down is in the discussion of replay in HTTP when using TLS early data. We recently discovered that the web in no small way relies on clients retrying certain requests. This messes badly with the assumed view that certain requests are made at most once. See this draft https://datatracker.ietf.org/doc/html/draft-nottingham-httpbis-retry for a description of this.

One virtue in the simplicity of the core Internet protocols is that they concentrate on providing simple, well-understood tools and leave many of the end-to-end concerns to higher layers. And to varying degrees, we've managed well enough.

You might like to compare Internet protocols to modern computing: an IP packet is cheap and disposable, just like the hardware in a data centre. In both cases, cheapness also means unreliability and so people build systems that take that volatility into account with redundancy and error handling.

In the data centre, that produced distributed databases and the crazy multi-tier system architectures we see all over (CAP Theorem reigns!). In IP, unreliability produced TCP and ultimately TLS. It turns out that integrity and confidentiality are in some ways more important for robustness than they are for things like privacy; you can talk to people about why they believe that major web sites deploy HTTPS and it's not always clear that privacy is the foremost reason. An encrypted transport (like QUIC) seems like a pretty natural conclusion when you look at things this way.

There's a worthy statement to make in there, but it's very different to where I think the document is currently headed.

I think what you're pointing to isn't "what is an endpoint", but "what is the relationship between a network protocol and reliability" or, maybe even more simply, "what is reliability". As you point out, there are types of reliability where robustness is critical and confidentiality is not--they just aren't types that we would say should be seen on the big I Internet.

The Document as Written

The alternative viewpoint is in exploring how the idea of what an endpoint is shifts based on context. This is a narrower framing of the goal and one that is probably more achievable. Of course, it's less satisfying.

From that, I think that the statements about layers are super-relevant, but also virtually information-free. Every two-party protocol has two peers; every communication has an intended recipient. Anyone who has spent time with wireless networking has likely seen a diagram like this:

[Image: Typical 3GPP stack]

The idea here is that the peer for a given protocol layer is not the same all the way up the stack. If you look at Diameter or SIP, these notions are built into the protocols themselves. That said, I suspect there are very few people on the planet who can describe how the various parts of SIP target different peers. In Diameter, the notion is relatively clean, but I can't profess to understand it fully.

Distributed Endpoints

The points Ted makes about distribution of endpoints are important to capture; they're one of the key ways in which our perspective on this issue has evolved over time.

A protocol endpoint might manifest as a set of cooperating machines. I remember building server architectures that relied on forwarding of messages and exchanging state between multiple nodes, but the ultimate effect was that the cluster acted as a coherent single entity.

That abstraction does occasionally leak, and not always in a bad way. Even within the one protocol, individual protocol elements might target different entities.

- In TLS, we realized that the server name extension is consumed by load balancers, even if they don't hold the keys that would allow them to authenticate as the server itself. This is now critical to the functioning of the protocol. You might regard that as ossification, and it certainly meets the criteria because it has screwed with our ability to deploy SNI encryption, but it's not necessarily a choice that we'd make differently even if that choice weren't taken from us already.

Writing down why that is so would be a really nice outcome of this, in my opinion. I suspect that there is a minimum amount of information that must be exposed to allow a node to become part of a cooperative system, and that determining how to construct that minimum is part of the design tussles we see in a number of places right now. It's especially important because exposing it only to the potentially cooperating node would only be possible with a key management regime we don't possess.
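As a concrete sketch of what that minimal exposure buys (illustrative only: it assumes the whole ClientHello arrives in the first record, skips TLS version corner cases, and the backend map is invented), a router can read the server_name out of the ClientHello and choose a backend without ever holding a certificate key:

```python
import socket

BACKENDS = {  # hypothetical SNI -> backend mapping
    b"a.example.com": ("10.0.0.10", 443),
    b"b.example.com": ("10.0.0.11", 443),
}

def sni_from_client_hello(record: bytes):
    # record: one TLS record assumed to contain a complete ClientHello.
    if len(record) < 5 or record[0] != 0x16:           # not a handshake record
        return None
    hs = record[5:]                                    # strip the record header
    if not hs or hs[0] != 0x01:                        # not a ClientHello
        return None
    p = 4 + 2 + 32                                     # handshake header + version + random
    p += 1 + hs[p]                                     # session_id
    p += 2 + int.from_bytes(hs[p:p + 2], "big")        # cipher_suites
    p += 1 + hs[p]                                     # compression_methods
    end = p + 2 + int.from_bytes(hs[p:p + 2], "big")   # extensions block
    p += 2
    while p + 4 <= end:
        ext_type = int.from_bytes(hs[p:p + 2], "big")
        ext_len = int.from_bytes(hs[p + 2:p + 4], "big")
        if ext_type == 0x0000:                         # server_name extension
            name_len = int.from_bytes(hs[p + 7:p + 9], "big")
            return hs[p + 9:p + 9 + name_len]
        p += 4 + ext_len
    return None

def route(client: socket.socket) -> None:
    hello = client.recv(4096, socket.MSG_PEEK)   # peek; the bytes still flow onward
    backend = BACKENDS.get(sni_from_client_hello(hello) or b"")
    if backend is None:
        client.close()
        return
    # ...connect to `backend` and splice bytes in both directions; the TLS
    # handshake completes between the client and the chosen backend, not here.
```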

- In QUIC, the connection ID is intentionally constructed in a way that makes consumption by a load balancer easier.
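A toy version of that idea (the split of the connection ID into a routable prefix plus a random remainder is an assumption in the spirit of the QUIC-LB proposals, not a description of any particular deployment):

```python
import os

SERVER_ID_LEN = 2   # bytes of the connection ID the load balancer is allowed to read
CID_LEN = 8

def new_connection_id(server_id: int) -> bytes:
    # The issuing server embeds its identifier in the first bytes and fills the
    # rest with randomness (a real design would obfuscate or encrypt the prefix).
    return server_id.to_bytes(SERVER_ID_LEN, "big") + os.urandom(CID_LEN - SERVER_ID_LEN)

def backend_for(cid: bytes) -> int:
    # The load balancer routes on the prefix alone; it never needs QUIC keys.
    return int.from_bytes(cid[:SERVER_ID_LEN], "big")

cid = new_connection_id(server_id=7)
assert backend_for(cid) == 7
```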

This is where we also consider the role of a proxy, gateways to other systems, and the like. That might be an HTTP proxy (bleargh), a WebRTC gateway to legacy SIP, a SIP-ISDN gateway, a CoAP-HTTP gateway, or the numerous other examples that don't immediately spring to mind. In all these cases, the endpoint at some (or all) protocol layers is naturally the gateway or proxy. But for the purposes of the real application, these are merely speed bumps.

Embracing the Leaky Abstraction

The notion that an endpoint might be distributed is something that isn't necessarily part of protocols we design, but it's a notion we can take advantage of in various ways. By building protocols that recognize the facts of deployment, we can better enable deployment strategies that achieve greater scale.

I think part of the question we're asking is: "would it be better if the protocols we design took that into account?"

I've some experience here with HTTP. The web is an example of where distribution was available from the beginning. The ability to include content from different origins in a page makes distribution quite visible. Recently, multiple techniques have been designed to better allow for this sort of distribution, including Alternative Services (RFC 7838, https://datatracker.ietf.org/doc/html/rfc7838) and the ORIGIN frame (http://httpwg.org/http-extensions/origin-frame.html).

DNS would probably melt without caching intermediaries.

There are pretty strong indications that it would not melt, but service would be concentrated into a small number of hands in ways that run up against the issues Jari has been talking about. It may be too late there, really, as the root and TLD servers are already highly concentrated.

You talk to a recursive resolver as though it were the entire DNS, but there's no attempt to hide the fact that the messages you send are from or for the authoritative servers.

Security Aspects: Subversion of Intent

An important part of this formulation is subversion of expectations. Joe's comment is critical here, and - I think - closer to the mark than Lee's comments about the distinction between different types of endpoints (internet, security, application) and how they might not be co-resident.

Absent technical mechanisms (read: crypto), it's possible that the peers that you have might not be the peers that you expect. In some cases, we might have become acclimatized to the idea of intermediation (see NAT); in others it is basically indispensable (see DNS, Tor), but each use has a cost we need to acknowledge.

- Routing is now so complex I have no hope of understanding it. I'm sure that we could find someone who could provide a perspective on this that would help.

- An endpoint that constructs an IP packet often ends up talking to a NAT. That has implications for the use of protocols at the next layer, where TCP is likely to work, UDP barely, and others not at all. Of course, we're now so inured to these effects that it's just another facet of how the Internet (barely) works.

- TCP terminators interject an endpoint that doesn't match the original intent. I'm told that TCP termination at an intermediary is great for throughput if you use Reno or Cubic, but that a congestion controller like BBR works much better when the middlebox is out of the way. Of course, the TCP terminator was already deployed and it's not getting removed. Now deploying BBR doesn't give you much of an advantage; maybe it's even a net loss. (Now we're back to the well-trodden ossification topic.)

- TLS MitM requires a higher degree of control over the endpoint. MitM here has a more significant impact because the expectations are that much stronger.

See earlier points about the deployment of security mechanisms in defense of intent.

Trimming Scope

There are a few aspects of the captured inputs that I think could be trimmed out.

identity - I don't think that Eliot's suggestion about endpoint being coupled to identity is core to the thesis. It's definitely true that we identify an endpoint in interactions, but these identities are subjective and situational in the same way that endpoints are. That endpoints are identified is almost a truism. A more concrete use of identity might be relevant when we talk about how security mechanisms are applied to avoid subversion, as above.

state - As with identity, I think that this is merely a truism. Protocols are stateful.


britram commented 7 years ago

Have been trying to clear the decks for long enough to think about this properly. Didn't get a chance to, so here's my incoherent braindump instead. To some extent, the purpose of this draft in its current form is to draw precisely this discussion out.

Scattershot replies below, in hopes they eventually prove useful:

In IP unreliability produced the TCP protocol...

Yes (although TCP and IP were in essence co-developed). TCP originally provided two kinds of reliability: stream synchronization (what I send is what you get) as well as transport state synchronization. The addition of state on intermediaries whose expiration would break the latter, as well as the diversification of link types and access models (e.g. mobility), causes synchronization to be lost and is one major driver behind application-layer approaches to transaction and session reliability (as mnot's draft details in the web case).

It'd be really neat if applications didn't have to deal with this. There are three possible answers there: change the architecture so they don't (hard), declare this a library/platform problem (which is kind of the direction the TAPS WG is heading in), or decide that application layer semantics are so central to how reliability plays out that applications just have to suck it up and deal.

...and ultimately TLS. It turns out that integrity and confidentiality are more important in some ways for robustness than they are for things like privacy; you can talk to people about why they believe that major web sites deploy HTTPS and it's not always clear that privacy is the foremost reason. An encrypted transport (like QUIC) seems like a pretty natural conclusion when you look at things this way.

This is an interesting way to frame this, and I think more general (and useful) than how I've thought of this before: transport layer crypto doesn't just reinforce layer boundaries, it adds reliability in various dimensions to the notional end-to-end channel. So, in the framing of the draft as is, "the endpoint is the thing that does the crypto".

and now on distribution:

In TLS, we realized that the server name extension is consumed by load balancers, even if they don't hold the keys that would allow them to authenticate as the server itself. This is now critical to the functioning of the protocol. You might regard that as ossification, and it certainly meets the criteria because it has screwed with our ability to deploy SNI encryption, but it's not necessarily a choice that we'd make differently even if that choice weren't taken from us already.

One could also make the argument that the problem raised by LURK (and addressed by the ACME hack that came out of it) is another form of this endpoint distribution... though in contrast to the load balancer use case for SNI/CID, here the distribution really is across distinct identities.

Further on the identity point...

A more concrete use of identity might be relevant when we talk about how security mechanisms are applied to avoid subversion, as above.

I think one of the things that's falling out of the discussion is that "applying security mechanisms to avoid subversion" might be a first-order property of an endpoint.

Need to think a bit more about the wider framing questions...

martinthomson commented 7 years ago

A minor thing, but I keep coming back to the issue of load balancers and this point.

In TLS, we realized that the server name extension is consumed by load balancers, even if they don't hold the keys that would allow them to authenticate as the server itself.

I think that we can view this as an explicit signal to the middlebox rather than folding the middlebox into the ends. That makes it cleaner.