tlswg / tls13-spec

TLS 1.3 Specification
565 stars 157 forks source link

Section 3.2 incorrectly implies double brackets are part of TLS presentation language #1315

Closed davidben closed 1 year ago

davidben commented 1 year ago

Not sure if it's too late to make changes, given we just had WGLC, but I noticed this just now and figured I'd file it in case. Section 3.2 says the following:

Comments begin with "/*" and end with "*/". Optional components are denoted by enclosing them in "[[ ]]" (double brackets). Single-byte entities containing uninterpreted data are of type opaque. A type alias T' for an existing type T is defined by: [...]

https://tlswg.org/tls13-spec/draft-ietf-tls-rfc8446bis.html#section-3.2

The first, third, and fourth of these are statements about the TLS presentation language itself. So it would suggest the second is as well. But I have never seen a random field be wrapped in double brackets for optionality. And given the parsing rules, it wouldn't even work. This is unparsable:

struct {
  [[ uint8 first; ]]
  [[ uint8 second; ]]
} MaybePair;

(Edit: Example updated because my first version was arguably parseable.)

This is sort of parsable...

struct {
  uint8 first;
  [[ uint8 second; ]]
} MaybePair;

But it becomes unparsable as soon as it's incorporated into a larger structure:

struct {
  MaybePair pair;
  uint16 blah;
} Nope;

struct {
  MaybePair pairs<0..2^16-1>;
} NopeNope;

The only two places I see this are in the definitions for enumerations and variants.

   enum { e1(v1), e2(v2), ... , en(vn) [[, (n)]] } Te;

   struct {
       T1 f1;
       T2 f2;
       ....
       Tn fn;
       select (E) {
           case e1: Te1 [[fe1]];
           case e2: Te2 [[fe2]];
           ....
           case en: Ten [[fen]];
       };
   } Tv;

But here [[ doesn't mean the object is optional. It means that, when you're defining an enum or using select, those components of the definition are optional.

From that, I gather Section 3.2 is actually saying that in the description of the presentation language, double brackets denote things that you may omit from the presentation language. But that doesn't fit with the rest of Section 3.2, which describe elements of the presentation language itself. This probably could be rephrased a bit. Like probably moved to the top of Section 3 as:

This document deals with the formatting of data in an external representation. The following very basic and somewhat casually defined presentation syntax will be used. In the definitions below, optional components of this syntax are denoted by enclosing them in "[[ ]]" (double brackets).

(Context is I tried to write a syntax highlighter for the language in https://github.com/pygments/pygments/pull/2455#issuecomment-1593035636 and got confused at first. 😄 )

richsalz commented 1 year ago

This seems like editorial clarification and can be fixed during IESG review. I also have a parser, I haven't tried it on this spec, but it's at https://github.com/richsalz/tlsparser/blob/main/gram.y within https://github.com/richsalz/tlsparser/

sayrer commented 1 year ago

I think another implementation to check here is Rustls, since that is the most sensible parsing implementation I've seen. It of course operates with the benefit of hindsight, but I'd hope this issue is not news there.

davidben commented 1 year ago

@sayrer This bug is about the textual presentation language used in specs to describe the wire format, not the wire format itself. Most TLS implementations do not parse the presentation language programmatically. The wire format is simple enough that it is usually easier and more flexible to just hand transcribe it into your choice of parsing setup, than to actually programmatically consume the presentation language. This include rustls.

Rich's parser is interesting because it's one of the few (only?) things that actually programmatically parse the presentation language. Glancing through Rich's grammar, it agrees with my bug report (unless I missed it!), that [[ and ]] do not appear to actually be part of presentation language.

sayrer commented 1 year ago

Let me apologize in advance if I'm being dense, but isn't this covered here: https://datatracker.ietf.org/doc/html/rfc8446#section-3.2

Edit: I think what you should do is write some torture tests for this problem. But a PEG grammar might be fine (just a concise way of writing recursive descent). But [[ etc are definitely part of the presentation language.

davidben commented 1 year ago

@sayrer Please see the issue description, which answers your question and discusses precisely that section.

That text reads as if it is part of the presentation language, but all the examples, and the core rules about how the TLS wire format is parsed, suggest otherwise. (Remember, the TLS wire format is based on plain concatenation, not tags. It doesn't work to just make random fields optional.) That is, the point of this bug is that the section you cited is misleading.

I don't understand your suggestion for a PEG grammar. Again, the issue here is not that something about the presentation language is hard to parse. As discussed above, it's rare to programmatically parse the presentation language in the first place. The issue is that some text in the definition of the presentation language is misleading.

sayrer commented 1 year ago

I'm not trying to be difficult here, but it's difficult to reconcile your opening comment "This is unparsable:" with your latest comment that says "Again, the issue here is not that something about the presentation language is hard to parse."

kaduk commented 1 year ago

I'm not trying to be difficult here, but it's difficult to reconcile your opening comment "This is unparsable:" with your latest comment that says "Again, the issue here is not that something about the presentation language is hard to parse."

It's actually pretty easy to reconcile. The prose in the document (RFC 8446) is hard to parse. The presentation language in the document is easy to parse.

I think that it would be pretty easy to clarify that "optional components of the presentation language itself are enclosed in double brackets" (but probably with more wordsmithing).

davidben commented 1 year ago

Ah yes, when I say "this is unparsable" I mean that, if you were to take Section 3.2 at face value, the examples of attempting to use double brackets in the presentation language would produce a structure that is unparsable on the wire. They would be unparsable as wire format structures because it would not be possible to unambiguously decode them.

For example, suppose you believe this was a valid structure in the presentation language.

struct {
  uint8 first;
  [[ uint8 second; ]]
} MaybePair;

struct {
  MaybePair pairs<0..2^16-1>;
} MaybePairs;

It is not possible to unambiguously parse a MaybePairs. The byte sequence 00 02 01 02 can either be parsed as a single MaybePair (first=01, second=02), or two MaybePairs, (first=01, second=null), followed by (first=02, second=null). (If you're unfamiliar with how the TLS wire format works, the first two bytes form the length prefix implied by <0..2^16-1>. The contents are then parsed as however many MaybePairs fit. Such a parse only works MaybePair is self-delimiting. A hypothetical [[ ... ]] construct would break that invariant.)

The solution is what @kaduk said. Section 3.2 is misleading. It means to say [[ denotes optional components of the presentation language. But as the rest of Section 3.2 describes actual components of the presentation language, it reads as if [[ is a component of the presentation language which denotes optional fields in the wire format. That second option does not work, as seen in the example above.

davidben commented 1 year ago

(Retitling the issue description as the rhetorical question was clearly a bit confusing.)

sayrer commented 1 year ago

Well, I'm not sure what to say here. I agree that TLS structures can be ambiguous. But, they are almost always constrained by an enclosing structure that addresses the issues raised here. I wrote a version of ECH, and that requires knowledge of the TLS wire format and the presentation language. I think the ambiguity concerning optional fields is real, but not unique.

https://github.com/grafica/rustls/blob/f85bd400896e0c5d6720c86afe0ebc3e2e54bb62/rustls/src/msgs/ech.rs

richsalz commented 1 year ago

Okay, so @sayrer, great that you weren't confused by the current text. Similarly-qualified people, such as @davidben , want to avoid a possible confusion. @kaduk 's comment https://github.com/tlswg/tls13-spec/issues/1315#issuecomment-1594086020 is the right fix.

sayrer commented 1 year ago

Yeah, sorry to kick up a fuss. I was confused by the back and forth between the presentation language and the wire format in the comments on this issue. Kaduk's suggestion is fine.