Closed davidben closed 1 year ago
This seems like editorial clarification and can be fixed during IESG review. I also have a parser, I haven't tried it on this spec, but it's at https://github.com/richsalz/tlsparser/blob/main/gram.y within https://github.com/richsalz/tlsparser/
I think another implementation to check here is Rustls, since that is the most sensible parsing implementation I've seen. It of course operates with the benefit of hindsight, but I'd hope this issue is not news there.
@sayrer This bug is about the textual presentation language used in specs to describe the wire format, not the wire format itself. Most TLS implementations do not parse the presentation language programmatically. The wire format is simple enough that it is usually easier and more flexible to just hand transcribe it into your choice of parsing setup, than to actually programmatically consume the presentation language. This include rustls.
Rich's parser is interesting because it's one of the few (only?) things that actually programmatically parse the presentation language. Glancing through Rich's grammar, it agrees with my bug report (unless I missed it!), that [[
and ]]
do not appear to actually be part of presentation language.
Let me apologize in advance if I'm being dense, but isn't this covered here: https://datatracker.ietf.org/doc/html/rfc8446#section-3.2
Edit: I think what you should do is write some torture tests for this problem. But a PEG grammar might be fine (just a concise way of writing recursive descent). But [[
etc are definitely part of the presentation language.
@sayrer Please see the issue description, which answers your question and discusses precisely that section.
That text reads as if it is part of the presentation language, but all the examples, and the core rules about how the TLS wire format is parsed, suggest otherwise. (Remember, the TLS wire format is based on plain concatenation, not tags. It doesn't work to just make random fields optional.) That is, the point of this bug is that the section you cited is misleading.
I don't understand your suggestion for a PEG grammar. Again, the issue here is not that something about the presentation language is hard to parse. As discussed above, it's rare to programmatically parse the presentation language in the first place. The issue is that some text in the definition of the presentation language is misleading.
I'm not trying to be difficult here, but it's difficult to reconcile your opening comment "This is unparsable:" with your latest comment that says "Again, the issue here is not that something about the presentation language is hard to parse."
I'm not trying to be difficult here, but it's difficult to reconcile your opening comment "This is unparsable:" with your latest comment that says "Again, the issue here is not that something about the presentation language is hard to parse."
It's actually pretty easy to reconcile. The prose in the document (RFC 8446) is hard to parse. The presentation language in the document is easy to parse.
I think that it would be pretty easy to clarify that "optional components of the presentation language itself are enclosed in double brackets" (but probably with more wordsmithing).
Ah yes, when I say "this is unparsable" I mean that, if you were to take Section 3.2 at face value, the examples of attempting to use double brackets in the presentation language would produce a structure that is unparsable on the wire. They would be unparsable as wire format structures because it would not be possible to unambiguously decode them.
For example, suppose you believe this was a valid structure in the presentation language.
struct {
uint8 first;
[[ uint8 second; ]]
} MaybePair;
struct {
MaybePair pairs<0..2^16-1>;
} MaybePairs;
It is not possible to unambiguously parse a MaybePairs
. The byte sequence 00 02 01 02
can either be parsed as a single MaybePair (first=01, second=02), or two MaybePairs, (first=01, second=null), followed by (first=02, second=null). (If you're unfamiliar with how the TLS wire format works, the first two bytes form the length prefix implied by <0..2^16-1>
. The contents are then parsed as however many MaybePairs
fit. Such a parse only works MaybePair
is self-delimiting. A hypothetical [[ ... ]]
construct would break that invariant.)
The solution is what @kaduk said. Section 3.2 is misleading. It means to say [[
denotes optional components of the presentation language. But as the rest of Section 3.2 describes actual components of the presentation language, it reads as if [[
is a component of the presentation language which denotes optional fields in the wire format. That second option does not work, as seen in the example above.
(Retitling the issue description as the rhetorical question was clearly a bit confusing.)
Well, I'm not sure what to say here. I agree that TLS structures can be ambiguous. But, they are almost always constrained by an enclosing structure that addresses the issues raised here. I wrote a version of ECH, and that requires knowledge of the TLS wire format and the presentation language. I think the ambiguity concerning optional fields is real, but not unique.
Okay, so @sayrer, great that you weren't confused by the current text. Similarly-qualified people, such as @davidben , want to avoid a possible confusion. @kaduk 's comment https://github.com/tlswg/tls13-spec/issues/1315#issuecomment-1594086020 is the right fix.
Yeah, sorry to kick up a fuss. I was confused by the back and forth between the presentation language and the wire format in the comments on this issue. Kaduk's suggestion is fine.
Not sure if it's too late to make changes, given we just had WGLC, but I noticed this just now and figured I'd file it in case. Section 3.2 says the following:
https://tlswg.org/tls13-spec/draft-ietf-tls-rfc8446bis.html#section-3.2
The first, third, and fourth of these are statements about the TLS presentation language itself. So it would suggest the second is as well. But I have never seen a random field be wrapped in double brackets for optionality. And given the parsing rules, it wouldn't even work. This is unparsable:
(Edit: Example updated because my first version was arguably parseable.)
This is sort of parsable...
But it becomes unparsable as soon as it's incorporated into a larger structure:
The only two places I see this are in the definitions for enumerations and variants.
But here
[[
doesn't mean the object is optional. It means that, when you're defining anenum
or usingselect
, those components of the definition are optional.From that, I gather Section 3.2 is actually saying that in the description of the presentation language, double brackets denote things that you may omit from the presentation language. But that doesn't fit with the rest of Section 3.2, which describe elements of the presentation language itself. This probably could be rephrased a bit. Like probably moved to the top of Section 3 as:
(Context is I tried to write a syntax highlighter for the language in https://github.com/pygments/pygments/pull/2455#issuecomment-1593035636 and got confused at first. 😄 )