sony / nmos-cpp

An NMOS (Networked Media Open Specifications) Registry and Node in C++ (IS-04, IS-05)
Apache License 2.0

Trouble parsing SDP a=fmtp line that has extra whitespace #44

Closed garethsb closed 5 years ago

garethsb commented 5 years ago

Report from @billt-hlit of a small corner case in the SDP parser refusing to find the sampling parameter in the fmtp attribute.

a=fmtp:96  sampling=YCbCr-4:2:2; width=1280; height=720; exactframerate=50; depth=10; TCS=SDR; colorimetry=BT709; PM=2110GPM; TP=2110TPW; SSN=ST2110-20:2017

See the problem? It took me a while to realize that the parser was failing because there were two spaces in front of "sampling", not just one.

Is this SDP attribute compliant? It's really hard to be sure. The grammar in RFC 4566 says

; sub-rules of 'a='
   attribute =           (att-field ":" att-value) / att-field

   att-field =           token

   att-value =           byte-string

so it has no opinion on how anything after the colon should be formatted. Its description of the fmtp attribute is ambiguous:

a=fmtp:<format> <format specific parameters>

Is that whitespace between the ">" and the "<" required to be a single space? I don't think this is a formal grammar at this point in the RFC. Finally, the SDP description in the 2110-20 standard seems to be relatively clear on this point: "the section shall consist of a sequence of media type parameter entries, each followed by a semicolon (“;”) character followed by whitespace". But then we get vague: "a = pair, with no whitespace within the name or value or between the name, equal sign, and value". So, whitespace in front of the name isn't obviously forbidden, although such a name would have to appear first in a video fmtp attribute for it to be truly part of the name -- and that's a weirdly literal interpretation of the standard.

There's a beautiful symmetry to this: We had that discussion about the trailing space on this fmtp line in Wuppertal. Maybe you can have a discussion about leading spaces in Newbury. :o)

@garethsb-sony note: I'm informed that this at least is being made to follow convention in the ST2110 one-year review, so that it won't require a trailing ';', let alone the space.

Aside from that, if it is [isn't?] compliant, should we care, or should the parser be a bit more forgiving of whitespace in the fmtp attribute?

Also, a comment regarding logging: It took me a while to track down this problem, simply because I was looking at the wrong SDP when trying to figure out what was wrong. It's not clear to me how the code might be changed to include the offending fmtp line in the exception, if that's even possible, but that would have reduced my debugging time a fair amount.

garethsb commented 5 years ago

Its description of the fmtp attribute is ambiguous:

a=fmtp:<format> <format specific parameters>

Is that whitespace between the ">" and the "<" required to be a single space?

Yes, the SDP spec is unclear on that like much else. When writing SDP files the best one can do is usually to follow the spec where it's clear and convention where it isn't. I'd not previously seen an SDP file with multiple spaces there.

Also, a comment regarding logging: It took me a while to track down this problem, simply because I was looking at the wrong SDP when trying to figure out what was wrong. It's not clear to me how the code might be changed to include the offending fmtp line in the exception, if that's even possible, but that would have reduced my debugging time a fair amount.

I think the main problem is that there wasn't an exception during parsing. The parser just happily produced a list of parameters the first of which was called " sampling" with a leading space. By the point in the code that caused trouble, the original SDP lines have long gone.
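To make that failure mode concrete, here is a minimal sketch of a parser that assumes exactly one space after the format and "; " between parameters. It is not the actual nmos-cpp code; `naive_parse_fmtp` and `params_t` are hypothetical names used only for illustration, and the input is assumed to be the attribute value, i.e. the text after "a=fmtp:".

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; this is not the nmos-cpp parser.
using params_t = std::vector<std::pair<std::string, std::string>>;

// Assumes exactly one SP after <format> and exactly "; " between parameters.
// 'value' is the attribute value, i.e. the text after "a=fmtp:".
params_t naive_parse_fmtp(const std::string& value)
{
    params_t params;
    // skip <format> and the single space assumed to follow it
    std::string::size_type pos = value.find(' ');
    if (pos == std::string::npos) return params;
    ++pos;
    while (pos < value.size())
    {
        std::string::size_type end = value.find(';', pos);
        if (end == std::string::npos) end = value.size();
        const std::string entry = value.substr(pos, end - pos);
        const std::string::size_type eq = entry.find('=');
        if (eq == std::string::npos) params.emplace_back(entry, "");
        else params.emplace_back(entry.substr(0, eq), entry.substr(eq + 1));
        // step over ";" and the single space assumed to follow it
        pos = end + 2;
    }
    return params;
}
```

Given "96  sampling=YCbCr-4:2:2; width=1280" (two spaces after the format), this returns a first parameter whose name is " sampling" with the leading space, and no error is raised. A later lookup of "sampling" then fails far from the line that caused it.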

If we were to make the extra space cause parsing to fail, you'd have got a message that told you the offending line number. So that's one approach.
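A rough sketch of that strict approach might look like the following. The function name `check_fmtp_strict`, the message wording and the use of `std::runtime_error` are all assumptions for illustration, not what nmos-cpp does; the only point is that a failure detected while the line is still in hand can report its line number.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical strict check; rejects extra whitespace after <format>.
// 'line_number' is the 1-based position of the line within the SDP file.
void check_fmtp_strict(const std::string& line, int line_number)
{
    const std::string prefix = "a=fmtp:";
    if (line.compare(0, prefix.size(), prefix) != 0) return; // not an fmtp line

    // find the separator after <format>, then look at the character after it
    const std::string::size_type sp = line.find(' ', prefix.size());
    if (sp != std::string::npos && sp + 1 < line.size()
        && (line[sp + 1] == ' ' || line[sp + 1] == '\t'))
    {
        throw std::runtime_error("sdp parse error at line " + std::to_string(line_number)
            + ": unexpected whitespace before format specific parameters");
    }
}
```

Called on each line as the file is read, this would turn the silent " sampling" result into an immediate error that names the offending line.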

Aside from that, if it is [isn't?] compliant, should we care, or should the parser be a bit more forgiving of whitespace in the fmtp attribute?

That's the flip-side. In the case of a well-written spec, implementing a strict parser encourages everyone to comply. In this case however, where the spec is unclear and it's pretty easy to argue that the intent of the file is still clear with the extra space, then selfishly, a more lax parser would help, wouldn't it, even though it promulgates the 'error'?

It's not just the beginning and end of the format specific parameters in the fmtp attribute where whitespace handling could be more forgiving. Should the parser allow a separator of a single semi-colon with no space, or a semi-colon and forty-two spaces? What about that quirky "required" space after a=source-filter:. And there are other places where the separator in a list is a single space in the spec, but intent would still be clear with multiple spaces.

I wonder how forgiving would be just the right amount of forgiving...

Thoughts?

billt-hlit commented 5 years ago

“To err is human; to forgive, divine.”

2110-20 does not define "whitespace" (surprise!), but in everything I wrote above I was assuming "whitespace" means "[ \t]+" (and maybe end-of-line markers, although that gets hairy) and not "a single whitespace character". So, this becomes not merely a question of forgiveness but a question of definitions. I'm guessing my interpretation is a minority opinion, but I'm sure it's the right one. :clown_face:

I don't have much experience with the source-filter attribute (haven't bothered to implement it). That said, given that the RFC has an ABNF grammar, I don't see there is room or need for wiggle.

[edit: Confusion about logging deleted]

garethsb commented 5 years ago

2110-20 does not define "whitespace" (surprise!),

Yes, utterly unexpected that. ;-)

RFC4566 Section 5 does say:

An SDP session description consists of a number of lines of text of the form:

     <type>=<value>

where <type> MUST be exactly one case-significant character and <value> is structured text whose format depends on <type>. In general, <value> is either a number of fields delimited by a single space character or a free format string, and is case-significant unless a specific field defines otherwise. Whitespace MUST NOT be used on either side of the "=" sign.

I read that to say that unless specified differently, exactly one SP must be used as a field separator. Most of the SDP lines have an ABNF grammar that backs that interpretation, but, as you said, fmtp doesn't.

RFC4566 doesn't define "whitespace" either; in fact the above paragraph is the only use of the term! For what it's worth, the core terminals of IETF ABNF are defined by RFC4234 Appendix B.1. It defines a WSP (whitespace) character, as you suggested, as either SP (space) or HTAB (horizontal tab).

So, I think the conclusion is that in most cases, expecting and accepting only a single space is correct, but that in the case of fmtp, a forgiving parser should allow 1*WSP (i.e. one or more) before, between, and after, the format-specific-parameters.
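As a sketch of what that forgiving behaviour could look like (again hypothetical, not the change that actually went into nmos-cpp): treat WSP as SP or HTAB per RFC 4234, split the parameters on ";", and trim WSP from both ends of each entry. The names `is_wsp`, `trim_wsp` and `forgiving_parse_fmtp` are made up for this example. Note that this sketch is slightly more lenient than 1*WSP, since it also accepts a ";" with no whitespace after it, and that it leaves any whitespace inside a name or value untouched.

```cpp
#include <string>
#include <utility>
#include <vector>

using params_t = std::vector<std::pair<std::string, std::string>>;

// WSP as defined by RFC 4234 Appendix B.1: SP or HTAB
inline bool is_wsp(char c) { return c == ' ' || c == '\t'; }

// Remove leading and trailing WSP
std::string trim_wsp(const std::string& s)
{
    std::string::size_type first = 0, last = s.size();
    while (first < last && is_wsp(s[first])) ++first;
    while (last > first && is_wsp(s[last - 1])) --last;
    return s.substr(first, last - first);
}

// Hypothetical forgiving parser; 'value' is the text after "a=fmtp:".
params_t forgiving_parse_fmtp(const std::string& value)
{
    params_t params;
    // <format> runs up to the first WSP character
    std::string::size_type pos = 0;
    while (pos < value.size() && !is_wsp(value[pos])) ++pos;
    while (pos < value.size())
    {
        std::string::size_type end = value.find(';', pos);
        if (end == std::string::npos) end = value.size();
        const std::string entry = trim_wsp(value.substr(pos, end - pos));
        if (!entry.empty()) // ignore the empty entry after a trailing ";"
        {
            const std::string::size_type eq = entry.find('=');
            if (eq == std::string::npos) params.emplace_back(entry, "");
            else params.emplace_back(entry.substr(0, eq), entry.substr(eq + 1));
        }
        pos = end + 1;
    }
    return params;
}
```

With this, "96  sampling=YCbCr-4:2:2;width=1280; \t height=720;  " yields three parameters named "sampling", "width" and "height", whether or not the final ";" and trailing space are present.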

garethsb commented 5 years ago

OK, that was a bit painful. @billt-hlit I'd be really grateful for some additional testing!

billt-hlit commented 5 years ago

The video fmtp attribute parsing looks good. On the other hand, you need to do the same thing for "i=" that you did for "s=".

garethsb commented 5 years ago

Is that true? The session name s= line is required and has the special language about using a single space as a minimum. The session/media information i= lines are optional, so no such special treatment seems warranted?

billt-hlit commented 5 years ago

Oh, sorry, I didn't realize that "byte-string" required at least one character. Seems like an unnecessary restriction, but the grammar is clear about it. So no, don't worry about it.

garethsb commented 5 years ago

Agreed.

I'm sure more edge cases are going to turn up. 😔

Closing... for now...

billt-hlit commented 5 years ago

Another obscure edge case from RFC 4570:

The fourth sub-field, <src-list>, is the list of source hosts/interfaces in the source-filter, and consists of one or more unicast addresses or FQDNs, separated by space characters.

A naive reading (why is everyone looking at me? :blush:) might interpret this as saying there are any number of space characters between each of the source addresses. On the other hand, the rest of the description in RFC 4570 falls under the "in general" section of RFC 4566 Gareth quoted above.

Opinions? I'm tempted to stick to the single-space interpretation here.

garethsb commented 5 years ago

Hi Bill,

Luckily in this case the ABNF in Appendix A comes to the rescue:

src-list =       *(unicast-address SP) unicast-address
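
Read literally, that rule allows exactly one SP between addresses and none before or after. A small sketch of a splitter that enforces this follows; `parse_src_list` is a hypothetical helper for illustration, it does not validate that each entry is a well-formed unicast address or FQDN, and it is not taken from nmos-cpp.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper following
//   src-list = *(unicast-address SP) unicast-address
// i.e. exactly one SP between entries, none leading or trailing.
std::vector<std::string> parse_src_list(const std::string& src_list)
{
    std::vector<std::string> addresses;
    std::string::size_type pos = 0;
    for (;;)
    {
        const std::string::size_type sp = src_list.find(' ', pos);
        const std::string address = src_list.substr(pos,
            sp == std::string::npos ? std::string::npos : sp - pos);
        // an empty entry means a leading, trailing or doubled space (or empty input)
        if (address.empty()) throw std::invalid_argument("src-list: expected an address");
        addresses.push_back(address);
        if (sp == std::string::npos) break;
        pos = sp + 1;
    }
    return addresses;
}
```

So "192.0.2.10 192.0.2.11" gives two entries, while the same list with two spaces between the addresses is rejected.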