w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

Ambiguity or omission in list of steps for "CSS Syntax Level 3", section "4.3.4. Consume an ident-like token"? [css-syntax] #10120

Open amn opened 7 months ago

amn commented 7 months ago

Reading section 4.3.4 of "CSS Syntax Level 3", I am confused by two sentences that I cannot parse well enough to, say, implement a compliant tokenizer.

The [adjacent] sentences, quoting:

While the next two input code points are whitespace, consume the next input code point. If the next one or two input code points are U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), or whitespace followed by U+0022 QUOTATION MARK (") or U+0027 APOSTROPHE ('), then create a <function-token> with its value set to string and return it.

It's the second sentence that is of primary concern, really. What condition does it exactly express? Between all the "or" in the sentence, I confess it reads in a manner I can't wrap my head around.

Also, regarding the first sentence: if the next two input code points are both whitespace, then after consuming the next input code point (necessarily the first of the two), the following code point is guaranteed to be whitespace, no?

But anyway, without properly "parsing" the second sentence, I don't think the issue with the first sentence is of much importance, at least the way I understand the context.

cdoublev commented 7 months ago

I agree. To put it more verbosely, it means "if the next input code point is U+0022 QUOTATION MARK (") or U+0027 APOSTROPHE ('), or if the next two input code points are a whitespace followed by U+0022 QUOTATION MARK (") or U+0027 APOSTROPHE (')", ....
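To make the scoping of the "or"s concrete, here is a minimal Python sketch of the two sentences as read above. The function names (`skip_extra_whitespace`, `starts_quoted_argument`) are hypothetical and not taken from the spec, and the sketch assumes input preprocessing has already normalized newlines, so whitespace is only newline, tab, or space.

```python
# Hypothetical sketch of the two sentences from 4.3.4, not the spec's own code.
WHITESPACE = {"\n", "\t", " "}  # CSS whitespace after input preprocessing
QUOTES = {'"', "'"}


def skip_extra_whitespace(text: str, pos: int) -> int:
    """While the next two code points are whitespace, consume one.

    Afterwards at most one whitespace code point remains before pos moves on
    to something else.
    """
    while text[pos:pos + 1] in WHITESPACE and text[pos + 1:pos + 2] in WHITESPACE:
        pos += 1
    return pos


def starts_quoted_argument(text: str, pos: int) -> bool:
    """True if a quote comes next, or whitespace followed by a quote."""
    first = text[pos:pos + 1]
    second = text[pos + 1:pos + 2]
    if first in QUOTES:
        return True
    return first in WHITESPACE and second in QUOTES


# The text that follows "url(" in the input.
rest = '   "img.jpg")'
pos = skip_extra_whitespace(rest, 0)
is_function = starts_quoted_argument(rest, pos)
```

With this reading, a true result means the tokenizer returns a <function-token>, and any other input falls through to the url-token path.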

Although it seems more correct not to consume leading whitespace in this algorithm (see #3600), I suspect this may no longer be required. It could just be "consume as much whitespace as possible; if the next input code point is [a quote then] create a <function-token>".

(irrelevant) That said, in Chrome and FF, url( /**/"img.jpg") is invalid but url( /**/img.jpg) is valid, so there might be back-compatibility at play here, though there is no corresponding test case on WPT.

tabatkins commented 7 months ago

Although it seems more correct not to consume leading whitespace in this algorithm (see https://github.com/w3c/csswg-drafts/issues/3600), I suspect this may no longer be required. It could just be "consume as much whitespace as possible; if the next input code point is [a quote then] create a <function-token>".

That would fail to emit a whitespace token given url( "foo" ), which I'd like to preserve, since any other function would emit a whitespace there.
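A rough Python sketch of the difference, assuming the input after `url(` is optional whitespace followed by a quoted string. The helper `tokens_before_string` and the token names are made up for illustration; only the leading part of the token stream is modeled.

```python
# Hypothetical comparison of the current wording and the proposed simplification.
WHITESPACE = {"\n", "\t", " "}  # CSS whitespace after input preprocessing


def tokens_before_string(rest: str, consume_all: bool) -> list[str]:
    """Token kinds produced for "url(" + rest, up to and including the string.

    Assumes rest is optional whitespace followed by a quote character.
    """
    pos = 0
    if consume_all:
        # Proposed simplification: swallow every leading whitespace code point.
        while rest[pos:pos + 1] in WHITESPACE:
            pos += 1
    else:
        # Current wording: stop while one whitespace code point is still left.
        while rest[pos:pos + 1] in WHITESPACE and rest[pos + 1:pos + 2] in WHITESPACE:
            pos += 1
    tokens = ["function"]
    # Back in the main tokenizer loop, leftover whitespace becomes one token.
    if rest[pos:pos + 1] in WHITESPACE:
        tokens.append("whitespace")
    tokens.append("string")
    return tokens


current = tokens_before_string(' "foo" )', consume_all=False)
proposed = tokens_before_string(' "foo" )', consume_all=True)
```

Here `current` keeps a whitespace entry between the function and the string, while `proposed` drops it, which is the difference described above.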

tabatkins commented 7 months ago

It's the second sentence that is of primary concern, really. What condition does it exactly express?

Ah, this is just the issue we ended up discussing in WHATWG chat, right? It looks like you filed this issue about 15 minutes before I responded in the chat room. ^_^

Just to transfer the conclusions over into a permanent medium: we decided to go ahead and rephrase this into a more explicit list, something like what Guillaume posted, because it really does have a bunch of "or"s in it whose scope you have to guess correctly, and there's not really a need for that.

amn commented 7 months ago

Thank you, Tab. Yes, I originally asked about the issue in the #WHATWG Matrix room, and then wrote it up here on the recommendation of someone who replied.