[mediaqueries-5] Parsing `<media-query-list>` when the input is `<whitespace-token>`

cdoublev commented 1 year ago

This is a follow-up of #7040, which reports an issue when parsing a whitespace against <media-query-list>.

If I am not mistaken, in @media {}, the input would be the <whitespace-token> between @media and {}.

To parse a <media-query-list> production, parse a comma-separated list of component values, then parse each entry in the returned list as a <media-query>. Its value is the list of <media-query>s so produced.

https://drafts.csswg.org/mediaqueries-5/#typedef-media-query-list

Normalize input, and set input to the result.

Let groups be an empty list.

While input is not empty: [...]

Consume a list of component values from input, with <comma-token> as the stop token, and append the result to groups.

Discard a token from input.

Return groups.

https://drafts.csswg.org/css-syntax-3/#parse-a-comma-separated-list-of-component-values

If I am not mistaken, groups contains <whitespace-token> as its unique item, which is parsed against <media-query>, which returns an invalid result.

A media query that does not match the grammar in the previous section must be replaced by not all during parsing.

https://drafts.csswg.org/mediaqueries-5/#error-handling

If I am not mistaken, the invalid result in groups is replaced by not all.

In Chrome/FF:

<style>
  @media {} /* Evaluates to true */
</style>
<script>
  document.styleSheets[0].cssRules[0].cssText// @media  {\n}
</script>

Indeed, their behavior conforms to the spec:

Note: This definition of <media-query-list> parsing intentionally accepts an empty list.

An empty media query list evaluates to true.

Before this resolution to preserve invalid media queries, I would have suggested to parse <media-query-list> with parse a comma-separated list according to a CSS grammar, which handle whitespaces by returning an empty list. But it also replaces comma-separated values by the result from their parse against the given grammar (<media-query>): invalid media queries are lost at this step.

I also feel like whitespaces between an at-rule name and its prelude should be consumed in consume an at-rule (like whitespaces between a declaration name and value), and that the prelude of @media should be defined with <media-query-list>?.

tabatkins commented 1 year ago

If I am not mistaken, in @media {}, the input would be the between @media and {}.

You're mistaken. For the purpose of grammar parsing, whitespace is always ignored, unless there's prose specifically defining that it is significant in some way. So in @media {...}, the <media-query-list> production is just an empty list of tokens. The parsing rules then run on it, producing an empty list.

This is implied by https://drafts.csswg.org/css-values/#component-whitespace, tho it could perhaps be a little clearer.

cdoublev commented 1 year ago

Ah ok, thanks.

I would be happy if it was defined that when parsing a grammar, there is no guarantee that the input does not come with a leading or a trailing whitespace. Or if they were always consumed (in consume an at-rule and consume a list of component values) before parsing a grammar.

Currently they seems to be consumed only in consume a declaration, not for CSSStyleDeclaration.setProperty(property, value) or CSS.supports(property, value), which I think makes a difference for custom property values and property values containing an arbitrary substition value, when they include a leading or trailing comment.

tabatkins commented 1 year ago

Hm, I could probably do with some more specificity around that, yeah. Marking this for Syntax rather than MQ.

cdoublev commented 9 months ago

Fwiw, I think leading and trailing whitespaces can be removed from any list of component values (prelude, declaration value, etc) in consume a list of component values, which is a better separation of concerns imo.

**Consume a list of component values** 1. Discard whitespace from `input`. 2. Process `input`: - `` or one of the stop token (if passed): while the last item in `values` is a ``, remove that token, return `values`. - anything else: consume a component value from `input`, and append the result to `values`. **Consume a declaration** Steps 4 and 5 are replaced with: 4. Consume a list of component values from `input`, with `` as a stop token, and `<}-token>` if `nested` is true. **Consume a qualified rule** 1. Consume a list of component values from `input`, with `<{-token>` as a stop token, and `<}-token>` and `` if `nested` is true. Assign the result to the rule's prelude. 2. Process `input`: - `<{-token>`: [same as currently written] - anything else: this is a parse error, return nothing. **Consume an at-rule** 1. Consume a list of component values from `input`, with `<{-token>` and `` as stop tokens, and `<}-token>` if `nested` is true. Assign the result to the rule's prelude. 2. Process `input`: - ``, ``: [same as currently written] - `<{-token>`: [same as currently written] - `<}-token>`: if `rule` is valid in the current context, return it, otherwise return nothing.

Basically, it would be invoked with an optional list of stop tokens (without the nested argument):

, for a comma-separate list
; and } for a declaration value
{, }, ; for the prelude of a nested rule
{ for the prelude of a top-level qualified rule
{ and ; for the prelude of a top-level at-rule

w3c / csswg-drafts

[mediaqueries-5] Parsing `<media-query-list>` when the input is `<whitespace-token>` #9173