[css-values-4] Switch advanced attr() to being var()-like

tabatkins commented 5 years ago

Currently, the "advanced" attr() functionality that lets you specify a type to parse as and a fallback to use is specified to require treating the attr() as the specified type during parse time. That is, width: attr(foo color); must be rejected as invalid at parse time, because width doesn't take <color> arguments.

This definition predates the introduction of var() and its specifying of how to handle values that are invalid when substituted in; at the time attr() was written, "invalid at computed-value time" didn't exist, so the only way we knew how to handle this sort of thing was to eagerly typecheck and specify defaults.

One of our Chrome engineers (@xiaochengh) is looking into implementing advanced attr(), and reviewing it with them, it looks like implementing the function as specified will be quite complicated; handling its type-checking will require additions to every manually-parsed grammar in the browser.

However, switching attr() over to acting like var() should be much simpler. All the machinery for handling something substituted at computed value time, and possibly being iacvt, already exists and can be leaned on pretty easily for all properties.

(The type argument does still have value here. For one, it's a switch that dictates whether you take the attribute value as a literal string, or CSS-tokenize it. For two, it advertises to later stylesheet readers what's expected. For three, it helps you guard against garbage in attributes, which is more likely than garbage in variables, and use a fallback instead. For four, the px/etc values are useful for parsing plain numeric attributes, which are common in XML languages, into the appropriate CSS unit, without needing calc() shenanigans.)

So the proposal, in full:

drop the spec text that gives attr() a specific type. Instead, using attr() makes the property always valid at parse time, like var() does.
- possibly only do this when you provide a type: an untyped attr() is still string-only and only usable in 'content'; to use a string outside of 'content' explicitly say attr(foo string).
at computed-value time, attr() retrieves the attribute value and parses it according to the provided type argument. If it fails to parse, use the fallback value instead; if there's no fallback, use the guaranteed-invalid value instead.
substitute the attr into the property, using the same rules as var(). Just like var(), you then grammar-check the property, and if it fails, it's invalid at computed-value time (iacvt).
add a tokens type (or maybe spell it *) that does no parsing, but does still tokenize the value and substitute it accordingly (rather than substituting it as a <string>), to handle more complicated cases, now that we're free from the tyranny of parse-time type-checking. This type will only trigger fallback if the attribute isn't present at all.
remove the "default value" definitions for each type; they were necessary before we had iacvt as a concept, but not anymore. If you want to provide a default value, that's what the fallback is for.

Thoughts?

(Note for the Agenda+; I won't be attending the Nov 6th call, so please schedule it for the Nov 13th call instead.)

frivoal commented 5 years ago

Doesn't seem to diminish what you can do with it. If that increases the chances of it happening, I'm all for it.

tabatkins commented 5 years ago

Yeah, it should be absolutely no loss in correct functionality; the only loss is in catching some authoring errors earlier.

You actually gain some functionality via the new tokens type, letting you, for example, provide shadows or transforms in an attribute.

Crissov commented 5 years ago

Sounds reasonable to me and would probably better match author expectations, thus improve the overall experience.

By the way, attr() currently does not support some of the more recent type additions like <image>, <position> and <resolution>. Iʼm not sure whether this is intentional.

Besides being able to parse as a complete <color>, attributes could also hold color components separately. Therefore, attr() would be useful inside functional color notation. Itʼs quite likely there (but also elsewhere), that an implied unit would be used, such that an attribute value of 1 might represent 1, 1.0, 100% (as in <alpha-value> and <cmyk-component>), 1% or 1deg (as in <hue>). To make this easier to handle, I think this part:

%, A keyword matching one of the <length>, <angle>, <time>, or <frequency> units

should be changed to use a familiar calc()-like expression instead.

attr(foo px) and calc(attr(foo number) * 1px) would become attr(foo * 1px).
attr(foo %) and calc(attr(foo number) * 1%) would become attr(foo * 1%).
calc(attr(foo %) * 100) and calc(attr(foo number) * 100%) would become attr(foo * 100%).

emilio commented 5 years ago

There is some discussion at the end of https://bugzilla.mozilla.org/show_bug.cgi?id=435426, where I suggested something similar, though I wasn't smart enough to envision the calc(attr(foo number) * 1px) hack that allows you to actually not loose functionality :)

faceless2 commented 5 years ago

I see no value in allowing image or position - I think you're opening a dangerous door once you allow structures like <div data-foo="linear-gradient(red, blue)"/>. Although adding resolution seems harmless enough.

For what it's worth we've implemented attr() as specified, although we're not yet shipping so that shouldn't factor into amy decision to change the definition. Yes, the type checking on parse seemed an unnecessary step, as the value is going to be validated again anywhere when you evaluate the property. I also agree the type is required - the most common use for attr() from where I sit is attr(href uri), which only works if there is a type. And the distinction between "length" and "px" is also valuable.

@Crissov - as we read it, attr can already be used anywhere, including to represent color components - these are already valid expressions:

color: rgb(attr(red number), attr(green number), attr(blue number));
width: calc(50px * attr(multiplier number, 4));

But again, I don't think think there's any value in changing the attr() syntax to be some sort of calc-lite: You're not adding any new functionality with this suggestion, and the existing syntax is not impractical. Change for change's sake should be strongly discouraged.

However, as we're looking at attr I'd like to suggest removing the restriction on the fallback value not being an attr() expression itself. This restriction no longer makes sense if the values are being evaluated later, as they are for var()

emilio commented 5 years ago

Yeah, I agree with the above regarding the simplicity, not so sure about the attr-in-attr thing, that means you need to do cycle detection on that too, and that's one of the most expensive parts of var().

For what is worth, the other potentially annoying thing from a browser developer perspective is the opportunity for XSS. We right now sanitize away "unsafe" attributes / elements like xlink:href in <use> elements or what not. attr() opens the door to make any attribute potentially an image load, and thus which attributes should you sanitize becomes either "all of them", or starts depending on the CSS rules that apply on the page, or what not. I wonder if @hsivonen or @freddyb have thoughts about this kind of thing. Though IIRC the only thing we used to sanitize from style attributes and stylesheets is -moz-binding, so maybe image loads and such are not a problem at all and I'm just being paranoid.

hsivonen commented 5 years ago

Being able to turn any attribute into an image load URL would indeed bypass the URL attribute sanitizer in Gecko.

Does Gecko currently support using attr() to take a URL from any attribute as e.g. a background image?

faceless2 commented 5 years ago

Loading an image this way doesn't seem to work in Firefox, Chrome or Safari. But it seems to be because the attr() function isn't parsing.

https://jsbin.com/fiqehuketi/edit?html,output

emilio commented 5 years ago

Does Gecko currently support using attr() to take a URL from any attribute as e.g. a background image?

We currently don't support it, but implementing this feature would allow it.

xiaochengh commented 5 years ago

Do we have valid use cases of attr-in-attr?

In my intuition, attr() is designed for passing extra input to CSS. So IMO it's an incorrect usage to have complicated computation inside attr in attr -- if such kind of computation is needed, it seems more natural to be done either in Javascript or with custom properties.

Anyway, it would be a great news for me if I don't need to implement that 😃

gibson042 commented 5 years ago

Any attribute that overrides or partially overrides another is a potential use case.

attr(title, attr(pattern)) to explain <input> constraints
attr(download, attr(href)) to represent hyperlinks
attr(data-selection, attr(data-default)) to consume hierarchical custom data attributes

xiaochengh commented 5 years ago

@gibson042 The use cases you gave are totally valid, and in general, using attr() in fallback value doesn't seem to introduce cyclic dependency by itself.

And I was thinking of a different thing though. Sorry for the confusion.

What if the referenced attribute itself contains attr()? For example:

<div id=target foo="attr(bar *)" bar="attr(foo *)"></div>
#target { --whatever-property: attr(foo *); }

Or a detour via var():

<div id=target foo="var(--prop-a)" bar="var(--prop-b)"></div>
#target {
  --prop-a: attr(bar *);
  --prop-b: attr(foo *);
}

So do we have valid use cases like that?

If not, we might want to consider making attr(foo) ivact if attribute foo itself contains attr() (and possibly var()?)

heycam commented 4 years ago

I'm generally in favor of the proposal modulo the concerns that Emilio and Henri raised.

I don't know if we have valid use cases for variable references inside attr()-referenced attributes, or vice versa. But if we model attributes as another kind of variable, then we should be able to use the same iacvt processing when encountering cycles.

Is there any reason not to make all attr() values valid at parse time?

I do find it a little odd that the second argument to attr() names CSS syntax types and other operations like dimension value construction, and how e.g. string means take the literal string content and turn it into a <string> token, while number means to parse the attribute value as a <number> token. I'm half wondering whether it makes sense to align with the syntax used for custom property registration for things like the latter.

css-meeting-bot commented 4 years ago

The CSS Working Group just discussed Switch advanced attr() to being var()-like.

The full IRC log of that discussion

<dael> Topic: Switch advanced attr() to being var()-like
<dael> github: https://github.com/w3c/csswg-drafts/issues/4482
<dael> TabAtkins: Several years ago we defined the more complicated attr() functionality where it supplies the type. If you say foo=5px we parse as length.
<dael> TabAtkins: No one impl. I realized why.
<dael> TabAtkins: It ends up being high cost for low value. Type checking eagerly so at parse time we can reject it it means every thing that does grammar checking have to account for possibility of attr() being there
<dael> TabAtkins: Lots of fiddly detail work.
<dael> TabAtkins: We did it because we don't have valid at parse time but rejected later. We now have that for var(). The var() machinery and building on that gives us a lot of tools that didn't exist earlier which make attr() easier
<bradk> 🐈
<dael> TabAtkins: Precise details of grammar aren't laid out, but core is we make attr() act like var(). It makes poperty automatically valid at parse time and we do parse at computed time. We validate at that time.
<dael> TabAtkins: Specifying type lets you validate you put the right thing in the attr(). Handling attributes elsewhere tends to allow garbage and ignore. We maintain that and check type and make sure it works.
<dael> TabAtkins: If we base on var() it's the same functionality for authors and a significant decrease in implementation complexity.
<dael> TabAtkins: I'd like to persue this change and the impl wants to experiment in it
<fremy> I strongly support this!
<dael> TabAtkins: Is WG ameniable?
<dael> emilio: I'm not opposed but concerned about type checking token string and then doing parsing again. When I looked at impl attr() I suggested doing it like variables in bugzilla.
<dael> emilio: Complexity of doing attr didn't seem so high either. I'm concerned about parsing, tokenization, and then parsing on performance.
<dael> emilio: Other concern is XSS but that happens either way
<dael> TabAtkins: Reason why I don't htink first bit is a concern is it ends up being identical to custom values and properties API. Ideal is it works that way but it's inline
<dael> emilio: I think that's also a concern with custom properties. I don't want to block on it, it's mostly theoretical
<dael> TabAtkins: Never say never but I doubt used in performance sensistive ways
<dael> AmeliaBR: My first concern would be how can we make it work logically with the existing use of the attribute function in the content property
<dael> AmeliaBR: You've been talking as distinguishing if an explicit type is set. More I'm thinking maybe not necessary. If you don't have an explicit type the type is assumed string and any attribute can be parsed as a string, returned in a string. So maybe not an issue b/c string is always valid in content
<dael> AmeliaBR: I'd like to see the exact write up and make sure it makes sense in backwards compat without special behavior
<dael> TabAtkins: Not possible w/o any backwards compat b/c assume valid at parse time. Content properties currently invaid but use attr() become valid at parse time. It is a behavior change if we make unspecified type attr() use this.
<dael> TabAtkins: Not sure what's best if we split parsing into separate function rather then flag it as attr() here. Puts you in 2 parsing modes based on detail of function grammar.
<dael> TabAtkins: If we think it's okay for behavior change in content where you wrote an invalid with a fallback and you're relying on that that seems minor. Otherwise good with your option.
<dael> TabAtkins: There's some possibilities there, we can experiment
<dael> AmeliaBR: Youre example of something suddenly valid is if something else in content property would be a parse error. Like using slash syntax with alternative text in a browser that doesn't support makes a difference if it's parse itme error
<dael> TabAtkins: Exactly. You'd no longer have the fallback
<dael> AmeliaBR: I would lean toward having a separate function for the type version and use attr() for how it's curerntly supported. Might be problematic for UA that support attr() more widely
<dael> TabAtkins: No idea if various printers support. I know no web browsers do. I'm not sure impl quality of whole thing
<dael> TabAtkins: But this is a behavior change. It will be off if there's a current impl. It's a custom thing or breaking change
<dael> TabAtkins: If no other questions just want to check for objections for me creating a full write up of changes. I can do that for review next week
<dael> Rossen_: Objections?
<dael> fantasai: Summary?
<dael> TabAtkins: There's a lot of possible ways how, but it's a change in validation to make it more var() like
<dael> TabAtkins: I'll have a write up fully next week. What's in the issue is the right jist
<dael> TabAtkins: It's switch attr() to var()-type validation rather than strict parse time validation
<dael> emilio: THe fallback might be able to be fix for attr(). Unfortunate to add new type of attr() that can't be detected. Nice if forced to a valid type. Worth thinking about
<dael> TabAtkins: Yep.
<dael> Rossen_: Objections to the approach of switch attr() to var()-type validation rather than strict parse time validation
<astearns> +1 to try this out
<dael> fantasai: Not sure, but let him write it up
<dael> Rossen_: TabAtkins there's no objections. Go ahead and write it up and we'll look when you're ready
<dael> cat: meow

upsuper commented 4 years ago

I don't know if we have valid use cases for variable references inside attr()-referenced attributes, or vice versa. But if we model attributes as another kind of variable, then we should be able to use the same iacvt processing when encountering cycles.

This makes me wonder whether there would be extra complexity to handle cycle involving both attr and var. Maybe not a difficulty spec-wise, but the implementation may need some refactor.

Being able to turn any attribute into an image load URL would indeed bypass the URL attribute sanitizer in Gecko.

How does the sanitizer work against CSS variable? Or it doesn't matter?

emilio commented 4 years ago

This makes me wonder whether there would be extra complexity to handle cycle involving both attr and var. Maybe not a difficulty spec-wise, but the implementation may need some refactor.

Yeah, I'd prefer to avoid this if possible.

How does the sanitizer work against CSS variable? Or it doesn't matter?

I don't think it matters. If you're allowing CSS variables means that you're allowing <style> elements / style attributes, and then you may as well set background-image directly (without a variable).

The interesting case here is where you disallow styles, but end up injecting styles anyway due to an attr() function.

tabatkins commented 4 years ago

To make this easier to handle, I think this part: [omitted] should be changed to use a familiar calc()-like expression instead.

I don't like this approach; either * 1px is a special-cased token sequence that just indicates the same thing that "px" does in the current spec, or a full calculation is possible, which requires some complex new parsing (to allow for an ident at the beginning, which isn't part of calc()) and hooking into the "math function" part of the spec. That's a lot of complexity to avoid either a keyword or just using attr() inside a calc(). It's also unclear how this would mesh with the non-numeric attr() cases.

Yes, the type checking on parse seemed an unnecessary step, as the value is going to be validated again anywhere when you evaluate the property.

While not technically necessary, I think it's useful as a way to trigger the fallback value. This functionality isn't present in var(), but I think that attributes are more likely to accidentally contain trash (most particularly, the empty string or whitespace, instead of being deleted) than custom properties are, and being able to trigger your default value in those cases seems useful.

Although adding resolution seems harmless enough.

Yes, I'm adding the same types that Typed OM is aware of: resolution and flex are new additions.

not so sure about the attr-in-attr thing, that means you need to do cycle detection on that too, and that's one of the most expensive parts of var().

Currently the spec requires the attribute value to be a simple literal; attr(foo length) would fail for foo="calc(1px + 2px)", as it's looking for a dimension token. I probably want to relax that, but I do plan on explicitly disallowing attr(), var(), and env() in the attribute value; their presence will make the value illegal so it'll use the fallback instead. This avoids all the dependency-tracking issues, without losing anything significant I think; writing foo="attr(bar px)" seems like a weird thing to do.

It also keeps the information flow to a minimum; you'll only get precisely the attribute you requested injected into your stylesheet, not an unexpected attribute or a random custom property.

(I'm excluding env() because it's an another arbitrary reference, especially once author-defined env() gets nailed down.) (I'll also put a note in the spec about excluding future reference functionality, to remind us to amend this spec if we add more stuff in the future. For example, I think I'll want to exclude custom functions once they exist.)

attr() opens the door to make any attribute potentially an image load,

The threat surface is just that y'all sanitize URLs proactively, yeah? And this would allow URLs to get fed into the system that y'all haven't pre-examined.

It doesn't look like this allows any attacks in XSS terms or anything; an attribute can only cause an image load if the author purposely put an attr(foo url) in an image-loading property, which seems pretty explicit about their intent. And with attr() being disallowed from the attribute's value, it can't shift the attack surface to an unexpected attribute either.

Is this right? How bad is the sanitization issue?

Is there any reason not to make all attr() values valid at parse time?

Yeah, after discussion on the call, this is my plan. It's technically a behavior change for existing attr() uses, but a minor one; if an author is currently writing something like content: "foo"; content: attr(foo) 10px; (accidentally relying on the second declaration being invalid and instead using the first), then after this change it would instead accept the second declaration and just be iacvt. That seems like a pretty insignificant worry to me, tho. (In particular, if they're currently writing an invalid 'content' by itself, with no preceding 'content' to fall back to or 'content' from another lower-specificity rule to cascade with, then iacvt is identical behavior to parsing failure in user-observable respects.)

The interesting case here is where you disallow styles, but end up injecting styles anyway due to an attr() function.

Can you elaborate on this, @emilio?

emilio commented 4 years ago

Can you elaborate on this, @emilio?

I just meant that the interesting case in terms of the sanitization-bypass that I mentioned is when you're disallowing the style attribute and <style> elements (as otherwise it is pointless).

Is this right? How bad is the sanitization issue?

I'm not the right person to evaluate that; @hsivonen maybe?

css-meeting-bot commented 4 years ago

The CSS Working Group just discussed [css-values-4] Switch advanced attr() to being var()-like.

The full IRC log of that discussion

<dael> Topic: [css-values-4] Switch advanced attr() to being var()-like
<dael> github: https://github.com/w3c/csswg-drafts/issues/4482
<dael> TabAtkins: I finished the edits to re-write and they're in Values 4 ED. I put attr() in its own section. mostly cribbed from var spec. I have a note to re-write substitution algo.
<AmeliaBR> commit for changes: https://github.com/w3c/csswg-drafts/commit/ed7beac806cc4753d8134857ff526150bb2a631c
<dael> TabAtkins: Otherwise it works similar to old. You give a type, checkes if the attribute exists, and subsitutes in if it does. Assume valid at parse time. If it fails when you put it in it's invalid at computed value time.
<dael> TabAtkins: Our impl thinks it's reasonable. If anyone else wants to look it would be great.
<dael> TabAtkins: Only poss controversy is spec what type the value is, doing parsing on it. If you say color it's recognized as a color. Want to keep b/c attributes messier channel. Easier to mess with, scripts sometimes write them. Ensuring a valid thing comes out and then switch to default seems valuable here.
<dael> TabAtkins: I'm happy with current. Any further comments or strong opinions please give them here or in issue.
<dael> TabAtkins: I'll ask for resolution to approve next week.
<dael> Rossen_: Thanks.
<dael> Rossen_: Action on everyone who cares about these changes to review so we can discuss on next call.
<florian> +1 to the design goals. Will review details

tabatkins commented 4 years ago

Reminder to please review the new text at https://drafts.csswg.org/css-values/#attr-notation

frivoal commented 4 years ago

I like it. I think it'd just suggest adding one more attr-type (called tokens or auto ?) that just doesn't do first-pass type validation (at least, nothing beyond https://drafts.csswg.org/css-syntax-3/#typedef-declaration-value), and passes the whole content of the attribute as a list of tokens. Most of the time, I think the validation by type is useful, but occasionally it may be limitting, so it could be nice to allow bypassing it, since the underlying machinery will be there anyway.

That way, you could write stuff like that:


<style>
.foo { background: attr(data-light-background auto, white); }
@media (prefers-color-scheme: dark) {
  .foo { background: attr(data-dark-background auto, black);}
}
</style>
<div class=foo data-dark-background="center / contain no-repeat url("logo.svg"), #222 35% url("pattern.png");">
  …
</div>

bernhardf-ro commented 4 years ago

We (the RealObjects PDFreactor development team) had a look at this issue and the new specification text: We agree with the general idea of this change. In fact, our current implementation already employs a similar approach. Also we have no objections to the new text. We are much in favor of the tokens "type", as it would grant significant additional possibilities, with no relevant drawbacks.

Loirooriol commented 4 years ago

I like tokens, and something similar could be useful for variables too, as I proposed in https://github.com/w3c/csswg-drafts/issues/2749#issuecomment-396043654

tabatkins commented 4 years ago

I ended up not putting tokens in yet (despite suggesting it as well) because it still requires a pass over the values to ensure there's no reference functions mixed in. I have an issue about whether or not to allow color functions and math functions as well, for the same reason; if we decide to allow them (after verifying there are no reference functions in them), then I'll pop in tokens as well.

xiaochengh commented 4 years ago

The latest spec seems very generous on the fallback value:

Otherwise, if the attr() function has a fallback value as its last argument, replace the attr() function by the fallback value. If there are any var() or attr() references in the fallback, substitute them as well.

This is different from the old CSS3 spec that, we no longer verify the fallback against the given type.

So I got a few questions:

Is this change intentional? In other words, do we still want to verify fallback against the given type? Btw, some developers may assume that we do verify the fallback, like we do that for registered custom properties.
(If yes for 1) When using the fallback, while the given type is a specific unit (e.g., px), do we want to parse the fallback as a number or a pixel value? For example, should we resolve attr(something-invalid px, 100) into 100 or 100px?

Note: There's an existing WPT expecting 100

(If yes for 1) When the attribute value itself is valid against the given type but the fallback isn't, is the attr() still valid at computed time? Again, some developers may assume no, similar to registered custom properties.

tabatkins commented 4 years ago

It's an intentional change. It lets you provide fallback of a different type entirely when the argument is missing. (Like width: attr(w px, auto);)
Fallback isn't parsed at all. The WPT looks like it's wrong? It will fail to parse the attribute (since "300px" isn't a number), then substitute in 200, which isn't a valid 'width' value in non-quirks pages, making the div iacvt so it reverts to "auto" and won't match the ref.
Yes, that's fine. Fallback isn't parsed by attr() at all.

xiaochengh commented 4 years ago

There's a compatibility issue found via WPT css/css-variables/variable-generated-content-dynamic-001.html

<style>
#parent { --x: attr(data-attr); }
#child::before { content: var(--x); }
</style>
<div id=parent><div id=child data-attr="foo"></div></div>

Before the spec change, we got a ::before pseudo-element on #child with "foo".

Now the resolution of attr() is similar to var(). So we first compute --x on #parent by substituting attr(), and get the empty string. Then in #child::before we inherit --x from its parent, and get an empty content.

In other words, we got a behavioral change even when the declaration is valid.

Besides, could you clarify the substitution value for types string and url when the attribute is missing? The current spec says:

For the fallback value: "...will be substituted instead of the attribute’s value if the attribute is missing or fails to parse as the specified type"
For the string and url types: "No value triggers fallback"

I suppose there is always no substitution value when the attribute is missing? So that the behavior is different from when the attribute value is the empty string.

tabatkins commented 4 years ago

Ah, my intention was that a lack of value is different from an empty value (and a lack of value always triggers fallback). I'll clarify the spec.

faceless2 commented 4 years ago

The latest draft has the attribute name defined as <wq-name>, which would make all of these valid:

attr(foo)
attr(m|foo)
attr(|foo)
attr(*|foo)

I don't think that last one was intentional, but if it was we need to define what it means with <element a:foo="1" b:foo="2">, as attribute order is not significant in XML.

yisibl commented 10 months ago

Can this be used with @property? When syntax:"<number>" is present, it is possible to parse the HTML attribute value as <number>?

@property --foo {
  syntax: "<number>";
  initial-value: 0;
  inherits: true;
 }

hober commented 9 months ago

@heycam pointed out back in 2019 something I think got lost in the subsequent conversation:

I do find it a little odd that the second argument to attr() names CSS syntax types and other operations like dimension value construction, and how e.g. string means take the literal string content and turn it into a <string> token, while number means to parse the attribute value as a <number> token. I'm half wondering whether it makes sense to align with the syntax used for custom property registration for things like the latter.

This is a great point. @property ... { syntax: ...; } and attr(... ...) should use the same syntax for referring to CSS syntax types.

tabatkins commented 1 week ago

These edits have now been performed, at https://drafts.csswg.org/css-values-5/#attr-notation, so I'm closing this issue.

w3c / csswg-drafts

[css-values-4] Switch advanced attr() to being var()-like #4482