Integrate value & selector parsing into PostCSS core.

ben-eb commented 8 years ago

So, one of my main concerns about writing cssnano version 4 is that I am trying to glue together lots of different parsers, each with different APIs, and I find myself wanting a much richer AST than what PostCSS currently provides. I feel that as per @thejameskyle's suggestion, we should look at parsing selectors and values;

I'm not sure if PostCSS wants to dive into creating an AST for all CSS values, as that adds a ton of complexity to what is currently extremely simple. Although, most modern compiler authors would [think that] this is the most sustainable solution.

https://github.com/thejameskyle/postcss-function-resolution

This means that we need to extend Declaration and Rule to provide a more detailed AST. I don't exactly know how this should look yet, so any suggestions are welcome. But James' rough draft looks good;

{
  type: Declaration,
  value: {
    type: Expression,
    expression: {
      type: CallExpression,
      callee: {
        type: Identifier,
        name: 'foo'
      },
      arguments: [{
        type: CallExpression,
        callee:{
          type: Identifier,
          name: 'bar'
        },
        arguments: [{
          ...
        }]
      }]
    }
  }
}

Furthermore, I'd like to drop node.raws.value.value and node.raws.value.raw from the AST completely and force plugin authors to provide support for comments or not in their plugins; the current behaviour is that PostCSS will hide comments automatically, which can cause problems with things like userstyles. This would be much easier when we have a unified AST that covers values and selectors as plugin authors can just ignore comments, and instead operate on relevant values.

I am concerned that this is a big change and is not something that I will consider lightly, so I'm looking for feedback. It makes writing complex plugins like cssnano & stylelint easier at the expense of smaller plugins, but I feel like this allows us to do some really sophisticated transforms/analysis in the future.

This issue supercedes https://github.com/postcss/postcss/issues/410, https://github.com/postcss/postcss/issues/235 & https://github.com/postcss/postcss/issues/178.

davidtheclark commented 8 years ago

I think that integrating selector and value parsers into Core would make a very positive difference. So many plugins either do or absolutely should use those parsers. Those that should but don't probably would use the parsers, and therefore improve their plugins, if they were integrated into Core. (Maybe they don't know the parsers exist, or maybe they don't know to trust the parsers, I don't know.)

In the tree I think we should make sure to use CSS verbiage instead of JS or more programming-language-y verbiage. Words that people will find when looking at MDN, the spec, and articles about CSS. For example, Expression I think is too technical. Could that not be Function? Just kind of a principle to keep in mind.

ben-eb commented 8 years ago

@davidtheclark Yeah, this language confused me at first too, which is why I am not sure about how we should begin to represent complex expressions in a simple way. So my understanding of an expression is that it encompasses anything from a simple value, to a list of values, or object dot syntax. So for example in babylon (Babel's parser), this is a MemberExpression;

'@'.charCodeAt(0);

Now, we may not use the exact same terminology for CSS grammar, but we need to think about appropriate structures for values. All of these types should be represented;

border: 1px solid red; /* space separated list of values */
font-family: Helvetica, Arial, sans-serif; /* comma separated list of values */
background-image: url(unicorn.jpg); /* function */
background: red url(unicorn.jpg) repeat-y; /* space separated list of values with a function */
content: 'foobar'; /* string literals */

Arguably each of these values would be expressions if we were using JS terminology. But if we are not to use expression, then what do we use? value? But each of the members inside the expression are values. So this can be confusing if we don't get it right, any suggestions are welcome. :smiley:

davidtheclark commented 8 years ago

At https://www.w3.org/TR/css3-values/, https://drafts.csswg.org/css-syntax-3/#component-value, and https://developer.mozilla.org/en-US/docs/Web/CSS/Value_definition_syntax I see regular use of "component value". Might we use Component or ComponentValue instead of Expression?

I'm not suggesting that we're going to find a super transparent word to regular CSS users, or that the term Expression is not clear enough when explained; just that I'd hope we could use words that are derived from the CSS spec rather than other languages.

ben-eb commented 8 years ago

I wasn't aware that this document existed; of course, we should aim to use words derived from the specification whenever possible. :+1:

jonathantneal commented 8 years ago

@thejameskyle’s suggestion seems like the only way to go forward for things like precss, too. Count me as a yes to this.

jamiebuilds commented 8 years ago

I just want to note that going down this path will also limit the flexibility people will have in custom syntax (which seems to be very common in PostCSS plugin). For example, if you parse selectors, then invalid character sequences will cause parse errors that don't exist today. It is possible to write parser plugins but that's much harder to support than code transforms.

You can see the specification for the Babel AST here. Obviously this does not map to CSS, but you might draw some inspiration from it, in particular the expression and literal types/sub-types. If you want to read some discussions about AST design, you can read through the ESTree Issues.

Another thing you might want to consider with this change is switching to a more formal visitor pattern. For example, this is what Babel's visitors look like:

path.traverse({
  Identifier(path) {
    // ...
  }
});

This pattern can be better optimized (for example Babel merges all the top level plugin visitors into one giant visitor so it only has to traverse the tree once), and in my experience leads to cleaner and less stateful transforms. A good example of this is the transforms I wrote for both Babel and JSCodeShift to transform pure react component classes into functions: Babel, JSCodeShift. These do almost the same exact thing and it was easy to add more functionality to the Babel transform.

Let me know if there's anything I can do to help this along.

ben-eb commented 8 years ago

@thejameskyle I'm not sure that I follow on the custom syntax thing; it is true that PostCSS plugins do define their own syntax sometimes, but I think that it is done within the limits of the CSS grammar as it is today. In cases where it isn't, for example single line comments, we have custom parsers that provide extensions to the default parser to support these use cases. Perhaps you could follow up with an example that illustrates your problem? :smile:

Similarly, https://github.com/postcss/postcss-selector-parser does parse selectors but makes no assumptions about for example pseudo selectors that it doesn't understand; e.g. you can easily define some made up function and it will still be parsed correctly:

a:unicorn(stoic) {

}

Is parsed the same as:

span:not(.stoic) {

}

jamiebuilds commented 8 years ago

Perhaps you could follow up with an example that illustrates your problem?

Say someone wanted to do interpolation of a variable into a selector:

.foo-#{$bar} { ... }

If you are parsing through the selector I imagine you will have all sorts of conflicts here.

ben-eb commented 8 years ago

OK sure; but right now, PostCSS doesn't support that kind of expression. It would require a custom parser to do this. :smile:

ai commented 8 years ago

BTW, we still need to have old API. So decl.value should be a string.

decl.valueTokens for tokens? Anyway most of plugins will use just event based API.

ai commented 8 years ago

@thejameskyle the good part of CSS is that we can parse any unknown syntaxes as word token. So .foo-#{$bar} can be parsed just as classSelector. So, plugin can still use regexp based syntax :D.

ben-eb commented 8 years ago

BTW, we still need to have old API. So decl.value should be a string.

Why? What if some plugin authors choose to update decl.value and others choose to update decl.valueTokens? How does PostCSS manage that plugin conflict? It's important to realise, in @davidtheclark's words;

So many plugins either do or absolutely should use those parsers.

I can't see how keeping this API around provides any benefit over more appropriate node parsing. It should be either/or, not both.

ai commented 8 years ago

@ben-eb any call of .value= setter will parse new value and update valueTokens.

jamiebuilds commented 8 years ago

The difficulty that you're going to have here is that a lot of these transforms are actually interpreting the code

foo(bar(baz($value)))

Resolving this requires plugins handing off responsibility and not running in any particular sequence. For example, if you have a plugin that resolves foo, it needs to wait for the bar plugin to resolve first, which needs to baz plugin to resolve, which needs the $variable plugin to resolve. These can happen in any order.

Now, a couple things need to happen to ensure that we don't have a ridiculous number of tree traversals or easy to break plugins:

Visitors that can be run on any part of the tree at any time

If I'm plugin foo and I can't resolve foo, we need to be able to run the rest of the plugins on that section of the AST somehow so that they can resolve it.

Visitors need to be able to visit nodes on "exit"

How does foo know it can't resolve what is inside of it? For example, if foo() accepts arbitrary text, how does it know that $value needs to be resolved as a variable by another plugin? (i.e. foo($value) => $value-fooified)

Instead, foo should run on "exit" (to understand what I mean by on exit read this). This way we know that the entire AST inside of foo(...) has been resolved.

When nodes change, they should trigger re-traversal

Imagine two plugins, one that resolves a subtract() function to a css calc() function, and one that resolves the calc() function to a normal value (i.e. subtract(2, 1) => calc(2 - 1) and calc(2 - 1) => 1).

When the subtract plugin runs, it replaces a node in the AST with new AST. We need to know to run the rest of the plugins on this AST. We need to trigger a re-traversal starting at calc()

Because of this, there needs to be an API around manipulating nodes instead of mutating them directly

If people are setting node.value directly, how does that trigger a re-traversal? You could use getters/setters or observe the tree directly, but for large ASTs that would be extremely inefficient and you lose the ability to do other types of operations like replacing one node with many and inserting a node before/after another.

What Babel does is wrap all nodes with node "paths" (please read this). When I call path.replaceWith([node]), I'm marking a given path as "changed" and telling the compiler to re-traverse everything inside it.

Sorry for the wall of dense text. The good news is that compilers are a solved problem in computer science (people have been doing pretty much the exact same thing for almost 20 years), the bad news is that there's a ton of stuff you need to know (I have three books on compilers each around 1000 pages long).

TL;DR: Parsing a more detailed AST is only half of the story, in order to resolve the original problem of function resolution (along with tons of other problems) the transformation API needs to change dramatically in order to solve the problem in a more maintainable way.

ai commented 8 years ago

@thejameskyle thanks for this big post. It is really useful

ben-eb commented 8 years ago

@thejameskyle @ai Agreed, looks like we have a lot to learn.

jonathantneal commented 8 years ago

That was really great. I love how @thejameskyle describes the problem so clearly, and then reassures us that it was solved years ago. Whew. Except now we gotta learn stuff. :)

ai commented 8 years ago

So we should implement #296 before this issue

andyjansson commented 8 years ago

I just wanted to interject with my opinion. As I see it, this type of API could be optional and invoked on-demand, saving us the need to alter the structure of the AST from how it is today.

I imagine something akin to the following:

...
var selectorAST = postcss.selector.parse(rule.selector);
// do something with selectorAST
rule.selector = postcss.selector.stringify(selectorAST)
...

This would address @thejameskyle's concern about custom syntax, as well as save us the need to parse each and every node in the tree.

ben-eb commented 8 years ago

I think we must parse each and every node. Currently we don't do this, which is what I find to be brittle and means that plugins must be run in an exact order; this is not a good developer experience.

andyjansson commented 8 years ago

I don't follow why you think parsing nodes (and by this I mean parsing a rule's selector value into an AST, for instance) is a necessity. I agree that the plugin order thing makes for a poor developer experience, but, as I see it, that issue is not rooted in the lack of an AST for node values, but rather that certain transformations need to have taken place before the plugin can successfully execute in accordance to what the user would expect. This is a big drawback in postcss' design. What's more, some issues can't be remedied by simply changing plugin order, such as when plugins depend on each-other's state.

However, both these issues would be addressed by executing the nodes one-by-one against the visitors, then re-executing a node against the visitors once it's been changed. Let me demonstrate using a couple of examples:

Example 1

   someprop: foo(bar(baz()));

Assume that foo() and baz() are defined by postcss-functions and bar() is defined by some other plugin. Currently this can't be successfully resolved because of a circular transformation dependency. Now, if we have visitors which are then re-applied whenever the node changes, this could be resolved.

So..

When the stylesheet is first processed, the visitors would run against the node. postcss-functions would process baz() but abstain from processing anything else because bar() is unrecognised. The other visitor would then execute and resolve bar(). This is normally when execution would stop and postcss would return the AST back to the user. Now, since the visitors have modified the node, we can imagine it being marked "dirty" and the visitors are executed once more. This time it looks something like this:

   someprop: foo(quux);

Since quux is a concrete value, postcss-functions can now resolve foo(). The other visitor will then be invoked but since bar() cannot be found, it will abstain from doing anything.

Example 2

Assuming postcss-simple-vars and postcss-conditionals, you cannot redefine a variable inside a condition because the whole AST is processed once by the first plugin, then once by the second plugin.

   // define $a and $b
   ...
   @if $b == 0 {
       $b: 1;
   }
   result: calc($a / $b);
   ...

In the above example it would always redefine $b to 1 due to plugin order (proof). Switching the plugins around would cause an error currently because postcss-conditionals does not recognise $b as a valid value, but nevermind about that for now.

Now, if we had visitors, postcss would walk through the node tree one node at a time and run it through the visitors. First postcss-simple-vars would define $a and $b and postcss-conditionals would abstain from doing anything because it hasn't seen an @if yet. Eventually we would come across the @if and postcss-simple-vars would replace $b with its corresponding value. Then postcss-conditionals would execute and evaluate the expression causing it to either remove the branch completely, or replace it with its contents. Let's assume that the expression is truthy and the @if is indeed replaced by its contents. Postcss would then move on to the next node and pass $b: 1; to the visitors causing the $b variable to be redefined. Once postcss reaches the result: calc($a / $b); declaration it would be resolved correctly.

Assuming the opposite plugin order, I imagine postcss-conditionals not to throw a syntax error upon seeing $b and instead ignore the @if altogether. Since postcss-simple-vars would then replace $b with its corresponding value, the node would be marked dirty and the visitors would be invoked all over again, making the whole thing work as expected.

PS. I don't think your AST-for-node-values idea would work with postcss-conditionals, which is why making it optional, still allowing for this kind of syntax, would be a good idea.

PPS. Since the above example wouldn't error due to the assumption that the @if would eventually take a recognisable form, we would need to discuss when we actually can give off errors. I would imagine the errors needing to be bound to the node, cleared for as long as the node is marked dirty, and only being emitted upon exit..?

Sorry for the wall of text.

ben-eb commented 8 years ago

I don't follow why you think parsing nodes (and by this I mean parsing a rule's selector value into an AST, for instance) is a necessity.

It's a necessity because in order to facilitate two ways of writing plugins (querying a node's value tree or querying a node's stringified value), PostCSS needs to maintain a string representation of a node's value (for instance) as well as a node tree. So that whichever one is set will update the other, in order to satisfy a plugin's expectations. That seems really inefficient to me as PostCSS will have to do this for every plugin.

The rest of your post I believe is how Babel works; calling a Path method such as replaceWith will cause a requeue of affected nodes so that plugins have multiple chances of running in order to properly satisfy function resolution.

I'm not sure about errors. Perhaps they should be handled after exiting the Root node?

andyjansson commented 8 years ago

It's a necessity because in order to facilitate two ways of writing plugins (querying a node's value tree or querying a node's stringified value)

And the previous proposed solution to that doesn't satisfy that need?

var ast = postcss.selector.parse(rule.selector);

jamiebuilds commented 8 years ago

@andyjansson I think you aren't understanding the problem completely.

Say we have this selector:

fn1(fn2(fn1(foo))) { ... }

Then I have plugins for fn1 and fn2

Solution 1: Run in order

If we ran fn1 and then fn2, we would run into the issue of fn1() trying to resolve "fn2(fn1(foo))" which it obviously cant because it does not and should not have any knowledge of fn2().

Solution 2: Repeatedly try to resolve

Here we keep running fn1 and fn2 until the selector has been completely resolved. fn1 and fn2 would need to know both how to resolve it and if they can actually resolve it. If they can't they would have to tell postcss to run everything else and come back later. Here you run into the problem of not knowing the difference between the inability to resolve a value because its invalid or because it needs to run something else, and on a syntax error postcss would be running forever.

Solution 3: Have a tree structure that can be folded

Here we parse the entire AST and do a depth-first traversal, resolving dependencies upwards. This solves the issue of ordering and the issue of plugins trying to figure out if they can't resolve a value because of a syntax error or because of another plugin that needs to run.

Solution 3 is also the fastest and easiest to reason about solution, which is why it is used in every modern compiler. What you suggest simply wouldn't work.

andyjansson commented 8 years ago

@thejameskyle what you just described sounds exactly what I've been talking about...?

Here you run into the problem of not knowing the difference between the inability to resolve a value because its invalid or because it needs to run something else

I addressed this concern at the bottom of my post

Here we parse the entire AST and do a depth-first traversal, resolving dependencies upwards.

Yes, I understand that, but this also means that we cannot use custom syntax. In case of, say, postcss-functions, it does resolve depth first.

and on a syntax error postcss would be running forever.

When I said syntax error earlier, I didn't mean a CSS syntax error. I was referring to a syntax error pertaining to the custom syntax of the if statement.

What you suggest simply wouldn't work.

I've laid out my reasoning and for the examples I've presented, it does work. Care to provide an example where it wouldn't?

I'm not trying to be dismissive of what you're saying, it's just that there seem to be some information missing here.

ben-eb commented 8 years ago

And the previous proposed solution to that doesn't satisfy that need?

Well.. it does and doesn't. For argument's sake, let's have 20 plugins in the same PostCSS instance that use var ast = postcss.selector.parse(rule.selector);. Now, we have a bottleneck because 20 plugins need to parse those selectors and then stringify them. This is what we currently are doing (via postcss-selector-parser), which is inefficient because you have 20 parsing steps and 20 stringifying steps for the same selector. Whereas, if we parse all of these we will get a slowdown on initial parse, but it will be made up with a richer AST. More to the point though, we have a unified API which is much more preferable in my eyes than 3 different ones (postcss, postcss-selector-parser, postcss-value-parser).

So for example with a unified API we can iterate through all comment nodes whether they are in selectors or values, and then remove them:

{
  comment: {
    enter: path => path.remove()
  }
}

This is much harder to do at the moment because a developer has to negotiate three different parsers.

andyjansson commented 8 years ago

Now, we have a bottleneck because 20 plugins need to parse those selectors and then stringify them.

Yep, I totally get that.

I'm totally for an AST provided that it doesn't hamper the ability to write values which doesn't follow conventional CSS value syntax (such as the aforementioned postcss-conditionals plugin). I do not see that it would be possible if the AST is the mandatory form of mutation.

If however there are two properties, one for string manipulation and the other for AST manipulation, one compiling its value to the other, I could see it working. It would however necitate that the AST property can be null (since you can write values which cannot be parsed).

jamiebuilds commented 8 years ago

If this is all about being able to parse custom syntax there are ways of making that happen without abandoning a detailed AST. CSS is a pretty straightforward syntax to parse, and there are well documented ways of extending parsers to understand more syntax.

But as I believe @ai mentioned before, most of the custom syntax people want doesn't conflict with the syntax that would be parsed by PostCSS.

clentfort commented 8 years ago

FYI: I'm currently working on a parser for CSS Selectors with support for Level 4 Selectors as defined here. It's not yet complete (traversing and manipulating the AST is missing), but if you are interested in integrating the parser into PostCSS I'd be happy to chat (and design/adjust the API according to suggestions). Otherwise I'd be happy if we could have an active discussion about the AST for Selectors, so I can steer my parser into the same direction (and maybe use it as a drop-in replacement at some point).

ai commented 8 years ago

Yeap, let's talk in PostCSS gitter after May 12 (sorry, conference).

What do you think about this steps:

Creating a separated library.
Stabilize API
Move API to PostCSS core.

Also, what will be the main difference with postcss-selector-parser?

clentfort commented 8 years ago

Well one major advantage so far is, that it supports plugins for the parser. This allows correct parsing of :not(SimpleSelector, SimpleSelector, ..) and :nth-child(2n+1 of SelectorList), which postcss-selector-parser has problems with. (It actually parses nested not-selectors, even though that is not allowed according to the official spec). The parser is also faster on many cases (at least on my machine), you can test yourself by running the benchmarks in my repository. (Simply start them with babel-node, I didn't have the time yet to setup scripts in the package.json).

ai commented 8 years ago

@clentfort awesome! @ben-eb what do you think?

ben-eb commented 8 years ago

Do we want to go down this road with PostCSS? We could make PostCSS only parse 'legal' grammars, but that is slower than "loosely" parsing as we are doing right now. If anyone wanted some custom syntax it would be harder to implement than it is currently.

For my use case I don't care about the semantics of tokens inside pseudo-selectors, which is why it works like it does. I don't think it can be that big of a deal given that it's been an open issue since last July. I guess tagging things 'help wanted' doesn't seem to make much of difference.

clentfort commented 8 years ago

As I said, my selectors parses nested not-callst, even though strictly speaking it is not a valid CSS selector. I really can't see any disadvantage by parsing the parameters of a CallExpression, with the design of my parser it's pretty trivial to define specific rules for different calls (right now I have, to prototype the API, support for :not and :nth-child). It give tools such as Stylelint powerful tools to fully inspect selectors, i.e it could correctly calculate the specificity of selectors.

I wonder about your concerns about the speed, I already mentioned that I benchmarked my parser against postcss-selector-parser and even though the tokenizing is slower (based on the more yieldy design), the overall parsing is faster for many inputs (also on inputs where another-selector-parser parses subtress that are ignored by postcss-selector-parser such as :nth-child(An+b of SelectorList))! Please check for yourself, the benchmarks are included in my repository! (Also please verify I'm instrumenting postcss-selector-parser correctly!)

Regarding custom grammars: Well this is as much a problem with postcss-selector-parser, it needs to be extended to generated a correct AST, same goes for my parser. Sure I haven't added stuff like SASS Interpolation or Nesting operators, but those again shouldn't be too hard to add. For my use case I don't care about these.

hokkaido commented 7 years ago

Is this still something that is being considered / worked on?

ai commented 7 years ago

Yeap, this issue is in process inside "common parse for Sasd, CSS and Less" project. But it will not be solved soon.

hokkaido commented 7 years ago

Thanks for the answer. Is there any way to help with the effort? Would love to see this feature land in postcss.

ai commented 7 years ago

@hokkaido not right now. My current plans:

Finish Logux 0.2
@lahmatiy will move tokenizer to separated project.
We will replace tokenizer to better from @lahmatiy.
After testing we will replace parser to more csstree-ish and put values/selectors tokens to separated properties.
Thinking about better AST API (with events/visitor?).

So it is hard to make it faster. What is your current problem? Maybe I can suggest faster solution?

ai commented 6 years ago

Close for https://github.com/postcss/postcss/issues/1145

postcss / postcss

Integrate value & selector parsing into PostCSS core. #754