Open alexander-akait opened 1 year ago
@alexander-akait Thanks for raising this parser discussion! This is an exciting topic.
So, do you think this issue covers stylelint/stylelint#5586?
I also think the `@csstools/*` tools like `@csstools/media-query-list-parser` or `@csstools/css-tokenizer` by @romainmenke are great achievements in this area.
> We have `postcss` and we have certain problems/issues, which, unfortunately, have not been resolved for a long time, like CSS compliance tokenizer and parser, selectors, at-rules and value parser

Does PostCSS's author recognize these problems? And can we not add parser improvements upstream to PostCSS?
Because Stylelint is currently a part of the PostCSS ecosystem, I think it would be best for backward compatibility if PostCSS accepted requests for parser improvements.
Thank you for bringing this up and for gathering all this info 🙇
I think performance of PostCSS and the CSS tooling ecosystem built around PostCSS is a complicated subject :)
On the one hand PostCSS itself is really really fast. But the way it is designed means that there will be a lot of duplicate work on selectors, values, at-rule preludes, ...
It also is written in JS, so it's bound by the constraints of JS engines. There are a lot of factors here, but in short, it will never be Rust.
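Imho it isn't realistically possible to create a new parser that solves everything:

- equally fast or faster than PostCSS
- can support non-standard syntaxes (`less`, `scss`, `css-in-js`, ...)
- can support an ecosystem of plugins
- will still be fast when a lot of plugins are run
- can parse CSS in its entirety
- has a user friendly API surface
- has a complete and correct Object Model
- is written in JavaScript

If the constraint that is most important is performance, then it makes more sense to me that Rust or something similar is chosen as a starting point and that other aspects are sacrificed.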
However, what I consider to be the most valuable part of PostCSS is not performance but the community and existing adoption.
There is a very large chance that people already have PostCSS as part of their stack, so the barrier to adding a tool based on PostCSS is low.
There is also very little friction within the active PostCSS community, a lot of people are open to collaborate towards a common goal. ( like this here :) )
Starting a community from scratch around a new toolset is not something I am personally interested in :) Not that what I am or am not interested in should stop anyone
> Does PostCSS's author recognize these problems?

Yes: https://github.com/postcss/postcss/issues/1145
It's a known issue that it is a waste that each plugin needs to parse values, selectors, at-rule preludes over and over again.

> And can we not add parser improvements upstream to PostCSS?

I tried and "succeeded": https://github.com/postcss/postcss/pull/1812
I wrote more about why I advised not to merge that: https://github.com/postcss/postcss/issues/1145#issuecomment-1397999466
TL;DR; the cost of rolling out that change was too high. It would have fractured the PostCSS ecosystem in a way that it might not recover from.
But it would have made it possible for multiple consumers to parse from an existing token array instead of starting from a string. For a tool that is mostly read heavy like Stylelint it would have meant a serious performance gain.
PostCSS as a host/driver for plugins just works really well and the reason that it is successful is also the reason why we have a performance issue. By hiding a lot of complexity and only exposing a limited Object Model it is much easier to create a simple plugin. But it becomes harder to have a performant "tool chain".
My current approach in `postcss-preset-env` for better performance is to try and guard each parsing of values, selectors, ... with a small test.
If I want to create a fallback for the `ic` unit, first check if `ic` exists as a substring of the value. It might be part of a word like `pick` and that is fine, but most of the values would be skipped without further parsing.
We might be able to do similar things in Stylelint?
Best to discuss that in its own issue.
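A minimal sketch of the guard described above for the `ic` unit. The names `parseValue` and `findIcLengths` are hypothetical, and `parseValue` only stands in for an expensive value parser; this is not the `postcss-preset-env` implementation.

```js
// Hypothetical stand-in for an expensive value parser.
let parseCount = 0;

function parseValue(value) {
  parseCount++;
  // Simplified "parsing": split the value into whitespace-separated words.
  return value.trim().split(/\s+/);
}

function findIcLengths(value) {
  // Cheap guard: a plain substring test. It can give false positives
  // (for example `pick`), but never false negatives.
  if (!value.toLowerCase().includes('ic')) {
    return [];
  }

  // Only values that pass the guard pay the parsing cost.
  return parseValue(value).filter((word) => /^[+-]?(\d+\.?\d*|\.\d+)ic$/i.test(word));
}

console.log(findIcLengths('10px solid red')); // rejected by the guard, never parsed
console.log(findIcLengths('2ic'));            // parsed, one `ic` length found
console.log(findIcLengths('pick'));           // parsed, but no `ic` length found
```

The false positive on `pick` costs one unnecessary parse, while every value without the substring is skipped entirely.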
Something that I haven't tried yet, but that I think could work is to cache parsed values.
Each time you take something out of the cache the entry is removed. If you didn't mutate or produced more useful parsed values you add them back to the cache.
This would be extremely sensitive to bugs and any bug would be hard to fix. But it would allow read heavy tasks to share work.
Also best to discuss this in its own issue.
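The take-and-give-back cache idea could look roughly like this. Everything here is an assumption for illustration: `expensiveParse`, `take`, `giveBack` and the two consumers are hypothetical names, and a real version would need much more care around mutation.

```js
// Hypothetical stand-in for an expensive parser.
let parseCalls = 0;

function expensiveParse(source) {
  parseCalls++;
  return { source, words: source.split(/\s+/) };
}

const parsedCache = new Map();

// Taking an entry removes it, so the caller owns the tree exclusively
// and may mutate it without affecting anyone else.
function take(source) {
  const cached = parsedCache.get(source);
  if (cached !== undefined) {
    parsedCache.delete(source);
    return cached;
  }
  return expensiveParse(source);
}

// Callers that did not mutate the tree hand it back for the next reader.
function giveBack(ast) {
  parsedCache.set(ast.source, ast);
}

// A read-only consumer (for example a lint rule).
function countWords(source) {
  const ast = take(source);
  const count = ast.words.length;
  giveBack(ast);
  return count;
}

// A mutating consumer: it must not give the modified tree back.
function upperCaseWords(source) {
  const ast = take(source);
  ast.words = ast.words.map((word) => word.toUpperCase());
  return ast.words.join(' ');
}

countWords('1px solid red');     // parses
countWords('1px solid red');     // reuses the cached tree
upperCaseWords('1px solid red'); // takes the tree and keeps it
countWords('1px solid red');     // has to parse again
console.log(parseCalls); // 2
```

The fragile part is exactly what the comment above warns about: a consumer that mutates a tree and still gives it back would silently corrupt results for every later reader.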
My current goal with packages like `@csstools/css-tokenizer` is to lower the barrier to creating high quality parsers for CSS. I want people to have great tooling, even for modern syntax. I don't want there to be a gap of years between a feature landing in browsers and tooling to catch up.
Because it is unopinionated, follows the CSS specification, and doesn't support non-standard syntax it is also really stable. Either it implements the specification correctly or it has a bug and a bug can always be fixed in a patch release. (we might still do semver major from time to time, but these should be rare)
Many things can also be done at the tokenizer level:
On top of the tokenizer there are the parser algorithms. They are currently limited and only implement the basics for component values. Ideally we extend these to cover more of the css syntax.
These allow you to do more, because structures like blocks, functions are fully parsed. But there isn't any Object Model specific to your context.
To actually have a useful Object Model another layer is needed: specialized parsers which are only invoked when relevant. Things like the media query list parser.
This has a complete Object Model but that is also what makes it massive. There are so many node types in this sub-syntax alone.
> We need to think about non CSS compliance syntaxes (like sass/less/etc) and how to design it extendable, but, yeah, we can start with only CSS and improve it later, CSS by default is error tolerant, so everything that we can't parse will be `ListOfComponentValues`

I don't personally use non-standard CSS syntax, everything is plain CSS in a file that has a `.css` extension. (I am not a frontend developer, so more accurate to say that the team I work in writes plain CSS)
My main reason not to support these is because they do not have a true standards body behind them and that I lack familiarity with these syntaxes.
Correctly following one specification is difficult enough. Also adding support for several syntaxes that do not even have a specification is not something I want to spend my time on.
But having said that, all tools I've created are composable and modular.
You can rewrite the token stream to make `scss` look like `css`, or have different parsing algorithms and then pass on the result of that to one of the specialized parsers like for media queries.
I want people to be able to re-use the complex and hard parts.
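Some questions:

- are there specific parts of CSS that lack detailed parsers and that this lack of a parser is blocking specific features?
- has anyone done any research on what is fast enough for specific tools (minifiers, bundlers, linters, ...) [^1]

[^1]: Faster is always better, but at some point people don't notice the gains anymore.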
@romainmenke Thanks for sharing postcss/postcss#1145. Now I understand the context very well. 👍🏼
@romainmenke I'll try answering your questions as far as I know:
> are there specific parts of CSS that lack detailed parsers and that this lack of a parser is blocking specific features?

I don't remember completely, but this project may have some blockers due to insufficient parser libraries.

> has anyone done any research on what is fast enough for specific tools (minifiers, bundlers, linters, ...)

Unfortunately, I don't know.
I've tried listing the parser libraries used by Stylelint. Some are barely maintained 😓
| Name | Version | Last published | Unpacked size |
|:-----|:--------|:---------------|--------------:|
| `postcss` | 8.4.23 | Apr 20, 2023 | 194KB |
| `postcss-media-query-parser` | 0.2.3 | Oct 27, 2016 | n/a |
| `postcss-resolve-nested-selector` | 0.1.1 | Feb 19, 2016 | n/a |
| `postcss-safe-parser` | 6.0.0 | Jun 14, 2021 | 5.2KB |
| `postcss-selector-parser` | 6.0.13 | May 16, 2023 | 186KB |
| `postcss-value-parser` | 4.2.0 | Nov 29, 2021 | 27KB |
| `@csstools/css-parser-algorithms` | 2.1.1 | Apr 10, 2023 | 31KB |
| `@csstools/css-tokenizer` | 2.1.1 | Apr 10, 2023 | 59KB |
| `@csstools/media-query-list-parser` | 2.0.4 | Apr 10, 2023 | 122KB |
| `@csstools/selector-specificity` | 2.2.0 | Mar 21, 2023 | 17KB |
| `css-tree` | 2.3.1 | Dec 15, 2022 | 1.2MB |
```js
import { spawnSync } from 'child_process';

// Read the dependency ranges of the Stylelint release under inspection.
const allDeps = JSON.parse(
  spawnSync('npm', ['view', '--json', 'stylelint@15.6.2', 'dependencies']).stdout.toString(),
);
const parserDeps = [
  'postcss',
  'postcss-media-query-parser',
  'postcss-resolve-nested-selector',
  'postcss-safe-parser',
  'postcss-selector-parser',
  'postcss-value-parser',
  '@csstools/css-parser-algorithms',
  '@csstools/css-tokenizer',
  '@csstools/media-query-list-parser',
  '@csstools/selector-specificity',
  'css-tree',
];

const dateFormat = new Intl.DateTimeFormat('en', { dateStyle: 'medium' });
const sizeFormat = new Intl.NumberFormat('en', { notation: 'compact' });

console.log(`| Name | Version | Last published | Unpacked size |`);
console.log(`|:-----|:--------|:---------------|---------------:|`);

for (const name of parserDeps) {
  const version = allDeps[name];
  if (!version) {
    throw new Error(`${name} is not in dependencies`);
  }

  let dep = JSON.parse(
    spawnSync('npm', ['view', '--json', `${name}@${version}`]).stdout.toString(),
  );
  // A version range can match several releases; keep the latest one.
  if (Array.isArray(dep)) {
    dep = dep.at(-1);
  }

  const lastPublished = dateFormat.format(new Date(dep.time[dep.version]));
  const size = dep.dist.unpackedSize ? sizeFormat.format(dep.dist.unpackedSize) + 'B' : 'n/a';
  console.log(
    `| [\`${name}\`](https://www.npmjs.com/package/${name}) | ${dep.version} | ${lastPublished} | ${size} |`,
  );
}
```
EDIT: This list is as of Stylelint 15.6.2.
Problems with dependent parsers:

- `css-tree` is large and may become unmaintained

Of that list only these seem immediately problematic to me:

- `postcss-resolve-nested-selector`
- `postcss-media-query-parser`

They have not been updated even though the CSS specifications that are relevant to them changed years ago.
`postcss-value-parser` has a few open issues which are hard to fix, but these are edge cases, not entire unsupported features. Maybe this one can be handled more on a case by case basis?
For `css-tree` it is hard for me to judge the situation. It might be a temporary gap in between active maintenance?
Would be good to reach out.
I really like the syntax checking it offers and it's not trivial to re-create this feature.
Oh, there are a lot of messages
> Imho it isn't realistically possible to create a new parser that solves everything:
>
> - equally fast or faster than PostCSS
> - can support non-standard syntaxes (less, scss, css-in-js, ...)
> - can support an ecosystem of plugins
> - will still be fast when a lot of plugins are run
> - can parse CSS in its entirety
> - has a user friendly API surface
> - has a complete and correct Object Model
> - is written in JavaScript

I fully disagree:

- `ListOfComponentValues` for declarations, at-rules, selectors, etc.
- `Comment` node: working with comments in postcss is hell, we have around 9k hacks to make it work (and to be able to get their content). Look at the babel/acorn comment implementations: no comments in the AST, and you can easily understand trailing and remaining comments.

By default the CSS tokenizer (and the CSS parser too) is error tolerant, so we don't need to worry a lot about non-standard CSS, because by spec it will be `ListOfComponentValues` if we can't apply a grammar.
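To illustrate the fallback to a plain list of component values when no grammar applies, here is a rough sketch. The node shapes, the `grammars` table and the whitespace-based splitting are all assumptions for illustration; a real implementation would follow the tokenizer and parser algorithms of the CSS syntax specification.

```js
// Hypothetical node shapes and grammar table, for illustration only.
// Splitting on whitespace stands in for a real component value parser.
function splitComponentValues(prelude) {
  return prelude.trim().split(/\s+/).filter(Boolean);
}

// Deliberately incomplete grammars, keyed by at-rule name.
const grammars = new Map([
  // Only understands `@import "file.css"`.
  ['import', (values) => {
    const match = values.length === 1 ? /^"([^"]*)"$/.exec(values[0]) : null;
    return match ? { type: 'Import', url: match[1] } : null;
  }],
]);

function parseAtRulePrelude(name, prelude) {
  const values = splitComponentValues(prelude);
  const grammar = grammars.get(name);
  const parsed = grammar ? grammar(values) : null;

  // No grammar, or the grammar could not be applied: nothing is lost,
  // the prelude stays available as a plain list of component values.
  return parsed ?? { type: 'ListOfComponentValues', values };
}

console.log(parseAtRulePrelude('import', '"a.css"'));            // grammar applies
console.log(parseAtRulePrelude('import', 'url(a.css) screen'));  // grammar too limited, falls back
console.log(parseAtRulePrelude('include', 'button-style'));      // non-standard at-rule, falls back
```

The same entry point handles standard CSS, CSS that the tiny grammar does not cover yet, and non-standard syntax, without throwing in any of those cases.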
> If the constraint that is most important is performance, then it makes more sense to me that Rust or something similar is chosen as a starting point and that other aspects are sacrificed.
>
> However, what I consider to be the most valuable part of PostCSS is not performance but the community and existing adoption. There is a very large chance that people already have PostCSS as part of their stack, so the barrier to adding a tool based on PostCSS is low. There is also very little friction within the active PostCSS community, a lot of people are open to collaborate towards a common goal. ( like this here :) )
>
> Starting a community from scratch around a new toolset is not something I am personally interested in :) Not that what I am or am not interested in should stop anyone

I propose not to appeal to emotion, but to return to reality: if the tool is not going to solve the problems and does not provide an opportunity to solve them, then it's time to change the tool.
> Some questions:
>
> - are there specific parts of CSS that lack detailed parsers and that this lack of a parser is blocking specific features?
> - has anyone done any research on what is fast enough for specific tools (minifiers, bundlers, linters, ...)

Yes and yes. We simply have incredible performance issues and bugs.
Now let's get back to being more constructive:

- `/* i-need-to-ignore-the-next-line */` (it can be in any place), you need to do magic things
- `stylelint` team, `@csstools` and other teams

That is why I suggest following these steps:

- `postcss-new-parser` (maybe a better name) where we will generate the PostCSS AST but using our tokenizer and parser

Some steps can be split into several, I am fine with that. I would also like to add: I've spent quite a bit of time on a lot of tools and parsers in the postcss ecosystem, and I'm honestly tired. Perhaps this is my last attempt to somehow consolidate all this. If it fails again, I will be very upset again, and ultimately this will lead to us simply losing most of our community in the near future.
@alexander-akait What a big challenge! 👍🏼 👍🏼 👍🏼
I totally agree with the JS solution against Rust since there is a big JS/CSS community here.
Additionally, I agree with starting with a CSS tokenizer and value/at-rule/selector/etc parsers. We will be able to try them in the Stylelint codebase easily.
By performance, I don't mean that we should have speed like C++ or Rust; it should be acceptable. Here is a clear example of the problem: cssnano has around 14-16 plugins under the hood, and in almost every one of them we parse selectors and values. It's the same here in stylelint. If this is not a clear performance problem, then I immediately give up.
Yeah, the performance issue is absolutely clear, I know it very well :) But my point was more that I don't think users of PostCSS see/experience this problem.
LightningCSS for example is (on the surface) a combo of :
Even when being so much faster, people aren't really that interested, they think it is very cool, but very few are switching to it. The cost of switching tools is higher than the cost of waiting a few 100ms, even if 90% of that time is useless re-parsing.
> I've spent quite a bit of time on a lot of tools and parsers in the postcss ecosystem, and I'm honestly tired, and perhaps this is my last attempt to somehow consolidate all this, if it fails again, I will be upset too much again

I can understand this, and I feel this too, but this is also exactly why I am hesitant.
How can we do a project like this sustainably?
The tokenizer is not something we have to start all over right? Is there a reason we can not use our existing tokenizer?
https://github.com/csstools/postcss-plugins/tree/main/packages/css-tokenizer#readme
> How can we do a project like this sustainably?

Yes, this is really a headache for us. 😓 But at least, I believe we can provide a place where the Stylelint community members can easily join.

> Is there a reason we can not use our existing tokenizer?

Personally, I think `@csstools/css-tokenizer` is a great starting point.
Would https://github.com/servo/rust-cssparser be suitable to integrate? It's the CSS parser that Firefox uses. Though its docs do indicate it does not parse into selectors or properties, so it's probably only half a parser.
> We need to think about non CSS compliance syntaxes (like sass/less/etc) and how to design it extendable, but, yeah, we can start with only CSS and improve it later, CSS by default is error tolerant, so everything that we can't parse will be ListOfComponentValues

CSS preprocessors are on their way out with CSS now having variables, nesting and color modification. I see no compelling reason anymore to use them.
> Would servo/rust-cssparser be suitable to integrate?

It's interesting. But I believe it may be hard for our community to maintain Rust code.
> CSS preprocessors are on their way out with CSS now having variables, nesting and color modification. I see no compelling reason anymore to use them.

I think it's important to keep backward compatibility and extendability for CSS-like syntaxes (Sass/Less etc.) because there are big communities already. At least, we should allow anyone to extend and customize our new parser for such syntaxes.
> I think it's important to keep backward compatibility and extendability for CSS-like syntaxes (Sass/Less etc.) because there are big communities already. At least, we should allow anyone to extend and customize our new parser for such syntaxes.

One way of supporting preprocessors would be to transpile the Sass/Less code with source maps to CSS, lint the CSS, and then report back the errors with the position obtained through the source map. Maybe this is already how it works with the existing `customSyntax` option, not sure.
I personally don't think it's a good idea to rely on using source maps. I think autocorrection breaks syntax in most cases.
Right, `--fix` would not work via such a sourcemap transformation, I assume.
Do we have a flamegraph of `node_modules/.bin/jest --runInBand`?
In short we need some metrics/profiling first.
https://github.com/stylelint/stylelint/blob/main/lib/rules/color-named/index.js#L63-L128
`color-named` is a good example of a performance issue.
It is eagerly parsing declaration values with `postcss-value-parser` without a fast abort.
It is then walking the value AST and again eagerly parsing with `colord`.
We also have a color value parser built on top of our tokenizer and parser algorithms : https://github.com/csstools/postcss-plugins/tree/main/packages/css-color-parser#readme
The input to this specialized parser is not a string but component values. So there isn't any expensive serializing and re-parsing to make tools work together.
As much logic as possible can be done first at the token level, then at the component value level, and only when really needed on fully parsed color values.
Each step only does the minimal amount of work.
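A heavily simplified sketch of that layering, assuming hypothetical helper names and a truncated color table. The regex-based splitter is not a spec-compliant tokenizer (it would, for example, look inside `url()` and strings), and this is not how `@csstools/css-color-parser` is implemented.

```js
// Hypothetical, truncated table of named colors.
const NAMED_COLORS = new Map([
  ['red', [255, 0, 0]],
  ['rebeccapurple', [102, 51, 153]],
  ['white', [255, 255, 255]],
]);

// Stage 1: cheap token-level pass that only extracts identifier-like words.
function identTokens(value) {
  return value.toLowerCase().match(/[a-z][a-z0-9-]*/g) ?? [];
}

let fullParses = 0;

// Stage 2: the expensive step, only reached by tokens that survived stage 1.
// It receives a token, not a re-serialized string.
function parseNamedColor(ident) {
  fullParses++;
  const [r, g, b] = NAMED_COLORS.get(ident);
  return { name: ident, r, g, b };
}

function findNamedColors(value) {
  return identTokens(value)
    .filter((ident) => NAMED_COLORS.has(ident))
    .map((ident) => parseNamedColor(ident));
}

console.log(findNamedColors('10px 20px'));     // no candidate survives stage 1
console.log(findNamedColors('1px solid red')); // one candidate reaches stage 2
console.log(fullParses); // 1
```

Most declaration values never reach the second stage, which is the point of doing the minimal amount of work at each step.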
@romainmenke
> The tokenizer is not something we have to start all over right? Is there a reason we can not use our existing tokenizer?
> https://github.com/csstools/postcss-plugins/tree/main/packages/css-tokenizer#readme

I am fine with it.
My suggestions are:

- move it to its own repo, we can still be under the `csstools` org, just to avoid mixing `postcss-plugins` work and `parser` work
- `import`/`require` degrades start time (it's pretty obvious for a parser: on each file Node.js executes fs calls, and they cost time)

Maybe I missed something else, but this is not a problem, we can discuss it in the repository if we can all agree.
> move it to own repo, we can still be under `csstools` org, just to avoid mixing postcss-plugins works and parser works

I don't have ownership, admin or publish permissions for either the github org or the npm org for `csstools`. Either that needs to change and must be extended at least to you (@alexander-akait) or a different space must be created for this effort.
It might be better to do a clean slate start. (We can transfer existing code, test suites, ...)
I personally prefer to work in a mono repo because that makes it easier to spot regressions. Are you ok with having a single git repository for all tokenizer, parser related work?
I agree on all points of feedback related to the current tokenizer.
@alexander-akait @romainmenke If you wish, providing repositories for parsers etc. under the `github.com/stylelint` org may be possible.
@stylelint/owners Any thoughts?
> If you wish, providing repositories for parsers etc. under the github.com/stylelint org may be possible.

No objections to hosting under the github.com/stylelint org.

> > Would servo/rust-cssparser be suitable to integrate?
>
> It's interesting. But I believe our community may be hard to maintain the Rust code.

This is something to be aware of. Historically, Stylelint has had difficulty attracting contributors at various times, and it has at times been quite challenging both to allow Stylelint to be extended by plugins and to have Stylelint depend on other packages while keeping this ecosystem maintained.
Another consideration is the https://github.com/eslint/rfcs/pull/99
This RFC specifies a plugin format that would allow ESLint plugins to fully define their own languages, effectively expanding ESLint from a JavaScript-focused linter into a more general-purpose linter.
The goal here is to take the boring parts of a linter (file finding, configuration, etc.) and separate that out from the JS-specific parts so no one needs to rebuild the boring parts over and over again.
I've not fully thought through all of this, but if we are writing a new tokenizer/parser, then having ESLint under the hood to simplify and streamline the maintenance of the underlying CLI and API aspects of Stylelint is also worth thinking about, IMHO.
@ntwb Thanks for the comment. As you mentioned, Stylelint has needed more maintainers.
I personally think @alexander-akait's suggestion is great not only for the Stylelint community but also for other JS/CSS communities. However, unfortunately, supporting the challenge under the Stylelint organization may be risky because of that maintainer shortage. 😓
@romainmenke
> I personally prefer to work in a mono repo because that makes it easier to spot regressions. Are you ok with having a single git repository for all tokenizer, parser related work?

Yes, of course, tokenizer/parser/traverser/serializer, these are things related to the parser process, so it would be great to have them all in one place.
@ntwb
> Another consideration is the https://github.com/eslint/rfcs/pull/99
>
> I've not fully thought through all of this, though if writing new tokenizer/parser and having ESLint under the hood to simplify & streamline the maintenance of the underlying cli and api aspects of Stylelint is worth thinking about also IMHO

It's so funny, because I offered to do this 5 years ago, when we were just starting work, but was refused everywhere, now it's official.
And I proceeded from a simple thing - we should make the core for any linters. CLI logic/rules logic/configuration(s)/ignoring and extending/options for parsers and rules/fixable logic/etc and we had to duplicate all this. And my logic was that we could avoid this, collaborate and combine the work, and now I see how it all came to this. But unfortunately a little late and our code has become more complicated and now it would be quite difficult to rewrite all this (yeah, we can just create a rule and run stylelint inside that rule, but that looks like a big mono and badly configurable rule).
But now we can avoid some mistakes too
JS has https://github.com/estree/estree, so any parsers which follow estree are compatible, and I think we have to do the same. Yes, it takes time and I definitely can't do it alone, BUT if we do this, then we will become independent of the parser and its implementation in the future: Rust/JS/Zig/C++/C, whatever you want. I still think that the idea of rewriting everything in Rust is a utopia at this moment (the future is foggy and we do not know what will happen tomorrow, but we can influence it). Yes, it would be great and it would allow us to have good perf and much more, but if we look at the world realistically, we will understand that, unfortunately, there are not so many people who know it, and most of our users know only JS (some TS too). But this does not mean that we should not build the right foundation. If we get to this in time, then it will be fine, but for now we can just agree on some documents for AST structures and maybe a basic API.
Hey I just want to introduce myself. I'm working on a shared parser/linter/formatter core, and it is my explicit goal (and full-time job) to unify what can be unified across this ecosystem. I believe myself to be several (important) steps ahead of ESLint in this regard, and as they have also shown me nothing but indifference it seems that I am their open competitor. My project is still flying under the radar for the moment, but I plan for that to change in a major way, and soon.
Might be an interesting read : https://railsatscale.com//2023-06-12-rewriting-the-ruby-parser/
Thanks for sharing the article. I read it. We wish for a "Universal Parser" for CSS, too!
The best CSS parser ought to be the one that browsers use. I wonder if Blink's CSS parser could be leveraged 😆.

> The best CSS parser ought to be the one that browsers use.

Yes and no :)
They are the best because they are extremely well tested and are used in the wild by billions.
But browsers only need to parse CSS for a limited use case. Their parsers don't have to preserve as much debug info (like whitespace or comments).
Those parsers also don't have to support non-standard syntax like scss, less, ....
LightningCSS for example uses Servo's CSS tokenizer/parser and that is what makes it good and extremely fast. But it's also the source of all the limitations of LightningCSS.
LightningCSS can not be used to build a linter because it discards too many tokens.

> LightningCSS can not be used to build a linter because it discards too many tokens.

This is where I come in! cst-tokens takes the output of an existing parser and uses it to rebuild a tree in which every source character is present in the token stream. Doing this requires defining the syntax of CSS in a cst-tokens parser grammar, but the parser need not be complete: it does not need to know how to resolve ambiguity. The traversal code simply uses the output of the first-pass parser for that purpose. In this way my project's functionality is closely related to that of ungrammar (which you should also look into though I am focused on extensible grammars and they are not).
The cst-tokens CST is also a pure superset of the AST it decorates, and is meant to have all the APIs needed to build any kind of parser, formatter, and linter functionality. It allows comment attachment rules for ambiguous comments to be well-defined, while always preserving the ability to see all possible comment attachments for any given node.
Another reason there's a strong case for a concrete syntax wrapper around an existing AST is that you don't really have to risk breaking anything!! You use the same parser -- you're just adding a new validator and retokenizer layer, so for your users AND your lint rules the language is guaranteed not to have changed at all!
The downside is that the technology isn't ready for production usage yet, and won't be for a little while. Serious users will want to see the library hit 1.0.0, a goal which I've ensured that I can reach and am working directly towards.
I'm essentially here asking for help doing the work that makes everything I am describing possible. With the right help I could get to 1.0.0 a lot faster!
I think it's important to find a place for this effort so that we can split this thread.
I don't want to engage too much on specifics but I also don't want to appear dismissive of people reaching out like @conartist6 .
I think many people care about this issue and want to collaborate.
Maybe any new repository is fine? It only needs to serve as a temporary home for discussion and issues.
A place where we can align on priorities, goals, ...
I can provide a new repository in the github.com/stylelint org, which would be a temporary home for our collaboration. It would work until we find a more appropriate home (org).
For example, how about `github.com/stylelint/css-parser`? I can invite a few people as the repository owner at first.
I'm also interested in this project, and as my time allows, I am happy to help with the planning / implementation. Are you planning to create a Discord server or similar communication platform?
I like the idea of CST, but unfortunately the use of generic solutions is often much worse in performance due to overhead (but I would look at the benchmarks). The original CSS tokens (from the syntax spec) already have everything: whitespace, tokens, etc. It is also good to be aligned with the spec for maintenance purposes.
If someone wants to start, that would be great; I'm a little busy right now. And yes, anyway we need to start with the tokenizer, and we already have solutions (we can reuse them).
@alexander-akait @romainmenke I've created a repository for this project and invited you as an admin. https://github.com/stylelint/css-parser
Please freely use it. Since the repo may be temporary, you don't need to follow the Stylelint organization rules.
> Are you planning to create a Discord server or similar communication platform?

@scripthunter7 We have no plan at this point, but it's possible to consider it if such a platform is required. I want to leave that decision up to the admins.
Thank you @ybiquitous,
I will try to get the ball rolling in a few issues in the next few weeks.
I recently saw @keithamus's csslex, maybe it is something to consider using.
Thank you for sharing this @silverwind That package looks really great!
I've started a list of tokenizers here : https://github.com/stylelint/css-parser/issues/1
@romainmenke You can transfer this issue to stylelint/css-parser if you wish it. Of course, no problem with as-is. 👍🏼
I'm still working on my solution. It won't be fast in the way Rust is zoom-zoom close-to-the-metal fast, but it will be incremental, streaming, extensible, and easy to maintain -- properties that should prove highly advantageous to linters. Right now I'm working on defining an XML-based serialization format that allows my disambiguated trees to be easily sent over a wire. It's a fun example to check out because it both defines the syntax and shows how the parser core works to define syntaxes. https://gist.github.com/conartist6/5adbbf28d11497467848f530756c1c2a
As for the zoom-zoom part, making that method of defining syntax fast is mostly just a matter of doing some code transformation. For example if you have a production like this:
```js
export const productions = {
  *Identifier() {
    yield eat(tok`Identifier`);
  }
}
```

There's a bunch of associated cost from evaluating ``eat(tok`Identifier`)`` repeatedly. But I could eliminate that cost using a hoisting transform that would change the code to something like

```js
const hoisted_1 = eat(tok`Identifier`);

export const productions = {
  *Identifier() {
    yield hoisted_1;
  }
}
```
Now you can see that there's actually a pretty small amount of logic necessary to process any given production!
What you gain for your effort is the ability to process chunked streams. You don't need to have the entire source in a single string, as many parsers require so that they can store indexes into the string as state.
For a linter this means gaining the ability to lint files larger than fit in memory. Memory usage would be driven more by the complexity of language and query rules than by the size of the file being linted.
Also tokens that index into strings tend to perform badly when you want to insert a token. The structure requires invalidating all other tokens because the indexes of all tokens after the change will need to be updated by some offset.
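To make the cost of offset-based tokens concrete, here is a small sketch comparing the two representations. The token shapes and function names are hypothetical and chosen only for illustration; they are not the cst-tokens data structures.

```js
// Hypothetical token shapes, for illustration only.
const source = 'a{color:red}';

// Style 1: tokens are [start, end) offsets into the source string.
const offsetTokens = [
  { start: 0, end: 1 },   // a
  { start: 1, end: 2 },   // {
  { start: 2, end: 7 },   // color
  { start: 7, end: 8 },   // :
  { start: 8, end: 11 },  // red
  { start: 11, end: 12 }, // }
];

function insertWithOffsets(src, tokens, index, text) {
  const at = tokens[index].start;
  let shifted = 0;
  const before = tokens.slice(0, index);
  // Every token after the insertion point is invalidated and must be shifted.
  const after = tokens.slice(index).map((token) => {
    shifted++;
    return { start: token.start + text.length, end: token.end + text.length };
  });
  return {
    source: src.slice(0, at) + text + src.slice(at),
    tokens: [...before, { start: at, end: at + text.length }, ...after],
    shifted,
  };
}

// Style 2: tokens carry their own text, so no existing token changes.
function insertWithText(tokens, index, text) {
  return [...tokens.slice(0, index), text, ...tokens.slice(index)];
}

const textTokens = ['a', '{', 'color', ':', 'red', '}'];

const byOffset = insertWithOffsets(source, offsetTokens, 4, ' ');
console.log(byOffset.source, byOffset.shifted);           // a{color: red} 2
console.log(insertWithText(textTokens, 4, ' ').join('')); // a{color: red}
```

Here only two tokens follow the insertion point, but in a large file an insert near the top would touch almost every token.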
related: biomejs/biome#268
Just an idea for the future and for future discussions: maybe we can unite and write a full-featured CSS parser + at-rules/values parser from scratch. I am afraid we can't rewrite postcss due to some specific logic (and it would probably take longer), so uniting around a CSS parser would be great for any JS tooling. We can open an issue for this.
Shortly about the situation:

- We have `postcss` and we have certain problems/issues, which, unfortunately, have not been resolved for a long time, like CSS compliance tokenizer and parser, selectors, at-rules and value parser
- `csstree` parser, but it is pretty slow in solving problems
- `lightningcss` and `swc`, but unfortunately they are not quite extensible to support all syntaxes, but probably this is solvable, so it's just a discussion for now
- `csstools` with own value and at-rules parser
- `postcss-value-parser`, `postcss-values-parser` and `postcss-selector-parser`, all of them have rather serious limitations and are not so actively maintained, although they are solving almost all current problems, but when a new syntax appears it is usually a problem. Another big problem is the postcss design: we need to reparse selectors, values and at-rules in each rule, which is very bad for performance (very)
- `ListOfComponentValues`

Feel free to give feedback.
I decided to raise the problem here, as I think this is the most appropriate place. In the future we may move it or break it into more detailed parts.