less / less.js

Less. The dynamic stylesheet language.
http://lesscss.org
Apache License 2.0

use global regexes (cached) for small speed improvement #2501

Closed lukeapage closed 6 years ago

lukeapage commented 9 years ago

I wonder if this will help with speed given how much we use regexes

http://jsperf.com/regex-caching-lp

seven-phases-max commented 9 years ago

(I added a few browsers there.) Curiously, it looks like it's dramatically faster only in Chrome :). I guess it could still be a neat piece of code to use for those regexps that execute the most (e.g. vars, properties up to :, etc.). But it does not look like a solution for those #2339 things.

rjgotten commented 9 years ago

Curiously, it looks like it's dramatically faster only in Chrome :)

V8 likely implements regex literals strictly per ECMAScript 5, which states that regex literals are compiled each time the literal is evaluated, as if it had been declared using the new RegExp(pattern, flags) syntax.

The other engines are probably still implementing the ECMAScript 3 behavior, which states that regex literals are compiled once, before the containing program or function is compiled and executed, and any use of the corresponding literal is treated as a reference to the same compiled object.

Or they're using a hybrid form, where the regex state machine is compiled and created only once, but fresh instances of new RegExp are created around it each time the literal is encountered.
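
To make the distinction concrete, here is a minimal sketch of the two styles being compared (the pattern and function names are made up):

// Literal inside the function: engines that follow the ES5 wording may
// pay the RegExp construction cost on every call.
function isIdentSlow(str) {
  return /^[a-zA-Z_][\w-]*/.test(str);
}

// Hoisted to module scope: the literal is evaluated once, so every
// engine reuses the same compiled RegExp object.
var IDENT_RE = /^[a-zA-Z_][\w-]*/;
function isIdentFast(str) {
  return IDENT_RE.test(str);
}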

jonschlinkert commented 9 years ago

V8 likely implements regex literals strictly per ECMAScript 5, which states that regex literals are compiled each time the literal is evaluated

You should see what caching can do when the regex is dynamically created from a string based on options or user input etc. For example:

var cache = {};
function createRegex(str) {
  return cache[str] || (cache[str] = new RegExp('foo' + str + 'bar'));
}

I created this not long ago. Really, the code above is all that's needed if the only input is a string; still, it's pretty interesting how much of a difference caching can make.
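
For what it's worth, a quick usage sketch of the helper above (the inputs here are just made up):

createRegex('\\d+').test('foo123bar'); // true - builds and caches /foo\d+bar/
createRegex('\\d+').test('fooxyzbar'); // false - reuses the cached RegExp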

matthew-dean commented 9 years ago

If you look at parsers like acorn (http://marijnhaverbeke.nl/acorn/), and a few others that have a reputation for speed, you'll notice that there are barely any regexes at all; instead they use sequences of OR (||) statements and often check individual characters against numerical character codes. The code ends up being more verbose, but from what I've read, JS engines can optimize the crap out of very linear logic statements and static functions. Regexes are more of a mixed bag, and things like backtracking in regex can be expensive.
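
As a rough illustration of that style (just a sketch, not code taken from acorn or Less):

// Regex version: concise, but the engine has more freedom to do
// something slow with it.
function isWhitespaceRegex(ch) {
  return /[ \t\n\r]/.test(ch);
}

// Character-code version: a flat chain of comparisons that JIT
// compilers tend to optimize very well.
function isWhitespaceCode(code) {
  return code === 32 /* space */ ||
         code === 9  /* tab */ ||
         code === 10 /* \n */ ||
         code === 13 /* \r */;
}

// e.g. isWhitespaceCode(input.charCodeAt(i))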

In Less, we often have a series of OR checks for regexes, and those regexes themselves have backtracking OR switches. Pragmatically, it's easy to maintain, but that pattern is more likely to negatively impact performance.

I've been thinking about how we can very gradually improve speed of individual node matching. I was reading this - https://github.com/felixge/faster-than-c (a very good read about making JS faster than C), and he advocates benchmark-driven development. I was wondering if we could create some kind of test harness, like our own kind of JSperf, that has sequences of .less input, and tests the performance of matching to a single matching function in the Less library (and verifies that the output is as expected). Then a dev could tweak that function, run more tests, save / graph the results.

That way we don't have to KNOW what would be faster so much as try something and see if it's faster. If it's faster on all supported platforms, make a PR for that particular function, badda bing, badda boom.

I'm not sure the best way to go about that, but it would be great to be able to quickly test all these great ideas.
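
To sketch the kind of harness I mean (the matcher and test cases here are placeholders, not actual Less internals):

// Time one matching function against a set of .less snippets and check
// the output is still what we expect before trusting the timing.
function bench(name, fn, cases, iterations) {
  iterations = iterations || 10000;
  cases.forEach(function (c) {
    if (String(fn(c.input)) !== String(c.expected)) {
      throw new Error(name + ' produced wrong output for: ' + c.input);
    }
  });
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    cases.forEach(function (c) { fn(c.input); });
  }
  console.log(name + ': ' + (Date.now() - start) + 'ms');
}

// Hypothetical usage:
// bench('ident', matchIdent, [{ input: '.mixin();', expected: '.mixin' }]);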

matthew-dean commented 9 years ago

Also, would there be any value in looking at JS-based CSS parsers like https://github.com/css/gonzales to see if there are ideas we could port? Surely there would be relevant pattern matching in something like that. I know each of our entities gets a lot more complex, but it's just an idea.

rjgotten commented 9 years ago

If you look at parsers like acorn (http://marijnhaverbeke.nl/acorn/), and a few others that have a reputation for speed, you'll notice that there are barely any regexes at all; instead they use sequences of OR (||) statements and often check individual characters against numerical character codes.

That type of parser formally separates itself into a lexer/tokenizer and a LALR(k) parser. The power and speed of that approach come from employing generalized regular expression matches on a very small scale (mostly for recognizing identifiers, filtering whitespace, etc.) and limiting yourself to a fixed set of hand-coded lookaheads, by hand-writing a parser that accepts the lexer's tokens via a set of production rules formulated as a BNF grammar.

You could do that with Less as well, but you'd need to build on a compliant CSS parser first, which can tokenize and parse CSS selectors, which (if I understood correctly from the original discussion) is really where regex lookahead starts to hurt a lot.

I remember CSS 2.1 having a BNF grammar for the language in the W3C recommendation. I assume CSS 3 is no different and does the same. You could still short-circuit a lot of the grammar, since you don't need to know every little detail of it.

matthew-dean commented 9 years ago

@rjgotten Is this what you meant? http://dev.w3.org/csswg/css-syntax/

They really break it down. Just came across that recently.

lukeapage commented 9 years ago

I looked briefly at switching to a parser generator - basically it's a rewrite. I'd love to use a parser generator to make it all generated.

matthew-dean commented 9 years ago

I've read that parser generators can leave you with a different set of problems (some people advocate hand-writing parsers), but I don't really know one way or the other. I think as long as we can benchmark the results, however we get there is all good.

lukeapage commented 9 years ago

Can you run this through a few browsers?

http://jsperf.com/regex-string-size

If the size of the string is Less's performance problem (as chunkInput suggests), I'd expect to see the speed decrease with the size of the string. Any idea what's wrong with the test if that isn't happening? Got to run for a flight or I'd have done more testing...

matthew-dean commented 9 years ago

On my setup, Safari performed best, with all the non-cached operations happening at roughly the same rate, and the cached operations at a different rate. Chrome 41 shows performance issues as the string size increases; Chrome 42 does not show as much of an issue. Firefox trails way behind in last place, with every operation performing at the same rate, cached or not.

I think we can expect that every JS engine will optimize each regex differently, and even the same regex will perform differently with each input. So, nothing will beat testing.

I did another arbitrary test here (http://jsperf.com/regex-string-size/2), just comparing OR in a regex vs. testing two regexes and matching one or the other. I also changed up where the backtracking happens in the regex. And the browsers don't all change performance at the same rate. In fact, Firefox does better with an OR between two regexes than with one of the regexes that backtracks in a different place, while Chrome and Safari always do much better with a single regex.

At the very least, the data shows caching is generally better than not caching, except in Firefox. But even there, it's certainly not worse.

I think we can start with what @jonschlinkert said. For any place in the code where we're testing against multiple regexes, it should still be faster to dynamically create a regex and test against a single one. And even faster if the regex isn't being redefined every time the function is visited. (Implement caching). And then run benchmarks on changed functions.
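
In code, the combination of those two points might look something like this (the pattern itself is invented, just to show the shape):

// One combined, module-level regex instead of several regexes that are
// re-created and tried one after another inside the matching function.
var COMBINED_RE = /^(?:@[\w-]+|[\w-]+\s*:|\/\/[^\n]*)/;

function matchNext(input) {
  // A single test against the cached alternation.
  return COMBINED_RE.exec(input);
}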

lukeapage commented 9 years ago

Cached vs non-cached seems to indicate that regex setup cost does not increase with regex complexity, so it might be a small win. I tried adding a few current-character checks to avoid running the regexes used the most. Will post back here next time I work on it...

seven-phases-max commented 9 years ago

If the size of the string is Less's performance problem (as chunkInput suggests)

I think it's not just the size of the input string but "the size + some lookahead regexes". I.e. only certain regex patterns' timings may increase exponentially as the input size grows, and chunkInput just makes those bail out earlier. So to get the real picture a test needs to involve one of those "bad" regexes (usually the ones with ?/+/* quantifiers) and the input string also needs to be ambiguous for that particular pattern. Simple patterns (like the .mixin one above) find their match in the input immediately, hence the input size does not affect them too much.
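
A classic (and deliberately artificial) example of such a "bad" pattern, for anyone who wants to see the effect:

// Nested quantifiers plus an input that almost-but-not-quite matches:
// the engine tries an exponential number of ways to split the 'a's
// before failing, so each extra 'a' roughly doubles the runtime.
var bad = /^(a+)+$/;
console.time('backtracking');
bad.test('aaaaaaaaaaaaaaaaaaaaaaaaa!'); // 25 'a's and a '!' - false, but slowly
console.timeEnd('backtracking');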

P.S. (a bit off-topic) When making the highlighter I found this tool to be quite useful for debugging such "bad" patterns. It really helps to detect some problematic parts earlier, though you still need to invent/predict the ambiguous inputs that may put the regex down yourself (so it's still a kinda tedious thing).

matthew-dean commented 9 years ago

you still need to invent/predict the ambiguous inputs that may put the regex down yourself (so it's still a kinda tedious thing)

I agree, it's a ton of work to try to predict what will work. That's why it would be great if we could integrate a benchmarker into the Less lib directly (only loaded for tests / development). Change code, and if the results are accurate and it's faster, keep the code. If we could turn on the benchmarks just for function X and run all the Less tests (which would throw a bunch of .less patterns and file sizes at it, yes?), it seems like it would give a more "real-world" picture.

matthew-dean commented 9 years ago

@seven-phases-max The highlighter and other tools are cool too, definitely should check those out. Let's hit this from all sides. Maybe we could make "performance" the next major feature (other than features in progress)?

lukeapage commented 9 years ago

I've fixed the benchmark test and made it more accurate - I was using that. But that's in Node / V8... I'm most interested in fixing the regexes making parsing slow in non-V8 browsers with large files. The hardest part will be finding which ones those are.

I'm not really interested in performance that much - I think code readability is more important, but at the moment we need to fix non-V8 browsers.

Once fixed (if possible) I can permanently remove the chunker.

rjgotten commented 9 years ago

I'm most interested in fixing the regexes making parsing slow in non-V8 browsers with large files (...) I'm not really interested in performance that much - I think code readability is more important, but at the moment we need to fix non-V8 browsers.

Well, if you want to do something about both: start by separating your parser into a real lexer/tokenizer and a token-based parser. Right now, you still have a lot of tokenization happening inside the actual parsing routines. And if your tokenization is happening mostly based on regexes, that's really like extending an invitation to all the degenerate backtracking cases to come and join the party.

Performance-wise, the good thing about having a separate lexer is that you generally only need regexes to recognize a small pool of token types from text, and since tokens almost exclusively consist of one word, you have very little chance of degenerate backtracking over a long chain when recognizing a token.

The parser no longer receives a stream of text, but a stream of strongly-typed tokens. Tokens are always the same (even if they can be used in different roles). The 'real' backtracking that disambiguates language constructs will then happen inside the parser only and won't have to involve complex backtracking in regexes anymore.

The maintenance/readability upshot of this is that you get named constants for your tokens, which are much easier to work with, and your parser will contain only the isolated parser logic, making it much easier to follow than when it also carries the added cognitive load of dealing with the inlined regexes and text-stream processing.

Also, if you formally separate lexer and parser, you can more easily handle another current issue, namely the one regarding comment parsing.

Structured tokens mean you can attach debug information like line and column number and a sample of surrounding input directly to each token. That enables you to keep that process intact while choosing to omit comments whenever you 'fetch a next token' from the lexer. I.e. you pretty much get the whole 'selectively allow comments' idea for free.
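
To make that concrete, a heavily simplified sketch of such a lexer (the token names and the tiny pattern set are invented, not a proposal for Less's actual grammar):

// A minimal hand-rolled lexer: small anchored regexes recognize a fixed
// pool of token types, and each token carries its line/column for errors.
var TOKEN_TYPES = [
  ['WHITESPACE', /^[ \t\r\n]+/],
  ['AT',         /^@/],
  ['IDENT',      /^[\w-]+/],
  ['LBRACE',     /^\{/],
  ['RBRACE',     /^\}/],
  ['COLON',      /^:/],
  ['SEMI',       /^;/]
];

function lex(input) {
  var tokens = [], pos = 0, line = 1, col = 1;
  outer:
  while (pos < input.length) {
    for (var i = 0; i < TOKEN_TYPES.length; i++) {
      var m = TOKEN_TYPES[i][1].exec(input.slice(pos));
      if (m) {
        tokens.push({ type: TOKEN_TYPES[i][0], value: m[0], line: line, col: col });
        var parts = m[0].split('\n');
        if (parts.length > 1) {
          line += parts.length - 1;
          col = parts[parts.length - 1].length + 1;
        } else {
          col += m[0].length;
        }
        pos += m[0].length;
        continue outer;
      }
    }
    throw new Error('Unexpected character at line ' + line + ', column ' + col);
  }
  return tokens;
}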

lukeapage commented 9 years ago

Yes, it would be nice, but it's a big change that I don't really have time to do - I inherited the existing parser and never felt like I had enough time to re-write it. It is a different approach from the current "chunker", which just aims to split the input up into pieces about 200 bytes long, guessing boundaries, but it comes unstuck on things like

url(http://) }
// vs a comment

I started with something simple - replacing regexes that don't have any logic in them. They aren't the ones causing the problem, but still: https://github.com/less/less.js/commit/f8de5bcb16aea8682409e3ab261cc14ef77ecd40

lukeapage commented 9 years ago

omit comments whenever you 'fetch a next token' from the lexer. I.e. you pretty much get the whole 'selectively allow comments' idea for free.

We can do that already - that isn't the problem with comments. The problem with comments is that how they are represented on the AST is all wrong.

lukeapage commented 9 years ago

A way towards a lexer might be to start using save more and slowly convert the regexes to only use str and char. At that point a lexer can easily be introduced and the str and char calls swapped out for token comparisons.

rjgotten commented 9 years ago

that isn't the problem with comments

Yeah; I kind of glossed over the fact that comments are the one thing that's already handled at the lexer level (i.e. parser-input). My bad.

matthew-dean commented 9 years ago

But that's in Node / V8... I'm most interested in fixing the regexes making parsing slow in non-V8 browsers with large files. The hardest part will be finding which ones those are.

That probably means the benchmarking piece may need to be browser-based somehow.

But, I would also say that it's reasonable that if we address performance bottlenecks in V8, it's very likely to solve things on other engines. For instance, if there are pieces which cause exponential slowdown, and V8 is starting with a small amount of slowdown, then just addressing that small piece should speed up other engines by orders of magnitude. Not in every case, but I don't really see this as optimizing for a particular JS engine (although that's nice when that happens in the case of V8), but just figuring out more efficient algorithms.

So, I'm interested in those other engines too; I just don't think they need to be specifically targeted in order for performance to increase.

matthew-dean commented 9 years ago

I just came across this relevant quote by programmer Jamie Zawinski:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

lol

seven-phases-max commented 9 years ago

Now they have two problems.

Well, the problem is not in regular expressions but in how one uses them. As a C++ programmer who had never used regexes before ~2012 or so, I can assure you ;). Things like "OK, I'll just replace this regex with a bunch of my spaghetti ifs" do not work either. One has to write a pretty decent tokenizer/lexer/parser/whatever abstraction layer/kernel to keep things going smoothly.

matthew-dean commented 9 years ago

I know, no principle in programming is absolute. Just funny because it contains some truth.

jonschlinkert commented 9 years ago

Now they have two problems.

haha, awesome

matthew-dean commented 9 years ago

Just FYI, @lukeapage (et al.), I decided to follow up on this:

I looked briefly at switching to a parser generator - basically it's a rewrite. I'd love to use a parser generator to make it all generated.

I've been looking at a few libraries and reading on different parser generator methods and algorithms. However, I don't want to clutter this thread here until I find a good candidate (which I may have). Just wanted to mention it on the off chance anyone else was doing that work. Once I find something that MAY work, then I may need to try to generate the parser in order to compare performance.

rjgotten commented 9 years ago

I think the parser may in fact be the lesser of all the performance issues.

I added some simple console.time() / console.timeEnd() logging around the various execution phases of the Less compiler.
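
The shape of that instrumentation was roughly the following (the phase labels match the timings below; the calls themselves are placeholders rather than the compiler's real internals):

console.time('parse');
var root = parseLess(input);         // placeholder for the parse phase
console.timeEnd('parse');

console.time('eval tree');
var evaldRoot = evalTree(root);      // placeholder for the eval phase
console.timeEnd('eval tree');

console.time('to css');
var css = generateCSS(evaldRoot);    // placeholder for CSS generation
console.timeEnd('to css');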

Then I took one of my employer's bigger projects that I had available: on the order of 70 files, totalling around 150 KB. I also knew it output a 130 KB result file, with very little in the way of large loops generating lots of iterated CSS that could inflate run time.

I ran this project through the compiler in NodeJS and the results are rather telling:

parse: 390ms
pre-eval visit: 0ms
eval tree: 539ms
post-eval visit: 47ms
to css: 31ms

My hunch is that this is related to code de-optimizations and re-optimizations happening due to type shapes being altered while contexts, frames, etc. are being mutated.

lukeapage commented 9 years ago

I think the parser may in fact be the lesser of all the performance issues.

Yes, you are right - this conclusion has been reached before.

I've been concentrating on it because of the Firefox/IE/Safari bugs caused by parsing being spectacularly slow with long strings.

My hunch is that this is related to code de-optimizations and re-optimizations happening due to type shapes being altered while contexts, frames, etc. are being mutated.

I've performance-analysed this bit as well, looking for any easy wins, and I found a lot of recalculations - e.g. generating a scope for each mixin call seemed to take a large part of the calculation, even when it wasn't used.

My hunch is that some clever memoisation and lazy calculation could probably make it much quicker - but I didn't see anything I could comfortably do in a couple of hours, which was my timeframe when I looked.
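
The lazy/memoised flavour of that would be something like this generic sketch (not tied to the actual mixin-call code; buildScopeFor and mixinCall are hypothetical names):

// Generic lazy + memoised computation: the expensive value is only
// built on first access and then reused.
function lazy(compute) {
  var computed = false, value;
  return function () {
    if (!computed) {
      value = compute();
      computed = true;
    }
    return value;
  };
}

// Hypothetical usage: only build the mixin-call scope if something
// actually asks for it.
// var getScope = lazy(function () { return buildScopeFor(mixinCall); });
// ... later, only where it is really needed:
// getScope();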

rjgotten commented 9 years ago

I have two weeks free time coming up and I may still get an itch to code. Maybe I'll dive in and have a look as well. Could be interesting. ^_^

matthew-dean commented 9 years ago

Eval tree? What happens in that stage?

My hunch is that this is related to code de-optimizations and re-optimizations happening due to type shapes being altered while contexts, frames, etc. are being mutated.

I'm not sure I grok all of this, but I do know that changing the shape of an object is slow for JIT compilers.

As in, this:

var foo = {
  bar: true
}

is faster than:

var foo = {};
foo.bar = true;

matthew-dean commented 9 years ago

Also, it would still be a win if that 390ms was 150ms instead, so I'm going to look at parser generators anyway when I have time. I'll share if/when I find something interesting.

lukeapage commented 9 years ago

Matthew, I've managed to remove a lot of regexes from the current parser - were those measurements from master or the last release?

matthew-dean commented 9 years ago

@lukeapage Were you asking @rjgotten? I was referring to his measurements.

lukeapage commented 9 years ago

Yes, sorry, I got a bit muddled replying on my phone while walking outside... The first bit was for you, the second bit for @rjgotten.

rjgotten commented 9 years ago

Eval tree? What happens in that stage?

The bulk of the compiler's work. It's the phase where variables are resolved, mixins are called, etc. which eventually should produce a reduced tree of nested plain rulesets with property rules.

(Afaict, in the post-eval visit phase the nested selectors are resolved into their final complete form. Then the complete thing gets handed off to the CSS generation phase, which is simply where genCSS is called recursively on the final tree to build up the final CSS output.)

were those measurements from master or last release?

From master.

matthew-dean commented 9 years ago

Just curious, what's the reason that things are converted to plain but nested rulesets and then again reduced to CSS rulesets? Couldn't the intermediary step be skipped? I mean, I know they're sequential, but just wondering if there would be a way to reduce object creation. After all, there's still the matter of frequent huge GC dumps, although that may have been mitigated with chunkInput turned back on.

seven-phases-max commented 9 years ago

It's not that they are specifically converted to plain rulesets. They just become plain because their internal stuff (e.g. vars and mixins) is "expanded". So, yes, it's just one step ("flatten rulesets") after another ("expand ruleset internals")...

rjgotten commented 9 years ago

@matthew-dean Post-processing visitors also handle stuff that needs the globally final state where everything has been 'expanded' / 'resolved' / etc., but not yet unnested into plain CSS rulesets. E.g. the prop +: value merging syntax and, if I'm reading it correctly, @<directive> bubbling.


Regarding reduction of object creation: I think @lukeapage hit the nail on the head with the scopes involved in mixin call evaluation.

I have a rather complex system in a current project that maps a set of names to character codes, for use with font-based icons. It supports leaving a character code undefined for a named icon, in which case it takes the code of the previous icon, incremented by one.

Very nice for custom icon font definitions, as you can keep a neat and tidy ordering without having to remap codes the whole time. However, it never managed to scale well and gave me ridiculously long compilation times when I got into larger icon sets.

Anyway, it has quite a bit of branching-logic inside a tight loop that has to iterate over the list-of-lists I used to model the dictionary / map, which means mixins ... with guard conditions ... and overloaded signature matching ... on each iteration.

As an experiment, today I removed the feature where undefined codes are automatically filled based on increments, and I wrote a few dedicated custom functions to directly perform key lookups on a list-of-lists (I love my new @plugin; it makes this sooo easy to integrate ^_^) without having to write mixin-based iteration.

Doing away with the mixin-based branching-logic and iteration this way shaved 7 seconds off of the eval tree phase of the compilation of that project's file set and that was with a fairly limited set of just 50 icons.


As an aside: I have since restored the auto-increment feature. I managed to write a function that creates a lookup structure out of a detached ruleset's properties. That gave me the option of writing an auto-incrementing loop based on mixins within the detached ruleset, which is eventually converted into a key->value map. It's quite nice. If my employer allows, I may be able to arrange for the complete set of map functions to be made available publicly.

seven-phases-max commented 9 years ago

I think @lukeapage hit the nail on the head with the scopes involved in mixin call evaluation.

It's a problem of the language itself. I don't think @lukeapage or anyone else can do too much about this, i.e. of course there can be some performance improvement tricks here and there, but nothing changes the whole picture: all this stuff just has to be re-evaluated in each scope, because every scope may have different variables and mixins that may change the behaviour of another mixin call/match.

(Meanwhile, there's a way to "iterate" over a list w/o any loops at all... It just has to be a different list, not one in a variable ;) (of course, since this method is also mixin-based, it probably can't be as fast as a dedicated JS function).

rjgotten commented 9 years ago

of course, since this method is also mixin-based, it probably can't be as fast as a dedicated JS function

I guess the takeaway here is that if you need to do heavy-lifting stuff and need to do it fast, you really need the JS escape hatch. ;-)

seven-phases-max commented 9 years ago

WebGL to go! (who needs these stone-age text based icon descriptions when you can iterate and render right in GPU :)

SomMeri commented 9 years ago

I found a lot of recalculations - e.g. generating a scope for each mixin call seemed to take a large part of the calculation, even when it wasn't used.

@lukeapage Would lazy scope generation be an option? The disadvantage is that the code may end up more complicated. Where in the code is the scope generated?

SomMeri commented 9 years ago

About parser tokenization/generation: less4j is written that way and I am not sure the benefits are worth a complete rewrite. Adding features like detached rulesets is somewhat faster I think, but you are losing the complete control you have now.

CSS/Less does not break down into nice strongly typed tokens, so I have to rely on semantic predicates and "tricks" more than I like. Less.js knows the context; the code it uses to parse selectors is completely unrelated to the code used for property names or anything else. Less4j on the other hand has to use the same tokenizer for everything, which leads to very small, almost weakly typed tokens.

A similar problem happens with whitespace. It is ignorable almost everywhere, except in some places. It was solvable of course, but somewhat easier in a hand-made parser. A lot of what is in CSS is like that - many general rules have at least some exceptions.

If I were doing a third compiler, I would likely go with generation again (depending on the grammar), but it is a pretty close call. It could be that the current less.js architecture has downsides I do not know about; I did not look that much into the parser part. What I have seen looked straightforward.

rjgotten commented 9 years ago

@SomMeri

Less.js knows the context; the code it uses to parse selectors is completely unrelated to the code used for property names or anything else. Less4j on the other hand has to use the same tokenizer for everything, which leads to very small, almost weakly typed tokens.

But that's what a tokenizer (or rather: lexer) is and does: it generates tokens. It's then up to a parser to decide how particular sequences of tokens are allowed to fit together and what meaning is attributed to those sequences.

You may think those tokens 'weakly typed', but in fact they are very specifically typed: identifier, opening-brace, closing-brace, etc. The wider and weaker you make that definition, the harder it becomes to manage token extraction correctly without complicating the lexer into a full-blown parser, with the need for complex backtracking etc. implemented directly on the character stream with regexes.

Tokenization is meant as a pre-processing step that 'pre-shrinks' the input set, if you will, and prevents you from having to run painfully large regular expressions over huge input strings.

If fully automated generation isn't your thing, you could always write a lexer/parser separation by hand. Full control and all the maintenance (and theoretical performance) benefits.

SomMeri commented 9 years ago

Some grammars are a better fit for that than others. Parsing Java (minus error handling) or Python this way is likely to be easy and straightforward - they were designed in a clean way that makes it easy. CSS, and thus Less, has features that make it a less perfect fit. The thing about CSS/Less is that it sometimes resembles multiple grammars merged into one file. I seriously considered having a special grammar for selectors, where I would parse them only as a superficial blob and then run them through a secondary parser.

Take identifiers - one way or the other they are pushed into the parser instead of the lexer. Less does not have keywords, so all could be an ending tag in a selector, could be the all option for extend, or could be just an identifier. Same with variable names and at-rules, same with when, etc.

A selector inside extend needs to be parsed almost like a normal one, except for that all at the end. Nothing unsolvable, but if I were able to re-tokenize part of the grammar, that would be awesome. ANTLR actually supports something like that - you can switch tokenizers on the fly, but not in the way needed here.

In Java, you can throw away most whitespace. In Python, whitespace generally matters. In Less, it doesn't, except in a few exceptional places - so you end up sacrificing performance or doing tricks for those.

As I said, it is nothing unsolvable. Less4j is very close to less.js and has everything I am aware of minus options and new features. But it does slow you down every time something like that pops up, and it forces me to make maintainability or performance trade-offs I would prefer to avoid.

Side note: it happened to me twice that I googled a problem and the answer on the ANTLR mailing list was: "best to redesign your grammar". They were right, but we cannot really do that.

SomMeri commented 9 years ago

The other thing to be considered is error handling and error reporting. I have read about teams switching from a generator to a hand-made parser for that reason - it turned out easier that way.

If you really want to get error handling right with a generated parser, your grammar can become big/convoluted due to error-handling rules. I think that ANTLR 4 was trying to make this easier, but I have not looked at it yet. (ANTLR 4 should also be faster and more programmer-friendly.)

It is something to be considered before switching, because you do not want to end up switching to another generator, or back to hand-made, because of that.

matthew-dean commented 9 years ago

One problem with ANTLR is that it seems to produce extremely verbose parsers. The grammar definitions seem fairly complicated to design / maintain. An auto-generated parser may be the way to go, but I'm not sure ANTLR is that solution, IMO.

matthew-dean commented 9 years ago

What about something like this: https://github.com/Hardmath123/nearley

rjgotten commented 9 years ago

Take identifiers - one way or the other they are pushed into the parser instead of the lexer. Less does not have keywords, so all could be an ending tag in a selector, could be the all option for extend, or could be just an identifier. Same with variable names and at-rules, same with when, etc.

You're not thinking small enough and as a result you're still making the tokenizer too powerful.

For at-rules and variables the @ itself would be an AT token output by the lexer. The parser would read that token and (depending on its current parsing context) would open an at-directive, variable declaration, variable evaluation, etc. parsing context.

When the parser asks the lexer for the next token, it does so with an explicit instruction not to allow whitespace in between. The parser then expects the lexer to respond with an IDENT token. The parser then decides what to make of that token based on its string value and will compose the full at-rule name, variable name, etc. accordingly.

If the lexer doesn't reply with an IDENT token or the parser can't make anything of the value of the IDENT token, then the parser pops its current parsing context and returns to the previous one to retry the next valid production rule.

If you work with generalized tokens this way, then when the parsing context is popped you can also shift all previously lexed tokens back onto the front of the token 'stream', since the tokens themselves are free of context. This means you don't waste cycles on re-lexing. You only waste a bit of memory keeping processed tokens around in a buffer until you can be 100% sure they can be discarded.
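
A bare-bones sketch of that interaction (token shapes and names are invented; instead of literally shifting tokens back onto the stream, it restores a saved position, which has the same effect):

// Minimal token stream with pushback: a failed production rewinds to a
// saved position instead of re-lexing the underlying text.
function TokenStream(tokens) {
  this.tokens = tokens;
  this.pos = 0;
}
TokenStream.prototype.next = function (skipWhitespace) {
  if (skipWhitespace) {
    while (this.tokens[this.pos] && this.tokens[this.pos].type === 'WHITESPACE') {
      this.pos++;
    }
  }
  return this.tokens[this.pos++] || null;
};
TokenStream.prototype.save = function () { return this.pos; };
TokenStream.prototype.restore = function (p) { this.pos = p; };

// Hypothetical production: parse "@ident" with no whitespace allowed
// between the AT and the IDENT, backing off cleanly on failure.
function parseAtName(stream) {
  var mark = stream.save();
  var at = stream.next(true);
  if (!at || at.type !== 'AT') { stream.restore(mark); return null; }
  var name = stream.next(false); // explicitly disallow whitespace here
  if (!name || name.type !== 'IDENT') { stream.restore(mark); return null; }
  return '@' + name.value;
}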