jsoverson / preprocess

Preprocess HTML, JavaScript, and other files with directives based on custom or ENV configuration

Measure performance #64

Closed: Frizi closed this issue 9 years ago

Frizi commented 9 years ago

After using this tool on a reasonably sized project, my build times skyrocketed from ~20s to ~1m 20s, which means it adds more processing time than all the other steps combined. And there are a lot of them (pretty much every single file gets analysed at least once, and most of them are compiled). Since almost no actual processing is currently done, most of the overhead must go to pattern matching. We already have unit tests here, but every suite uses only a minimal snippet of code.

A new test suite should be created to measure performance on realistically sized files and to allow comparison between versions. After that, performance optimizations can be made with more confidence.
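To make the idea concrete, here is a minimal sketch of what such a performance test could look like. This is not part of preprocess; `benchmark` and `makeFixture` are hypothetical helpers, and the regex transform stands in for a real `preprocess()` call on a large file with a realistic directive density.

```javascript
// Minimal benchmark harness sketch (hypothetical; not part of preprocess).
// `transform` stands in for a call like preprocess(src, context, 'js').
function benchmark(transform, src, iterations) {
  const start = process.hrtime();
  for (let i = 0; i < iterations; i++) {
    transform(src);
  }
  const [s, ns] = process.hrtime(start);
  return s * 1000 + ns / 1e6; // elapsed milliseconds
}

// Generate a realistically sized input: many plain lines, few directives.
function makeFixture(lines) {
  const out = [];
  for (let i = 0; i < lines; i++) {
    out.push(i % 100 === 0
      ? '// @ifdef DEBUG\nconsole.log(' + i + ');\n// @endif'
      : 'var x' + i + ' = ' + i + ';');
  }
  return out.join('\n');
}

const fixture = makeFixture(5000);
const ms = benchmark(
  src => src.replace(/\/\/ @ifdef DEBUG[\s\S]*?\/\/ @endif/g, ''),
  fixture,
  10
);
console.log('elapsed:', ms.toFixed(1), 'ms');
```

Running the same fixture through each released version would give the cross-version comparison suggested above.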

BendingBender commented 9 years ago

I suspect it will be really hard to change the performance characteristics without fundamentally rewriting the regex part. As @jsoverson has mentioned a few times, we will have to implement a parser/tokenizer so that the regex matching imposes less of a penalty on processing.

Frizi commented 9 years ago

I'd say don't implement a parser by hand, but rather define a language and use something like PEG.js. But YMMV.

BendingBender commented 9 years ago

Now PEG.js is interesting. Definitely worth a look.

orionstein commented 9 years ago

I would be willing to help with a migration to a proper grammar parsing engine. This seems to me to be the direction the tool should be heading in as it continues to gain more rules and features. I see two major benefits in moving to a tool such as PEG.js:

  1. A language engine should be able to process the input in one scan, rather than N scans (one per rule) as with the current regex methodology. That alone should improve the tool's performance. Even with a better hand-written parser/tokenizer, unless we implement our own FSM it would most likely still end up iterating over the input per rule and not gain as much.
  2. Currently, many edge cases are being implemented as new rules (@include_static, @include_once). A properly implemented language engine should not require these band-aid style fixes, as the parsing would be more robust and able to handle these scenarios.

Hopefully this makes sense to everyone, and I would definitely be interested in this being the future direction of preprocess.
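The single-scan point (1.) can be sketched roughly as follows. This is an illustration, not preprocess's actual implementation: the directive syntax is simplified, and the `multiPass`/`singlePass` helpers and the `echo` handler semantics are hypothetical.

```javascript
// N scans: one regex pass per rule; each replace() re-reads the whole source.
function multiPass(src, rules) {
  return rules.reduce((out, rule) => out.replace(rule.pattern, rule.apply), src);
}

// One scan: a single alternation visits the source once and dispatches on
// whichever directive name matched.
function singlePass(src, handlers) {
  return src.replace(/\/\/ @(\w+)(?: (.*))?/g, (match, name, arg) => {
    const handler = handlers[name];
    return handler ? handler(arg) : match; // leave unknown directives intact
  });
}

const out = singlePass(
  'var a = 1;\n// @echo VERSION\nvar b = 2;',
  { echo: arg => '/* ' + arg + ' */' }
);
console.log(out); // the @echo line is rewritten in a single pass
```

A real grammar engine would go further (nested blocks, proper error reporting), but the cost model is the same: one traversal total instead of one per directive.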

BendingBender commented 9 years ago

@orionstein, since you have commit access, you can start implementing this if you want to. I would also find it very interesting. I've never implemented a parser/tokenizer before, nor have I used PEG.js to define a language grammar, and I'd like to participate in this. I think the best way to collaborate is to set up a milestone and create tasks that each of us can assign to ourselves, coordinating the implementation in a branch.

The problem is that, to verify we are actually improving things, we first need to set up some performance tests to measure our progress, as @Frizi suggested. We will therefore have to either build them in nodeunit or port the suite to another test runner (like mocha).

orionstein commented 9 years ago

@BendingBender, I'll start putting it together in my fork. I'm not sure, however, that I'd want to commit a version here that had some elements implemented in PEG and some in regex. I think a milestone or feature/directive approach would work well, but it's a matter of figuring out how to collaborate while maintaining the integrity of the current regex system until the new one is deemed stable enough to move over to. A larger-scale performance test would be good to write in any case: it will let us compare the grammar and regex systems, and if we decide to stick with a regex-based system it will help us identify and optimize problematic code going forward.

BendingBender commented 9 years ago

I was suggesting we work on a branch so that each of us can contribute and see the current progress. This would prevent situations where the same feature gets implemented twice by two different devs, and it wouldn't interfere with master. But do as you like, it's your time investment.

Frizi commented 9 years ago

Writing a real AST generator with PEG could also help a lot with #59. If you are implementing this, preprocess should be able to output the generated AST alongside the actual output. That could be very helpful for external tools (I'm looking at you, wallabyjs).
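As a sketch of what "output the AST alongside the output" might mean: the node types, field names, and `walk` helper below are entirely hypothetical, just one plausible shape such an AST could take for external tools to consume.

```javascript
// Hypothetical AST shape for a source containing one @ifdef block.
// None of these type names come from preprocess itself.
const ast = {
  type: 'root',
  children: [
    { type: 'text', value: 'var a = 1;\n' },
    { type: 'ifdef', name: 'DEBUG',
      children: [{ type: 'text', value: 'log(a);\n' }] },
  ],
};

// External tools could walk the tree, e.g. to map output back to directives.
function walk(node, visit) {
  visit(node);
  (node.children || []).forEach(child => walk(child, visit));
}

let count = 0;
walk(ast, () => count++);
console.log('nodes visited:', count);
```

A tool like wallabyjs could use such a tree to know which regions of the output were produced or removed by which directive.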

BendingBender commented 9 years ago

@Frizi, did you observe the build duration regression as a consequence of recent changes to this lib, or was it when you first introduced preprocess into your project?

Frizi commented 9 years ago

Hard to say, as I didn't have many files preprocessed before. Also, when I tried to process everything, I included big libraries in the mix. After blacklisting those, the build got back down to a few seconds. Which means measurements are needed.

BendingBender commented 9 years ago

Added performance measurement for the test suite run in #71. Not sure whether this will suffice, but at least we can compare performance during development of a single version. Maybe I should add another perf measurement test, but not now.