thibaudcolas / curlylint

Experimental HTML templates linting for Jinja, Nunjucks, Django templates, Twig, Liquid
https://www.curlylint.org/
MIT License

Optimize parsing #124

Open adamchainz opened 2 years ago

adamchainz commented 2 years ago

A few changes:

  1. Switch from multiprocessing to concurrent.futures with ProcessPoolExecutor. This is just to make the code easier to work with.
  2. Perform check_file() as results return to the main process, in order to reduce peak memory usage. Previously all parsed files were kept in memory before being checked, leading to massive memory usage.
  3. Construct parsers once per process, rather than once per file. Previously parser construction took ~5% of the runtime; this reduces it to a constant amount per process.
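As a rough illustration of how these three changes fit together, here is a minimal sketch, not the actual diff. `build_parser`, `init_worker`, `parse_file`, `lint` and the toy `check_file` body are hypothetical stand-ins rather than curlylint's real functions; only `ProcessPoolExecutor`, its `initializer` argument and `as_completed` are standard library.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

_parser = None  # per-process parser, set once by init_worker()


def build_parser():
    # Hypothetical stand-in for the expensive parser construction.
    return lambda source: source.split()


def init_worker():
    # Runs once in each worker process, not once per file.
    global _parser
    _parser = build_parser()


def parse_file(source):
    # Runs in a worker and reuses the parser built by init_worker().
    return _parser(source)


def check_file(parsed):
    # Hypothetical check: just count the parsed tokens.
    return len(parsed)


def lint(sources, max_workers=2):
    results = {}
    with ProcessPoolExecutor(
        max_workers=max_workers, initializer=init_worker
    ) as pool:
        futures = {
            pool.submit(parse_file, source): name
            for name, source in sources.items()
        }
        for future in as_completed(futures):
            # Check each result as it returns to the main process, so the
            # parsed files are not all held in memory until the end.
            results[futures[future]] = check_file(future.result())
    return results
```

On platforms that start workers with spawn, a call such as `lint({"a.html": "x y", "b.html": "z"})` would need to sit under an `if __name__ == "__main__":` guard.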

Benchmarked on a project with 238 templates.

Before:

$ time curlylint templates/**/*.html
All done! ✨ 🍰 ✨

curlylint templates/**/*.html  352.25s user 3.37s system 999% cpu 35.575 total

After:

$ time curlylint templates/**/*.html
All done! ✨ 🍰 ✨

curlylint templates/**/*.html  324.22s user 2.79s system 995% cpu 32.858 total

~8% of the time saved (35.575s to 32.858s wall clock).

The parser remains quite slow; I think it does an unfortunate amount of backtracking.

adamchainz commented 2 years ago

Okay, it turns out a lot of tests call parse_source. Perhaps the parser construction can be made lazier with @lru_cache or similar.