semgrep / pfff

pfff is mainly an OCaml API to write static analysis, dynamic analysis, code visualizations, code navigations, or style-preserving source-to-source transformations such as refactorings on source code.
https://semgrep.dev

Slow builds introduced by menhir 20211230 or 20220210 #512

Open mjambon opened 2 years ago

mjambon commented 2 years ago

We were running into out-of-memory errors when we tried building pfff with menhir 20220210. This report compares the fast build times we were used to with the new, much longer ones.

Note that the atd package on which we depend for semgrep doesn't build with menhir 20211230 (some type error), so I didn't measure that version. Based on the menhir changelog, though, I suspect that is where the slowdown started:

The code back-end has been rewritten from the ground up by Émile Trotignon and François Pottier, and now produces efficient and well-typed OCaml code. The infamous Obj.magic is not used any more.

Each time shown is the user CPU time (%U) for rebuilding the whole pfff project after deleting only the listed build folder (under _build/default/). The command I used is:

for x in lang_*/parsing; do echo $x; rm -rf _build/default/$x ; /bin/time -f "$x: %U s" -o "$(dirname $x).time" make; done

Results

The longest build time is now for lang_js/parsing: it went from 23.67 s to 154.29 s. I didn't run into OOM errors during this benchmarking but I did previously when rebuilding the whole project both in CI and locally (starting my build with about 11 GB available on my machine).

Before (menhir 20211128):

lang_cpp/parsing: 25.15 s
lang_csharp/parsing: 2.67 s
lang_css/parsing: 1.74 s
lang_c/parsing: 3.43 s
lang_erlang/parsing: 2.47 s
lang_FUZZY/parsing: 1.99 s
lang_GENERIC/parsing: 1.64 s
lang_go/parsing: 9.37 s
lang_haskell/parsing: 2.31 s
lang_html/parsing: 3.51 s
lang_java/parsing: 13.27 s
lang_json/parsing: 2.18 s
lang_js/parsing: 23.67 s
lang_lisp/parsing: 2.42 s
lang_ml/parsing: 14.92 s
lang_nw/parsing: 2.28 s
lang_php/parsing: 26.39 s
lang_python/parsing: 10.37 s
lang_regexp/parsing: 3.12 s
lang_ruby/parsing: 14.77 s
lang_rust/parsing: 2.54 s
lang_scala/parsing: 5.94 s
lang_skip/parsing: 2.85 s
lang_sql/parsing: 1.73 s
lang_web/parsing: 1.70 s

After (menhir 20220210):

lang_cpp/parsing: 92.68 s
lang_csharp/parsing: 2.74 s
lang_css/parsing: 1.81 s
lang_c/parsing: 36.69 s
lang_erlang/parsing: 2.54 s
lang_FUZZY/parsing: 2.20 s
lang_GENERIC/parsing: 1.84 s
lang_go/parsing: 20.57 s
lang_haskell/parsing: 2.70 s
lang_html/parsing: 3.52 s
lang_java/parsing: 53.64 s
lang_json/parsing: 2.41 s
lang_js/parsing: 154.29 s
lang_lisp/parsing: 2.55 s
lang_ml/parsing: 32.30 s
lang_nw/parsing: 2.46 s
lang_php/parsing: 67.24 s
lang_python/parsing: 20.36 s
lang_regexp/parsing: 3.46 s
lang_ruby/parsing: 15.20 s
lang_rust/parsing: 2.74 s
lang_scala/parsing: 6.05 s
lang_skip/parsing: 3.16 s
lang_sql/parsing: 1.97 s
lang_web/parsing: 1.87 s
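
The slowdown is uneven across grammars, which is easier to see as a ratio than as raw seconds. The snippet below is a minimal sketch for computing that ratio from a before/after pair; `slowdown` is a hypothetical helper name, not something from the pfff repository, and the numbers are copied from the two lists above.

```shell
# Print how many times slower a build became.
# Usage: slowdown BEFORE_SECONDS AFTER_SECONDS
# (`slowdown` is a hypothetical helper, introduced only for illustration.)
slowdown() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", after / before }'
}

slowdown 23.67 154.29   # lang_js   -> 6.52
slowdown 3.43 36.69     # lang_c    -> 10.70
slowdown 14.77 15.20    # lang_ruby -> 1.03
```

By this measure lang_c regressed the most in relative terms (about 10.7x), lang_js the most in absolute terms, and lang_ruby is essentially unchanged.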

mseri commented 2 years ago

Ping @fpottier

fpottier commented 2 years ago

Could you try menhir -O 0 or menhir -O 1 and let me know if the build times are better?

fpottier commented 2 years ago

(The default level is -O 2, which is sometimes costly.)
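
In a dune-based build such as pfff's, one way to try a lower level would be the `flags` field of the `menhir` stanza. This is only a sketch: the module name `parser_cpp` is an assumption made for illustration, so substitute whatever name the actual dune file uses.

```dune
; Sketch only: the module name parser_cpp is an assumption.
(menhir
 (modules parser_cpp)
 ; Ask menhir for optimization level 1 instead of the default 2.
 (flags -O 1))
```

Each affected grammar would need the same flag in its own stanza.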

mjambon commented 2 years ago

@fpottier I get these build times for the lang_cpp folder, using menhir 20220210:

-O 0:

lang_cpp/parsing: 33.72 s

-O 1:

lang_cpp/parsing: 33.30 s

-O 2 (passed explicitly to be sure):

lang_cpp/parsing: 102.66 s

This is great. I'm not sure if we have a benchmark suite for parser performance. (cc @aryx) Thanks @fpottier!

aryx commented 2 years ago

We don't have one.

fpottier commented 2 years ago

Thanks. I will probably have to make -O 1 the default.