metaborg / nabl

Spoofax' Name Binding Language
Apache License 2.0

Improve performance of project library export for large projects #51

Closed jochembroekhoff closed 3 years ago

jochembroekhoff commented 3 years ago

The default collect-all and make-set perform too poorly to be practical for larger projects. I recently tried an StxLib export of a standard library project containing ~180 files, about 26k SLOC in total. My Statix specification is not small either, but I would not consider either of them 'large'.

After an hour I killed the project library export and started debugging. This led me to discover that collect-all and make-set do not appear to scale well. Both use union internally, which appears to be the main underlying cause.

That is why I now pass a different union strategy to collect-all, overriding the default. It is a dummy union implementation that merely concatenates the two lists. Concatenating lists is still not very efficient; major optimizations would be possible if StrategoArrayLists could be concatenated as a view (which helps in this particular case, but might be suboptimal otherwise).
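To illustrate why swapping the combining step matters, here is a minimal Python sketch. It is not the Stratego implementation: `list_union`, `concat`, and `collect` are hypothetical names, and the model assumes a union over plain lists that checks membership with a linear scan. Under that assumption, accumulating many small results costs a number of comparisons that grows quadratically with the number of collected elements, while plain concatenation performs none.

```python
def list_union(xs, ys, counter):
    """Assumed model of a list-based union: append each element of ys
    that is not already present, using a linear membership scan."""
    result = list(xs)
    for y in ys:
        found = False
        for x in result:
            counter[0] += 1  # count every element comparison
            if x == y:
                found = True
                break
        if not found:
            result.append(y)
    return result


def concat(xs, ys, counter):
    """Dummy 'union' that only concatenates; duplicates are kept.
    Note that xs + ys still copies both lists."""
    return xs + ys


def collect(chunks, combine):
    """Fold the per-node results together with the given combining step."""
    counter = [0]
    acc = []
    for chunk in chunks:
        acc = combine(acc, chunk, counter)
    return acc, counter[0]


if __name__ == "__main__":
    chunks = [[i] for i in range(1000)]
    _, union_cost = collect(chunks, list_union)
    _, concat_cost = collect(chunks, concat)
    print("comparisons with list union:", union_cost)
    print("comparisons with concat:", concat_cost)
```

The concatenating variant gives up distinctness, so duplicates have to be removed in a separate pass afterwards.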

Anyway, this means that the result of collect-all is no longer distinct, so I still pass it through some alternative set creation strategies. These replace make-set (which actually would not have been necessary before, because collect-all already produces a distinct result). They use strategies from the relatively new immutable collections API in the standard library.
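The general idea of concatenating first and deduplicating once at the end can be sketched in Python as follows. This is only an analogy: a hash-based `dict` stands in for the immutable set from the Stratego standard library, and `collect_concat` and `distinct` are hypothetical names that do not appear in the actual change.

```python
def collect_concat(chunks):
    """Gather all per-node results into one list, keeping duplicates."""
    out = []
    for chunk in chunks:
        out.extend(chunk)
    return out


def distinct(items):
    """Remove duplicates in a single pass with hash-based lookups,
    preserving first-occurrence order. Elements must be hashable."""
    return list(dict.fromkeys(items))


if __name__ == "__main__":
    collected = collect_concat([["a", "b"], ["b", "c"], ["c", "a"]])
    print(distinct(collected))
```

Each membership check is a hash lookup here rather than a scan over the accumulated list, so the deduplication pass stays roughly linear in the number of collected elements.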

Exporting a StxLib from my project now takes about 35 seconds (instead of 1h+). As far as I am aware, the effective output is the same.

jochembroekhoff commented 3 years ago

In fact, in some primitive manual benchmarks (on a single big file), the export time appeared to scale exponentially with the SLOC count.

AZWN commented 3 years ago

Thanks! It would be a good idea to look in depth into the performance issues with union/make-set, but for now I'm happy with this improvement.