kcsongor / generic-lens

Generically derive traversals, lenses, and prisms.

Compile time performance #110

Open kcsongor opened 4 years ago

kcsongor commented 4 years ago

I took @arybczak's benchmarks and started experimenting with generic-optics a little. It seems that optimisations (unsurprisingly) are responsible for most of the compile-time overhead. I don't think much can be done to speed that up without changing GHC itself. However, it should be possible to speed up the -O0 compile times at least, which I think a lot of people use during development.

With -O0, some experimental changes in the internals yield the following results:

multiple  generic  memory  time
0         0        37M     1.1s
0         1        58M     1.4s
1         0        47M     1.5s
1         1        65M     1.6s
2         0        50M     1.7s
2         1        67M     1.9s

The "experimental changes" essentially amount to flattening and specialising the class hierarchies, which improves compile times at the cost of some duplication in the library internals. Before these changes, the last row (multiple=2 and generic=1) allocated 246M and took 5.4s to compile with -O0, so the improvement is quite significant.
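To make "flattening" concrete, here is a minimal sketch that is not taken from the generic-lens internals. It counts the fields of a single-constructor record in two ways: once with one class per layer of the generic representation, and once with a single class covering every layer. All class and function names (GFieldsD, GFieldsC, GFieldsS, GFields, countLayered, countFlat) are hypothetical, and whether the flat form compiles faster in any given case is an assumption to check by measurement.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE TypeOperators #-}

module Main (main) where

import GHC.Generics

-- Layered version: one class per level (datatype, constructor, fields).
-- Resolving the top-level constraint walks through three classes.
class GFieldsD f where gFieldsD :: f p -> Int
class GFieldsC f where gFieldsC :: f p -> Int
class GFieldsS f where gFieldsS :: f p -> Int

instance GFieldsC f => GFieldsD (D1 m f) where
  gFieldsD (M1 x) = gFieldsC x

instance GFieldsS f => GFieldsC (C1 m f) where
  gFieldsC (M1 x) = gFieldsS x

instance (GFieldsS f, GFieldsS g) => GFieldsS (f :*: g) where
  gFieldsS (l :*: r) = gFieldsS l + gFieldsS r

instance GFieldsS (S1 m f) where
  gFieldsS _ = 1

instance GFieldsS U1 where
  gFieldsS _ = 0

-- Flattened version: a single class handles every level, at the cost of
-- repeating the "unwrap the metadata" logic in several instances.
class GFields f where gFields :: f p -> Int

instance GFields f => GFields (D1 m f) where
  gFields (M1 x) = gFields x

instance GFields f => GFields (C1 m f) where
  gFields (M1 x) = gFields x

instance (GFields f, GFields g) => GFields (f :*: g) where
  gFields (l :*: r) = gFields l + gFields r

instance GFields (S1 m f) where
  gFields _ = 1

instance GFields U1 where
  gFields _ = 0

countLayered :: (Generic a, GFieldsD (Rep a)) => a -> Int
countLayered = gFieldsD . from

countFlat :: (Generic a, GFields (Rep a)) => a -> Int
countFlat = gFields . from

-- Hypothetical example record with three fields.
data Point = Point { px :: Int, py :: Int, pz :: Int }
  deriving (Generic)

main :: IO ()
main = print (countLayered (Point 1 2 3), countFlat (Point 1 2 3))
```

Both functions compute the same result; the only difference is how many distinct classes the constraint solver has to go through.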

I will investigate further to see if anything else could be done.

kcsongor commented 4 years ago

As an additional point, when loading the file via ghcid, the generic version reloads noticeably faster than the TH version (though both take slightly under 1s) on the multiple=2 setting.

kcsongor commented 4 years ago

Another idea I had was to try and eliminate some redundant simplifier runs by carefully phase annotating the INLINE pragmas, but my experiments on this haven’t been fruitful so far.
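For readers unfamiliar with phase control, the following sketch shows what phase-annotated INLINE pragmas look like in GHC. The functions (double, increment, square) are made up for illustration and have nothing to do with the library's internals. Whether any particular choice of phases actually saves simplifier runs is exactly the open question here, so this only demonstrates the syntax.

```haskell
module Main (main) where

-- GHC's numbered simplifier phases count down: 2, 1, 0.

-- INLINE [1]: do not inline before phase 1, but be keen to inline
-- from phase 1 onwards.
{-# INLINE [1] double #-}
double :: Int -> Int
double x = x + x

-- INLINE [~1]: be keen to inline before phase 1, but do not inline
-- from phase 1 onwards.
{-# INLINE [~1] increment #-}
increment :: Int -> Int
increment x = x + 1

-- Plain INLINE: no phase restriction.
{-# INLINE square #-}
square :: Int -> Int
square x = x * x

main :: IO ()
main = print (square (increment (double 3)))
```

The pragmas only change when GHC is willing to inline each function; they do not change what the program computes.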

arybczak commented 4 years ago

Can you push these changes (to a branch if you don't want to merge them yet)? I'm curious.

kcsongor commented 4 years ago

Yes of course. I plan on getting back to this in the coming days, and hope to merge it soon.

arybczak commented 4 years ago

BTW, I checked how #112 affects the benchmark and I noticed weird things.

First of all, I expected the compilation to be slower with #112, but it was actually faster.

Then I checked the Core, and it turned out that even with #112 applied, the Core for types with multiple constructors isn't equivalent to the TH version (residue of generics remains and field lookups are linear).

I then raised the unfolding use threshold to 250 to get Core equivalent to the TH version (MULTIPLE=2 needs 250; for MULTIPLE=1, 150 is sufficient), and compilation is even faster (I also used field' instead of field).
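As a rough sketch of that setup, the module below sets the threshold through an OPTIONS_GHC pragma and reads a field with field'. It assumes the generic-lens package (van Laarhoven lenses) is available; the Person type and getAge are hypothetical and not part of the benchmark. The flag only influences inlining when optimisation is enabled, and the value that yields TH-equivalent Core for a given type has to be found by inspecting the Core.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeApplications #-}
-- Per-module setting; it can also be passed on the command line or
-- through ghc-options in the build configuration.
{-# OPTIONS_GHC -funfolding-use-threshold=250 #-}

module Main (main) where

import Data.Functor.Const (Const (..))
import Data.Generics.Product.Fields (field')
import GHC.Generics (Generic)

-- Hypothetical record, used for illustration only.
data Person = Person
  { name :: String
  , age  :: Int
  } deriving (Generic, Show)

-- field' is the type-preserving variant of field. Instantiating the
-- lens at Const gives a getter without depending on a lens library.
getAge :: Person -> Int
getAge = getConst . field' @"age" Const

main :: IO ()
main = print (getAge (Person "Ada" 36))
```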

Here are times for -funfolding-use-threshold=250 and usage of field':

multiple  memory  time
0         147M    2.5s
1         251M    4.5s
2         227M    6.5s

I have no idea what is going on there (especially with memory usage), but it seems worth investigating.