Hello! We've noticed that the Preprocessing library for ... part of setting up cabal repl in our very large package is taking a long time - somewhere between 10s and 60s. We have 10k+ modules and I don't think we have any files to preprocess by this mechanism (though we do have a few that use the -pgmF formatters - unsure if this is contributing).
I decided to dig into the source code and have identified some potential issues -
The code does for_ mods $ pre .... This is probably the easiest place to get a win by introducing some concurrency.
The main part of the code in preprocessFile does a file lookup for each of the suffixes in each of the directories in the search path. We have two search paths, and there are seven extensions: that's 14 calls to doesFileExist that will (in the common case) fail. I think this could be refactored to avoid the wasteful lookups in the more common case of "there's a .hs file".
Likewise, prepending buildAsSrcLoc : searchLoc is going to trigger an extra file lookup in the common case of "the module is in a hs-source-dirs." Doing searchLoc ++ [buildAsSrcLoc] should save a lookup.
In findFileCwdWithExtension', we call ordNub on the search path and the extensions every time. While the two lists are very small and ordNub is efficient, we're hitting it ~20k times, meaning we're allocating 20k sets of 2 and 20k sets of 7. Using a newtype Nubbed a = Nubbed [a] with an mkNubbed :: (Ord a) => [a] -> Nubbed a would save this work from being repeated in a type-safe way, since the paths and extensions are pretty much shared in each invocation.
I'm happy to prepare a PR to do some of these performance improvements.
Hello! We've noticed that the
Preprocessing library for ...
part of setting upcabal repl
in our very large package is taking a long time - somewhere between 10s and 60s. We have 10k+ modules and I don't think we have any files to preprocess by this mechanism (though we do have a few that use the-pgmF
formatters - unsure if this is contributing).I decided to dig into the source code and have identified some potential issues -
The code does
for_ mods $ pre ...
. This is probably the easiest place to get a win by introducing some concurrency.The main part of the code in
preprocessFile
does a file lookup for each of the suffixes in each of the directories in the search path. We have two search paths, and there are seven extensions: that's 14 calls todoesFileExist
that will (in the common case) fail. I think this could be refactored to avoid the wasteful lookups in the more common case of "there's a.hs
file".Likewise, prepending
buildAsSrcLoc : searchLoc
is going to trigger an extra file lookup in the common case of "the module is in ahs-source-dirs
." DoingsearchLoc ++ [buildAsSrcLoc]
should save a lookup.In
findFileCwdWithExtension'
, we callordNub
on the search path and the extensions every time. While the two lists are very small andordNub
is efficient, we're hitting it ~20k times, meaning we're allocating 20k sets of 2 and 20k sets of 7. Using anewtype Nubbed a = Nubbed [a]
with anmkNubbed :: (Ord a) => [a] -> Nubbed a
would save this work from being repeated in a type-safe way, since the paths and extensions are pretty much shared in each invocation.I'm happy to prepare a PR to do some of these performance improvements.