Exploiting concurrency and/or parallelism when doing name resolution

bollmann commented 9 years ago

Suppose I have a very large Haskell source tree that is not organized into packages, and I want to do name resolution for the complete source tree "on the fly" (i.e., without relying on the .names interface files, etc.). The obvious solution would then be something like this:

buildScopes :: [Module SrcSpanInfo] -> ModuleT [Symbol] IO [Module (Scoped SrcSpanInfo)]
buildScopes asts = do
    void $ computeInterfaces Haskell2010 [] asts
    mapM (annotateModule Haskell2010 []) asts

However, suppose this solution is too slow because the Haskell source tree (and its corresponding list of abstract syntax trees) is very large. Is there any way in haskell-names (or maybe even haskell-suite) to make name resolution more efficient, e.g., by leveraging multiple CPU cores? Would haskell-names benefit from using the async and/or parallel packages to make computeInterfaces and annotateModule faster?

Or is this approach just completely wrong and I should perform name resolution only once and then store the resolved names in .names files?

I'm not sure if this is the right place to ask, but I felt that asking the most knowledgeable people would make sense.

phischu commented 9 years ago

Hi, thank you for your question. Could you perhaps provide the example data that is slow to resolve? I plan to change the interface of haskell-names to get rid of haskell-packages and provide a function

resolve :: Map ModuleName [Symbol] -> [Module l] -> ([Module (Scoped l)], Map ModuleName [Symbol])

Given an environment (a map from module name to the list of symbols it exports) and a list of modules, it returns two things: a list of annotated modules, and a map that contains, for each of the given modules, the list of symbols it exports.

Would this help you to keep the interfaces for exported modules instead of recomputing them?

Finally, to answer your original question: it is probably possible to gain speedups by parallelizing, but this is not yet on my agenda.

bollmann commented 9 years ago

Hi Philipp,

Thanks for replying. I'm experiencing my performance issues when trying to parse and then resolve names in a source tree of around 6000 Haskell files. To do so, I'm currently using the following:

annotateFiles :: [FilePath] -> IO [Module (Scoped SrcSpanInfo)]
annotateFiles srcFiles = do
  asts <- {-# SCC "parse_asts" #-}
    mapConcurrently (parseFile >=> evaluate . fromParseResult) srcFiles
  {-# SCC "eval_moduleT" #-} evalNamesModuleT (buildScopes asts) []
  where
  buildScopes asts = do
    {-# SCC "moduleT_compute_interfaces" #-}
      void $ computeInterfaces Haskell2010 [] asts
    {-# SCC "moduleT_annotate_module" #-}
      mapM (annotateModule Haskell2010 [] ) asts

Unfortunately, running this function on my source tree of 6000 Haskell files takes around 60 seconds (45 seconds of which are spent in buildScopes), which is why I was wondering whether there is a way to optimize it. I think it should be possible to at least run annotateModule in parallel over all ASTs. Ideally, I would want to use a parList rdeepseq strategy from Control.Parallel.Strategies, but unfortunately haskell-src-exts ASTs cannot be forced to normal form.
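As a rough illustration of the "annotate in parallel" idea, here is a minimal sketch that uses only base, assuming the per-module work can be expressed as a pure function. The names parMapChunked, spawnChunk, forceAll and chunksOf are hypothetical and not part of haskell-names or haskell-src-exts. Because the ASTs cannot simply be forced with rdeepseq, the caller passes in its own forcing function that decides how deeply each result is evaluated inside the worker thread.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, evaluate, throwIO, try)

-- | Split a list into chunks of at most n elements (hypothetical helper).
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | null xs   = []
  | n <= 0    = [xs]
  | otherwise = let (chunk, rest) = splitAt n xs in chunk : chunksOf n rest

-- | Apply the caller-supplied forcing function to every element,
-- then return the list itself.
forceAll :: (b -> ()) -> [b] -> [b]
forceAll force ys = foldr (\y acc -> force y `seq` acc) () ys `seq` ys

-- | Evaluate one chunk in its own thread; the MVar receives either the
-- forced results or the exception raised while computing them.
spawnChunk :: (b -> ()) -> (a -> b) -> [a] -> IO (MVar (Either SomeException [b]))
spawnChunk force f chunk = do
  var <- newEmptyMVar
  _ <- forkIO $ do
    result <- try (evaluate (forceAll force (map f chunk)))
    putMVar var result
  return var

-- | Map a pure function over a list, one worker thread per chunk.
-- Results come back in the original order; a worker's exception is
-- rethrown in the calling thread.
parMapChunked :: (b -> ()) -> Int -> (a -> b) -> [a] -> IO [b]
parMapChunked force chunkSize f xs = do
  vars <- mapM (spawnChunk force f) (chunksOf chunkSize xs)
  concat <$> mapM collect vars
  where
    collect var = takeMVar var >>= either throwIO return
```

With a pure annotation function in hand, a call would look like parMapChunked forceModule 100 annotateOne asts, where forceModule and annotateOne are placeholders for whatever the caller has available. The threads only run on several cores if the program is built with -threaded and started with +RTS -N.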

To this end, changing the interface as you propose would certainly make things easier to use, because I could then iteratively refine the computed interfaces of exported symbols (e.g., I would run resolve once at the beginning and then run it again to refine just the interfaces of changed Haskell files, leaving most of them as is). So yeah, I would be using it! :-)
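The incremental workflow described here can be sketched with a toy model. Everything below is a stand-in: ToyModule, resolveToy and refresh are hypothetical, symbols are plain strings, and each module is assumed to export exactly its own definitions (no export lists or re-exports). Only the shape, an environment map going in and a map of fresh exports coming out, mirrors the proposed resolve signature.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

type ModuleName = String
type Symbol = String  -- stand-in for a real symbol type

-- | Toy module: its name, the modules it imports, and its own definitions.
data ToyModule = ToyModule
  { modName    :: ModuleName
  , modImports :: [ModuleName]
  , modDefs    :: [Symbol]
  }

type Environment = Map ModuleName [Symbol]

-- | Toy analogue of the proposed resolve: returns, for each given module,
-- the symbols in scope (the "annotation"), plus a map with the exports of
-- the given modules only.
resolveToy :: Environment -> [ToyModule] -> ([(ModuleName, [Symbol])], Environment)
resolveToy env mods = (map scopeOf mods, exports)
  where
    exports = Map.fromList [(modName m, modDefs m) | m <- mods]
    -- Exports of the current batch shadow stale entries in the old environment.
    fullEnv = Map.union exports env
    scopeOf m = (modName m, modDefs m ++ concatMap lookupExports (modImports m))
    lookupExports name = Map.findWithDefault [] name fullEnv

-- | Re-resolve only the changed modules and merge their fresh exports
-- into the existing environment.
refresh :: Environment -> [ToyModule] -> ([(ModuleName, [Symbol])], Environment)
refresh env changed = (scoped, Map.union exports env)
  where
    (scoped, exports) = resolveToy env changed
```

Running resolveToy once over the whole tree yields the initial environment; after an edit, refresh is called with just the changed modules. Modules that import a changed module would still need to be re-annotated against the updated environment, which this sketch leaves to the caller.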

phischu commented 8 years ago

Hi, I have finally released a version with the proposed interface to Hackage.

haskell-suite / haskell-names

Exploiting concurrency and/or parallelism when doing name resolution #66