bollmann closed this issue 8 years ago.
Hi, thank you for your question. Could you perhaps provide the example data that is slow to resolve? I plan to change the interface of haskell-names to get rid of haskell-packages and provide a function
resolve :: Map ModuleName [Symbol] -> [Module l] -> ([Module (Scoped l)], Map ModuleName [Symbol])
Given an environment (a map from each module name to the list of symbols that module exports) and a list of modules, it returns two things: a list of annotated modules, and a map that contains, for each of the given modules, the list of symbols it exports.
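To make the contract of this signature concrete, here is a toy model with heavily simplified, hypothetical types: plain strings for module names and symbols, and a module reduced to its name, its imports, and its own definitions. It assumes the modules are passed in dependency order and that every module exports exactly its own definitions; real name resolution also has to deal with export lists, import specifications, and mutually recursive modules, none of which is modeled here. resolveToy, ToyModule, and ToyAnnotated are illustrative names, not part of haskell-names.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- Hypothetical, heavily simplified stand-ins; NOT the haskell-names types.
type ModuleName = String
type Symbol = String

-- A toy module: its name, the modules it imports, and its own top-level names.
data ToyModule = ToyModule
  { modName    :: ModuleName
  , modImports :: [ModuleName]
  , modDefs    :: [Symbol]
  }

-- A toy "annotated" module: the module name plus every symbol in scope in it.
data ToyAnnotated = ToyAnnotated ModuleName [Symbol]
  deriving (Eq, Show)

-- Same shape as the proposed resolve. Assumptions: modules arrive in
-- dependency order, and each module exports exactly its own definitions.
resolveToy
  :: Map ModuleName [Symbol]                         -- environment of known interfaces
  -> [ToyModule]                                     -- modules to resolve
  -> ([ToyAnnotated], Map ModuleName [Symbol])       -- annotations, and exports of these modules only
resolveToy env0 mods = (reverse annotated, exports)
  where
    (annotated, exports) = foldl step ([], Map.empty) mods
    step (acc, done) m =
      let env      = Map.union done env0  -- interfaces from this batch take precedence
          imported = concat [Map.findWithDefault [] i env | i <- modImports m]
          ann      = ToyAnnotated (modName m) (modDefs m ++ imported)
       in (ann : acc, Map.insert (modName m) (modDefs m) done)
```

For example, with an environment mapping Prelude to ["map"] and two modules A (imports Prelude, defines f) and B (imports A, defines g), module B ends up with g and f in scope, and the returned map contains entries only for A and B, not for Prelude.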
Would this help you to keep the interfaces for exported modules instead of recomputing them?
Finally, to answer your original question: it is probably possible to gain speedups by parallelizing, but this is not yet on my agenda.
Hi Philipp,
Thanks for getting back to me. I'm running into performance issues when parsing and then resolving names in a source tree of around 6000 Haskell files. Currently I'm using the following:
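import Control.Concurrent.Async (mapConcurrently)
import Control.Exception (evaluate)
import Control.Monad ((>=>), void)
-- plus the haskell-src-exts and haskell-names imports (parseFile,
-- fromParseResult, Haskell2010, computeInterfaces, annotateModule,
-- evalNamesModuleT, ...), whose module layout depends on the version

annotateFiles :: [FilePath] -> IO [Module (Scoped SrcSpanInfo)]
annotateFiles srcFiles = do
  -- parse every file concurrently; evaluate forces each parse result
  -- to weak head normal form, so parse errors surface here
  asts <- {-# SCC "parse_asts" #-}
    mapConcurrently (fmap fromParseResult . parseFile >=> evaluate) srcFiles
  {-# SCC "eval_moduleT" #-} evalNamesModuleT (buildScopes asts) []
  where
    buildScopes asts = do
      -- pass 1: compute the exported interface of every module
      {-# SCC "moduleT_compute_interfaces" #-}
        void $ computeInterfaces Haskell2010 [] asts
      -- pass 2: annotate every module with its resolved names
      {-# SCC "moduleT_annotate_module" #-}
        mapM (annotateModule Haskell2010 []) asts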
Unfortunately, running this function on my 6000-file Haskell source tree takes around 60 seconds, 45 of which are spent in buildScopes. That is why I was wondering whether there is a way to optimize this. I think it should be possible to at least run annotateModule in parallel over all ASTs. Ideally, I would use a parList rdeepseq strategy from Control.Parallel.Strategies, but unfortunately haskell-src-exts ASTs cannot be forced to normal form.
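A rough sketch of what the parallel step could look like, assuming annotation can be expressed as a pure per-module function (for example, calling the proposed resolve on a single module with an environment that already holds every interface). annotateModule itself runs inside the ModuleT monad, so it cannot be mapped this way directly. The hypothetical helper parMapWith uses only base and deepseq: it evaluates each result on its own thread with a caller-supplied forcing function, which stands in for rdeepseq when the result type has no NFData instance. Real multi-core execution requires compiling with -threaded and running with +RTS -N.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.DeepSeq (NFData, rnf)
import Control.Exception (SomeException, evaluate, throwIO, try)

-- | Evaluate a value on a fresh thread using the given forcing function.
-- Any exception raised while forcing is captured and stored in the MVar.
forceInThread :: (b -> ()) -> b -> IO (MVar (Either SomeException b))
forceInThread forcer y = do
  var <- newEmptyMVar
  _ <- forkIO (try (evaluate (forcer y `seq` y)) >>= putMVar var)
  pure var

-- | Hypothetical parallel map: one thread per element, results returned in
-- input order; the first captured exception (in list order) is rethrown.
parMapWith :: (b -> ()) -> (a -> b) -> [a] -> IO [b]
parMapWith forcer f xs = do
  vars <- mapM (forceInThread forcer . f) xs
  mapM (\var -> takeMVar var >>= either throwIO pure) vars

-- | Convenience wrapper for result types that do have an NFData instance,
-- analogous to parList rdeepseq.
parMapDeep :: NFData b => (a -> b) -> [a] -> IO [b]
parMapDeep = parMapWith rnf
```

GHC threads are lightweight, so one thread per module is usually acceptable even for thousands of modules; splitting the list into chunks would reduce the per-thread overhead further.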
Changing the interface as you propose would certainly make things easier to use, because then I could iteratively refine the computed interfaces of exported symbols (e.g., I would run resolve once at the beginning and then run it again to refine only the interfaces of changed Haskell files, leaving the rest as they are). So yes, I would be using it! :-)
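A minimal sketch of that refinement step, assuming a resolver with the shape of the proposed resolve signature. The resolver is passed in as an argument, so nothing about the eventual haskell-names API is assumed beyond that signature; refineInterfaces is a hypothetical name. It only re-resolves the changed modules: if a changed module's exports differ from the cached ones, the modules that import it would also need to be re-resolved, which this sketch does not track.

```haskell
import Data.Map (Map)
import qualified Data.Map as Map

-- | Hypothetical helper: re-run a resolver shaped like the proposed
--   resolve :: Map ModuleName [Symbol] -> [Module l] -> ([Module (Scoped l)], Map ModuleName [Symbol])
-- on the changed modules only, then merge their fresh interfaces into the cache.
refineInterfaces
  :: Ord k
  => (Map k s -> [m] -> ([a], Map k s)) -- the resolver
  -> Map k s                            -- interfaces cached from an earlier run
  -> [m]                                -- modules whose source changed
  -> ([a], Map k s)                     -- their annotations and the updated cache
refineInterfaces resolver cached changed =
  let (annotated, fresh) = resolver cached changed
   in (annotated, Map.union fresh cached) -- left-biased: fresh entries win
```

Using the left-biased Map.union keeps every untouched interface from the cache while overwriting the entries for the re-resolved modules.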
Suppose I have a very big Haskell source tree that is not organized in packages, and I want to do name resolution for the complete source tree "on the fly" (i.e., without relying on the .names interface files, etc.). The obvious solution would then be something like this:

However, suppose this solution is too slow because the source tree (and its corresponding list of abstract syntax trees) is very big. Is there any way in haskell-names (or maybe even haskell-suite) to make name resolution more efficient, e.g., by leveraging multiple CPU cores? Would haskell-names benefit from using the async and/or parallel packages to make computeInterfaces and annotateModule faster? Or is this approach completely wrong, and should I perform name resolution only once and then store the resolved names in .names files?

I'm not sure if this is the right place to ask, but I felt that asking the most knowledgeable people would make sense.