haskell / haskell-language-server

Official haskell ide support via language server (LSP). Successor of ghcide & haskell-ide-engine.
Apache License 2.0

Storing dependencies contributes significantly to memory usage #2963

Open mpickering opened 2 years ago

mpickering commented 2 years ago

I ran a profile after loading GHC into HLS and see that many of the large sources of allocation are due to big lists of keys.

Total: 1.8 million allocated key values for 90MB (10%) of live data

hls-graph-1.7.0.0-inplace:Development.IDE.Graph.Internal.Types:Key:89794800:1870725:48:48.0

1.4 million, 34MB of lists containing keys

ghc-prim:GHC.Types::[hls-graph-1.7.0.0-inplace:Development.IDE.Graph.Internal.Types:Key,ghc-prim:GHC.Types::]:34671936:1444664:24:24.0

1.2 million, 29MB of the GetModSummaryWithoutTimestamps Key

ghc-prim:GHC.Tuple:(,)[ghcide-1.7.0.1-inplace:Development.IDE.Core.RuleTypes:GetModSummaryWithoutTimestamps,lsp-types-1.4.0.1-bc66547fb74f5fe287e9850a7cf5b08fad3aa0baf55d7ffc8b91807e767ee251:Language.LSP.Types.Uri:NormalizedFilePath]:29105232:1212718:24:24.0

pepeiborra commented 2 years ago

These do not look like reverse dependencies to me. Reverse deps are stored in a HashSet:

https://github.com/haskell/haskell-language-server/blob/30d48ed705e929e4ff7da16b00ca4946ea850407/hls-graph/src/Development/IDE/Graph/Internal/Types.hs#L99-L102

pepeiborra commented 2 years ago

More generally, the space usage of build keys is something that I noticed a while ago. I tried to maximise sharing of NormalizedFilePath values by hash-consing them here:

https://github.com/haskell/lsp/pull/340

But then decided to revert that change since it has its own set of problems:

https://github.com/haskell/lsp/pull/344

And instead apply a more localised fix for the worst offender:

https://github.com/haskell/haskell-language-server/pull/1996

pepeiborra commented 2 years ago

Question: are the space usage stats produced by ghc-debug aware of sharing?

wz1000 commented 2 years ago

Sorry, these aren't reverse dependencies but the direct dependencies stored in ResultDeps.

Question: are the space usage stats produced by ghc-debug aware of sharing?

Yes, I believe so.

I think in this case a large part of the problem is that Key values aren't shared, because toKey allocates a new tuple on every call, even though the total number of distinct keys is only about 15,000 in this example.
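To illustrate the sharing point, here is a minimal sketch of interning keys so that repeated requests for the same key return one shared heap object instead of a fresh tuple. This is not the hls-graph implementation: `RawKey`, `InternTable`, `intern`, and `tableSize` are hypothetical names, and the key is modelled as a plain (rule name, file path) pair.

```haskell
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for a build key: a rule name paired with a file path.
type RawKey = (String, FilePath)

-- Maps every key seen so far to its single canonical copy.
newtype InternTable = InternTable (IORef (Map.Map RawKey RawKey))

newInternTable :: IO InternTable
newInternTable = InternTable <$> newIORef Map.empty

-- Return the canonical copy of a key, registering it on first sight.
-- Callers keep the returned value and drop the freshly allocated argument,
-- so dependency lists end up pointing at one object per distinct key.
intern :: InternTable -> RawKey -> IO RawKey
intern (InternTable ref) k = atomicModifyIORef' ref $ \m ->
  case Map.lookup k m of
    Just shared -> (m, shared)
    Nothing     -> (Map.insert k k m, k)

-- Number of distinct keys registered so far.
tableSize :: InternTable -> IO Int
tableSize (InternTable ref) = Map.size <$> readIORef ref
```

With roughly 15,000 distinct keys, a table like this would stay small while the 1.8 million key references collapse onto shared values. The cost is a lookup on every key construction and a table that keeps keys alive, which is the kind of trade-off that made the earlier hash-consing attempt problematic.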

Also, the definition of Key as data Key = forall a . (Typeable a, Eq a, Hashable a, Show a) => Key a means that each Key constructor needs to store 5 pointers (4 to the class dictionaries and 1 to the value). This could possibly be reduced to 1 pointer for the class dictionaries if we had:

class (Typeable a, ...) => C a
instance (Typeable a, ...) => C a
data Key = forall a. C a => Key a
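A compilable sketch of that idea follows, under some assumptions: `KeyLike` is a hypothetical name for the class `C`, and `Ord` stands in for `Hashable` so the example needs only `base`. The catch-all instance requires `FlexibleInstances` and `UndecidableInstances`. Whether GHC shares the combined dictionary per type or rebuilds it at each construction site is not verified here, so the actual saving would need to be measured.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}

import Data.Typeable (Typeable, cast)

-- One class bundling all the constraints, so the constructor captures a
-- single dictionary pointer instead of four.
-- Assumption: Ord replaces Hashable to avoid a non-base dependency.
class (Typeable a, Eq a, Ord a, Show a) => KeyLike a

-- Catch-all instance: every type satisfying the superclasses is a KeyLike.
instance (Typeable a, Eq a, Ord a, Show a) => KeyLike a

-- The constructor now stores one dictionary pointer plus the value pointer.
data Key = forall a. KeyLike a => Key a

-- Keys are equal when they wrap the same type and equal values.
instance Eq Key where
  Key a == Key b = cast a == Just b

instance Show Key where
  show (Key a) = show a
```

The superclass methods remain reachable through the single `KeyLike` dictionary, at the price of one extra indirection per method call.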

michaelpj commented 10 months ago

Is this still a problem?