Incremental parser leaks space?

ethercrow commented 11 years ago

Original author: reiner.p...@gmail.com (May 24, 2011 08:45:03)

What steps will reproduce the problem? Use yi for a while, editing Haskell files with the Haskell mode (cleverMode). Over time, the memory consumption increases. I have found it quite typical for yi to consume around 300MB after I have used it for around 30mins.

Running 'reload' drops the memory use to much less -- often around 30MB.

The attached heap profile suggests that the incremental parser is mostly responsible.

Original issue: http://code.google.com/p/yi-editor/issues/detail?id=356

ghost commented 9 years ago

The issue seems to be that the incremental parser uses too much memory (confirmed in Haskell precise mode as well). The more the file gets traversed, the higher the memory consumption. The memory consumption does seem to stay constant once the entire file has been traversed, so the issue is just that the parser uses too much space. This is a big issue, since, the amount of space required to traverse the entire file is about 2GB for a Haskell file which is 50k sloc and 1MB in size.

ghost commented 7 years ago

Here's a heap profile using Haskell mode to show the space leak in the Incremental parser:

heap_profile

Syntax highlighting for Python does much better in this regard:

better_heap_profile

The operations I was performing was traversing the file with no changes to the file itself. Towards the end, you can see that the memory usage stays consistent since the entire file has been traversed.

I can verify that the parser runs in constant memory with the default mode (that is, no syntax highlighting). This is what the heap profile look like in this case:

heap_profile_default

So Haskell mode is forcing the Incremental parser to evaluate unnecessarily.

ghost commented 7 years ago

Profiling data seems to indicate that our culprit is getIndexedStream in https://github.com/yi-editor/yi/blob/c2d8d7eeb2567ee4634a523b68bef08cd0747ce7/yi-core/src/Yi/Buffer/Implementation.hs#L431 which is forcing the text. We would ideally like to be able to traverse the YiString somehow without keeping it in memory.

noughtmare commented 7 years ago

From the comment it seems that @Fuuzetsu already suspected getIndexedStream would become a bottleneck.

-- | TODO: This guy is a pretty big bottleneck and only one function -- uses it which in turn is only seldom used and most of the output is -- thrown away anyway. We could probably get away with never -- converting this to String here. The old implementation did so -- because it worked over ByteString but we don't have to.

ghost commented 7 years ago

@noughtmare Yes. There was a discussion about this a long time ago on IRC. Should have had the discussion here instead. The issue is not specific to Haskell mode, rather, it's syntax highlighting in general.

I think getIndexedStream was suspected to be a CPU bottleneck. Here it's causing a space leak. Not exactly the same, but getting rid of it is beneficial regardless.

Modifying the syntax highlighter such that it doesn't need to use getIndexedStream would probably be the way to go: https://github.com/yi-editor/yi/blob/51855c9971c1e611b2c823de2ac4bd068f6dfb78/yi-language/src/Yi/Syntax.hs#L110

Or, at the very least, figure out why it's forcing the incremental parser.

Fuuzetsu commented 7 years ago

Modifying the syntax highlighter such that it doesn't need to use getIndexedStream would probably be the way to go

Yes, the problem is that IIRC it's non-trivial which is why the function stuck around to begin with.

noughtmare commented 7 years ago

Why is the getIndexedStream function bad, is it because it returns a list? Could we use one of the streaming libraries (like pipes or conduits) to circumvent this?

Fuuzetsu commented 7 years ago

Everything is implemented in terms of YiString (which IIRC provided great speedup and much joy all around). It's implemented as a rope with chunks of Text. Here we call toString which unchunks the tree, unpacks concats the Text chunks, unpacks into String, potentially reverses the whole thing (yes, whole buffer), throws away some part of it and then returns the rest of the string chunked into 1 character each. As you can imagine, this isn't fast. Streaming libraries don't really help because implementation will have to be changed anyway.

I suppose the least painful improvement would be to write some functions in yi-rope which instead of doing all of the above provided the caller with some sort of an iterator that walks over the YiString directly. This could be better.

I have a bunch of time off work from tomorrow so maybe I could look into it if you can't/don't want to but I've been known to say that then disappear!

noughtmare commented 7 years ago

Would just using a foldl' be more memory efficient than the current situation? We could then make a field of the Highlighter be a function like:

hlStep :: State a -> Text -> State a

Or do you mean that 'walking' the YiString from 'left' to 'right' is not the way to go?

ghost commented 7 years ago

@Fuuzetsu go for it. I'll try picking up in a few weeks if you do disappear.

noughtmare commented 7 years ago

I could look into it if you can't/don't want to

I intend to look into this further (I think this issue might be related to the problems I'm facing with #946). But any help is very much appreciated!

Fuuzetsu commented 7 years ago

I'm saying we can walk it but the way we're doing it right now is awful. It might have worked many years ago when the stream would have been produced lazily but in this case I believe we're going all out and then walking a forced stream. I think, I don't have data on hand.

Profile the function and see what it does.

noughtmare commented 7 years ago

The fastMode for haskell seems to be the exact same as the clevermode but then without the Incremental Parser overhead.

fastMode:

cleverMode:

noughtmare commented 7 years ago

Ok, I think I found the problem. The Highlighter caches every step of the making of the abstract syntax tree All these states are being cached to speed up the generation of the AST when the buffer is edited (which naturally takes a lot of memory).

~~A solution to this would be to not cache any state at all, at the cost of more cpu time.~~

I think that we could also save a lot of memory by caching the state between a point and the previous point instead of caching all state up to a point for every point (this might not help because of laziness).

This is wrong :(.

PS: I don't think this bug is caused by the transition to the yi-rope type, because it was filed in 2011 (I believe yi-rope was made around 2014).

ghost commented 7 years ago

@noughtmare Can you verify that? I believe I tried turning caching off at some point, and didn't see a difference.

noughtmare commented 7 years ago

I have some results: The unfoldLexer function was to blame for the largest memory usage (in the fastMode for haskell). I just added a pair of seqs:

unfoldLexer :: ((state, input) -> Maybe (token, (state, input)))
            -> (state, input) -> [(state, token)]
unfoldLexer f b = fst b `seq` snd b `seq` case f b of
             Nothing -> []
             Just (t, b') -> (fst b, t) : unfoldLexer f b'

This removes part of the space leak.

Edit: I guess just

unfoldLexer f b = fst b `seq` case f b of

is enough.

noughtmare commented 7 years ago

Here are the graphs.

fastMode:

cleverMode:

Fuuzetsu commented 7 years ago

What's your test.hs?

noughtmare commented 7 years ago

same as https://github.com/yi-editor/yi/issues/105#issuecomment-269120677. A 618 KB haskell file with just the Data.Data module (first thing I found) pasted into it many times.

ghost commented 7 years ago

@Fuuzetsu This pattern should work:


main :: IO ()
main = do
    putStrLn "Hèllo"
    putStrLn "Hèllo"
    putStrLn "Hèllo"
    putStrLn "Hèllo"
    putStrLn "Hèllo"
    putStrLn "Hèllo"
    putStrLn "Hèllo"
    putStrLn "Hèllo"

going on till infinity.... (95k lines in my case)

Fuuzetsu commented 7 years ago

OK, I'm going to have some fun I think; do you have any more changes except the above strictness change?

noughtmare commented 7 years ago

No, that's it.

noughtmare commented 7 years ago

I get even better space usage when adding strictness in Yi.IncrementalParse.scanner.updateState0:

updateState0 curState toks@((st,tok):rest) = nextState `seq` ((st, curState), result) : updateState0 nextState rest

fastMode:

cleverMode:

Fuuzetsu commented 7 years ago

Both the strictness annotations (use BangPatterns instead of seq) while potentially useful (in unfoldLexer the tuple constructor can be forced without bad effect at least I believe) miss the main issue. You should ask yourself "why am I seeing 200-300MB when I open the file"?

It can be quickly answered by looking into the incremental parsing paper (see README) then starting Yi on big file, editing first line then checking the heap. You'll find that memory usage does not go down when you compare this to going to the end of the file, back and editing the first line. Notably, we're not parsing incrementally currently. The whole point of the incremental parser is that we only parse up to the point we display on screen. If we change something then everything past the point of change to the point of current view should be parsed but nothing more. I recommend retainer profiling with -R20 if you want to see what needs fixing for yourself. I have found what I think is the main issue and checking potential fixes.

Fuuzetsu commented 7 years ago

I suppose a picture speaks thousand words so

1482886543

This is loading nearly 1MB file with clever Haskell, down from 300-350MB to ~5MB: just opening file without moving anywhere. I still see some other problems though so not submitting anything yet.

There is of course a lot of value of looking for improving worst case (i.e. what you're seeing above) so I encourage you to pursue it but just be aware of what you're measuring.

EDIT: For reference, here is the real problem we're looking to fix:

1482887113

The above is from going to the end of the file (so as expected we parse everything and have huge spike) but then we return to beginning and type some stuff in: the memory stays way too high even after the parse should have been discarded (AFAICT).

ghost commented 7 years ago

Interesting. Wasn't aware of retainer profiling. Makes it so much easier.

Fixing the memory issue in Vty refresh function (Vty.hs#L212) with that becomes fairly trivial:

        (miniImages, bigImages) = let f = fmap (picture . snd) in bimap f f
                                $ partition (isMini . fst) (toList windowsAndImages)

EDIT: nevermind, the size of the list is too small to matter.

@Fuuzetsu are you considering Pango as well?

Fuuzetsu commented 7 years ago

@siddhanathan There is a much worse problem than lack of sharing there. The sizes used for rendering text and obtaining strokes is completely wrong: it's using Rect size which is a pixel size to specify number of characters; so on 1080p screen, we're saying "render next 2million characters". This was responsible for whole file being parsed as soon as you opened. The fix there is simple (and can be improved) and gets rid of indexed stream usage. I inline some of it here though I'll likely send a PR some time.

@@ -66,6 +66,7 @@ import           Yi.Config
 import           Yi.Debug                       (logError, logPutStrLn)
 import           Yi.Editor
 import           Yi.Event                       (Event)
+import qualified Yi.Rope as R
 import           Yi.Style
 import           Yi.Types                       (YiConfigVariable)
 import qualified Yi.UI.Common                   as Common
@@ -74,7 +75,7 @@ import           Yi.Layout                      (HasNeighborWest)
 import           Yi.UI.TabBar                   (TabDescr (TabDescr), tabBarDescr)
 import           Yi.UI.Utils                    (arrangeItems, attributesPictureAndSelB)
 import           Yi.Frontend.Vty.Conversions          (colorToAttr, fromVtyEvent)
-import           Yi.Window                      (Window (bufkey, isMini, wkey))
+import           Yi.Window                      (Window (bufkey, isMini, wkey, width, height))

 data Rendered = Rendered
@@ -240,10 +241,12 @@ refresh fs e = do
         { Vty.picCursor = cursorPos }

 renderWindow :: UIConfig -> Editor -> SL.Rect -> HasNeighborWest -> (Window, Bool) -> Rendered
-renderWindow cfg e (SL.Rect x y w h) nb (win, focused) =
+renderWindow cfg e (SL.Rect x y _ _) nb (win, focused) =
     Rendered (Vty.translate x y $ if nb then vertSep Vty.<|> pict else pict)
              (fmap (\(i, j) -> (i + y, j + x')) cur)
     where
+        w = Yi.Window.width win
+        h = Yi.Window.height win
         x' = x + if nb then 1 else 0
         w' = w - if nb then 1 else 0
         b = findBufferWith (bufkey win) e
@@ -257,21 +260,25 @@ renderWindow cfg e (SL.Rect x y w h) nb (win, focused) =
         wsty = attributesToAttr ground Vty.defAttr
         eofsty = appEndo (eofStyle sty) ground
         (point, _) = runBuffer win b pointB
-        region = mkSizeRegion fromMarkPoint $ Size (w' * h')
+        (text, _) = runBuffer win b $
+          -- Take the window worth of lines; we now know exactly how
+          -- much text to render, parse and stroke.
+          fst . R.splitAtLine h' <$> streamB Forward fromMarkPoint
+
+        region = mkSizeRegion fromMarkPoint . Size $! R.length text
         -- Work around a problem with the mini window never displaying it's contents due to a
         -- fromMark that is always equal to the end of the buffer contents.
         (Just (MarkSet fromM _ _), _) = runBuffer win b (getMarks win)
         fromMarkPoint = if notMini
                         then fst $ runBuffer win b $ use $ markPointA fromM
                         else Point 0
-        (text, _) = runBuffer win b (indexedStreamB Forward fromMarkPoint)

         (attributes, _) = runBuffer win b $ attributesPictureAndSelB sty (currentRegex e) region
         -- TODO: I suspect that this costs quite a lot of CPU in the "dry run" which determines the window size;
         -- In that case, since attributes are also useless there, it might help to replace the call by a dummy value.
         -- This is also approximately valid of the call to "indexedAnnotatedStreamB".
         colors = map (fmap (($ Vty.defAttr) . attributesToAttr)) attributes
-        bufData =  paintChars Vty.defAttr colors text
+        bufData =  paintChars Vty.defAttr colors $! zip [fromMarkPoint..] (R.toString text)

However it doesn't fix the problem fully: if we go back to the start of file and change it, the memory does not go back to what it was when we initially opened the file. Something is holding onto parser state and that's what I'm trying to find now.

Lastly, no, I'm not looking at Pango right this second; I want to finally remove all the gtk2 code and migrate it to gtk3 before doing anything else there. It also obfuscates profiling information because it's much more resource intensive than Vty so I'm working with that right now.

Edit: having said that, it is nearly 5am here so feel free to go at it…

ghost commented 7 years ago

The profiler seems to be pointing towards updateCache in the highlighter: https://github.com/yi-editor/yi/blob/51855c9971c1e611b2c823de2ac4bd068f6dfb78/yi-core/src/Yi/Syntax/Driver.hs#L38

Becomes clear when we add -L100 to the RTS options.

ghost commented 7 years ago

Reached the same conclusion @Fuuzetsu reached a while ago. Something is holding on to the Parser state, but not using it.

bio

retainers

ghost commented 7 years ago

@noughtmare Instead of forcing values in unfoldLexer, we could make them lazier.

diff --git a/yi-language/src/Yi/Lexer/Alex.hs b/yi-language/src/Yi/Lexer/Alex.hs
index ccd15471..f1ba7be6 100644
--- a/yi-language/src/Yi/Lexer/Alex.hs
+++ b/yi-language/src/Yi/Lexer/Alex.hs
@@ -212,9 +212,9 @@ lexScanner Lexer {..} src = Scanner
 -- | unfold lexer into a function that returns a stream of (state, token)
 unfoldLexer :: ((state, input) -> Maybe (token, (state, input)))
             -> (state, input) -> [(state, token)]
-unfoldLexer f b = case f b of
+unfoldLexer f ~(s, i) = case f (s, i) of
              Nothing -> []
-             Just (t, b') -> (fst b, t) : unfoldLexer f b'
+             Just ~(t, ~b') -> (s, t) : unfoldLexer f b'

 -- * Lenses
 makeLensesWithSuffix "A" ''Posn

This has the same effect, and reduces memory consumption in fastmode by over 50% in my tests. Using strictness annotations here would unnecessarily force the incremental parser.

noughtmare commented 7 years ago

@siddhanathan can you explain what ~b' does in

Just ~(t, ~b') -> (s, t) : unfoldLexer f b'

All the examples I found use the tilde for pattern matching lists or tuples.

ghost commented 7 years ago

@noughtmare I don't think that does anything. I just put it there for emphasis. I know b' is a tuple and the function call will evaluate it lazily.

Fuuzetsu commented 7 years ago

The strictness change that helps there is not due to unforced values at all, it's the tuple constructor that's not being matched on and we just build up thunks. Just matching on that our using bang on 'b' should have same effect.

ghost commented 7 years ago

There is one small change I have that reduces memory consumption further. It changes the syntax driver to use non-incremental parsing (which is not an issue, since syntax highlighting happens in one shot anyway).

Fuuzetsu commented 7 years ago

It is an issue because it relies on a bug. Highlighting requests strokes for specific region and that region is the visible area of window (with patch I posted). Gaining memory improvement by assuming full buffer parsing is wrong

noughtmare commented 7 years ago

Also interesting is what happens when you just open the file, wait some time (I kept moving the cursor up and down on the same spot because I think the profiler doesn't count time when it's waiting on input) and then close it again: yi-arr-words ~~This is using the fastMode.~~

~~As you can see. The ARR_WORDS (this is the internal constructor of the ByteString and Text datatypes) data constructor takes up more space than I expected.~~

~~Is it normal for a YiString to take up \~18x the amount of memory of the raw file?~~

~~I also tested this with no mode enabled and it gave a very similar picture (around 11MB total usage).~~

This is caused by #969. The actual difference is 2-3x. Which corresponds with my comment below.

noughtmare commented 7 years ago

I think I've succeeded in forcing the YiString to be kept in memory, here's the heap: rope-test Code:

import Prelude hiding (readFile, putStrLn)
import Yi.Rope (toText, readFile, cons)
import Data.Text.IO (putStrLn)
import Control.Monad (unless)

main :: IO ()
main = do
  rope <- readFile "test.hs"
  case rope of
    Right (r,_) -> do
      putStrLn (toText r)
      wait 100000000
      putStrLn (toText ('a' `cons` r))
    Left _ -> error "left"
  where wait n = unless (n == 0) (wait (n - 1))

From this we can see that the rope uses ~2.5x more memory.

ghost commented 7 years ago

IIRC the YiString is fully evaluated when opening the file, so that memory usage shouldn't go any higher. 18x doesn't sound too bad to me.

noughtmare commented 7 years ago

~~@siddhanathan As you can see from my previous comment, the memory usage of YiString is only 2.5x so I think 18x can be improved.~~ nvm.

ghost commented 7 years ago

For what it's worth, here's a change that eliminates caching:

diff --git a/yi-core/src/Yi/Syntax/Driver.hs b/yi-core/src/Yi/Syntax/Driver.hs
index f62a4b7e..ca540423 100644
--- a/yi-core/src/Yi/Syntax/Driver.hs
+++ b/yi-core/src/Yi/Syntax/Driver.hs
@@ -31,19 +31,12 @@ mkHighlighter scanner =
         , hlGetTree      = \(Cache _ _ _ focused) w -> M.findWithDefault emptyResult w focused
         , hlFocus        = focus
         }
-    where startState :: state
-          startState = scanInit    (scanner emptyFileScan)
+    where 
           emptyResult = scanEmpty (scanner emptyFileScan)
           updateCache :: Scanner Point Char -> Point -> Cache state tree tt -> Cache state tree tt
-          updateCache newFileScan dirtyOffset (Cache path cachedStates oldResult _) = Cache path newCachedStates newResult M.empty
+          updateCache newFileScan dirtyOffset (Cache path _states oldResult _) = Cache path _states newResult M.empty
             where newScan = scanner newFileScan
-                  reused :: [state]
-                  reused = takeWhile ((< dirtyOffset) . scanLooked (scanner newFileScan)) cachedStates
-                  resumeState :: state
-                  resumeState = if null reused then startState else last reused
-
-                  newCachedStates = reused ++ fmap fst recomputed
-                  recomputed = scanRun newScan resumeState
+                  recomputed = scanRun newScan (scanInit newScan)
                   newResult :: tree (Tok tt)
                   newResult = if null recomputed then oldResult else snd $ head recomputed
           focus r (Cache path states root _focused) =

This fixes the memory problem.

noughtmare commented 7 years ago

Ah, now I know why I didn't see any memory improvements when I tried to disable caching. We have two mkHighlighter functions one in Yi.Syntax.Driver and one in Yi.Syntax. Only the javascript mode uses that other mkHighlighter, is there any reason to have two of these functions?

noughtmare commented 7 years ago

I made some graphs with just the change @siddhanathan proposed to disable caching. I didn't use any other modifications. To generate these I first opened the file moved to the bottom and then moved the cursor up and deleted one character then moved the cursor around some more and reinserted the character and then moved around some more and then closed the editor.

Precise mode: yi-no-cache

cleverMode: yi-no-cache-clever

fastMode: yi-no-cache-fast

noughtmare commented 7 years ago

The retainer profile of the cleverMode: yi-no-cache-clever-retention

noughtmare commented 7 years ago

Making apply strict:

diff --git a/yi-core/src/Parser/Incremental.hs b/yi-core/src/Parser/Incremental.hs
index 01961935..b97881a6 100644
--- a/yi-core/src/Parser/Incremental.hs
+++ b/yi-core/src/Parser/Incremental.hs
@@ -149,7 +149,7 @@ instance Show (RPolish i o) where

 apply :: forall t t1 a. ((t -> a) :< (t :< t1)) -> a :< t1
-apply ~(f:< ~(a:<r)) = f a :< r
+apply (f:<(a:<r)) = f a `seq` f a :< r

Results in (cleverMode): yi-no-cache-clever-retention-apply

~~This makes it seem like flushAccumulatorE is the problem.~~

Here is the retainer profile when navigating the file with single key presses (otherwise the accumulator is interfering): yi-no-cache-clever-retention-apply

ghost commented 7 years ago

I'm curious about the performance impact of the patch I posted above. And maybe evaluating other ways of having a cache that doesn't keep the incremental parser state around unnecessarily. That patch simply makes syntax highlighting non-incremental, which might not be what we want.

Making apply strict

Experimenting is fine. But there's a very good reason that application is lazy. Rather than blaming the incremental parser, try blaming the entity using the incremental parser incorrectly. In this case, what is doing the call to apply, and is that call retaining memory unnecessarily (checked via a biography profile).

noughtmare commented 7 years ago

yi: internal error: Invalid object in processHeapClosureForDead(): 131184
    (GHC version 8.0.1 for x86_64_unknown_linux)
    Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug
[1]    11407 abort (core dumped)  yi -fvty -kvim test.hs +RTS -hbdrag,void -hc

😞

ghost commented 7 years ago

@noughtmare Leave a comment here https://ghc.haskell.org/trac/ghc/ticket/7836 . If you have reproducible steps, then mention them in the bug.

ghost commented 7 years ago

I'm guessing some part of syntax highlighting is holding on to the lexer state.

EDIT: Nevermind. That memory is actually used.

Clever mode:

alex

Fast mode:

fastmode

noughtmare commented 7 years ago

More results!

diff --git a/yi-keymap-vim/src/Yi/Keymap/Vim/StateUtils.hs b/yi-keymap-vim/src/Yi/Keymap/Vim/StateUtils.hs
index a694c1f6..b288e2c1 100644
--- a/yi-keymap-vim/src/Yi/Keymap/Vim/StateUtils.hs
+++ b/yi-keymap-vim/src/Yi/Keymap/Vim/StateUtils.hs
@@ -91,7 +91,7 @@ flushAccumulatorE :: EditorM ()
 flushAccumulatorE = do
     accum <- vsAccumulator <$> getEditorDyn
     let repeatableAction = stringToRepeatableAction accum
-    modifyStateE $ \s ->
+    accum `seq` modifyStateE $ \s ->
         s { vsRepeatableAction = Just repeatableAction
           , vsAccumulator = mempty
           , vsCurrentMacroRecording = fmap (fmap (<> accum))
diff --git a/yi-language/src/Yi/Lexer/Alex.hs b/yi-language/src/Yi/Lexer/Alex.hs
index ccd15471..a9a273e7 100644
--- a/yi-language/src/Yi/Lexer/Alex.hs
+++ b/yi-language/src/Yi/Lexer/Alex.hs
@@ -212,7 +212,7 @@ lexScanner Lexer {..} src = Scanner
 -- | unfold lexer into a function that returns a stream of (state, token)
 unfoldLexer :: ((state, input) -> Maybe (token, (state, input)))
             -> (state, input) -> [(state, token)]
-unfoldLexer f b = case f b of
+unfoldLexer f b = fst b `seq` case f b of
              Nothing -> []
              Just (t, b') -> (fst b, t) : unfoldLexer f b'

before: yi-before

after: yi-after

I've gotten rid of the memory leak 😃!

noughtmare commented 7 years ago

One more thing. My routine for this test was first open a big file (same as above), then press 'G' (to force the highlighter) then 'vggd' (delete the entire file) then 'i' and entering some random text, then 'upG' (undoing, pasting in the original file and scrolling down) and then some cursor movement and finally ':q!' (to quit).

As you can see in the after picture there is a moment (around the 2 sec mark) where the memory is constant. I think this is the time when I entered the random text. It should be possible to already release the memory there, but I don't know how.

yi-editor / yi

Incremental parser leaks space? #105