phetsims / reactants-products-and-leftovers

"Reactants, Products and Leftovers" is an educational simulation in HTML5, by PhET Interactive Simulations.
GNU General Public License v3.0
1 stars 4 forks source link

memory leak #18

Closed pixelzoom closed 9 years ago

pixelzoom commented 9 years ago

In #17, @ycarpenterphet reported:

load time seems to increase as the game is played for a longer period of time.

By "load time", she was referring to the time required to load the next challenge when pressing the "Next" button.

I'll take another pass through the code to look for memory leaks. (Note to self: It sure would be nice if AXON had some support for summarizing Properties and how many observers they have.)

ycarpenterphet commented 9 years ago

I double-checked the screencapture recording from my student interviews today.

As an example, load time for the first question the user encountered in the Game (which happened to be in level 3 for this student, after about 5 total minutes of sim use) was 1 sec (or less).

After about 30 minutes of total sim use (including several levels of game play, as well as use of the previous screens), the load time for each level 3 question in the game was 8 to 10 seconds.

pixelzoom commented 9 years ago

Just to make sure that the slow-down is definitely attributable to RPAL... Was RPAL the only window open in Safari?

ycarpenterphet commented 9 years ago

Yes, RPAL was the only open window in Safari, and if you reload RPAL in the safari browser window, the load time resets to its initial (reasonable) value.

ycarpenterphet commented 9 years ago

This problem also is not just restricted to iPad. A student interviewed today using RPAL dev13 on a macbook pro running OSX 10.9.5 and Safari 7.1, and increased load time was noticeable.

Load time at the start of the game < 1 second Load time after ~ 20 minutes of sim use ~ 4 seconds

Load time for Level 3 (ie. loading the first challenge by clicking Level 3 on the level selection screen), after 30 minutes of total sim use time ~ 9 seconds.

No other Safari windows or tabs were open.

pixelzoom commented 9 years ago

Thanks for the update. Memory leaks are typically not platform-specific.

pixelzoom commented 9 years ago

Things to check:

pixelzoom commented 9 years ago

Looks like switching between sandwiches/reactions in the first 2 screens also slows down after repeated use.

pixelzoom commented 9 years ago

7b051bbe3d969e18d1bcd3d8507773d9a6c4b239 use DynamicIcon only for products in SandwichesEquationNode, to reduce axon linkage

pixelzoom commented 9 years ago

First thing I'm doing is eliminating a few internal uses of AXON. (See commits above.) They were there for convenience, but can be done in other ways that can't introduce memory leaks. This will reduce the number of AXON linkages that I need to inspect.

pixelzoom commented 9 years ago

I found a memory leak in ChallengeNode (not unlinking from the state machine that controls the game), so that could be responsible for slowdown in the Game screen. But I haven't identified what is slowing down the Sandwiches and Molecules screens.

pixelzoom commented 9 years ago

A memory leak has been identified and fixed in the AccordionBox common-code component. See phetsims/sun#142. This was likely responsible for the slowdown in 'Sandwiches' and 'Molecules' screens, since the boxes and everything in them was not being released when switching reactions.

pixelzoom commented 9 years ago

Tethering the iPad slows things down so much that I can't get a feel for whether the memory leak is fixed. So I've published 1.0.0-dev.16 for testing. http://www.colorado.edu/physics/phet/dev/html/reactants-products-and-leftovers/1.0.0-dev.16/reactants-products-and-leftovers_en.html

pixelzoom commented 9 years ago

Ugh. There's still a slowdown in all 3 screens.

pixelzoom commented 9 years ago

I added this to the end of SandwichesScreenView constructor. It switches between the 3 sandwich equations 1500 times (500x3) at startup:

var iterations = 0;
for ( var i = 0; i < 500; i++ ) {
  for ( var j = 0; j < model.reactions.length; j++ ) {
    model.reaction = model.reactions[j];
    console.log( ++iterations );
  }
}

Each time that model.reaction changes, big chunks of the view are changed, and lots of AXON wiring is changed.

On Mac+Chrome, this started by printing stuff to the console at a decent rate, started to slow down ~900 iterations, really slowed down (and started to GC?) around 1300 iterations, and when I finally killed it (at 1430 iterations), it was taking ~10 seconds to complete 1 iteration of the loop.

I lowered the number of iterations to 1350 (450x3), so that the sim would actually come up. Then tried to switch between sandwich "equations". As expected, switching between the equations was slow, and inconsistently so (similar to pauses due to GC.)

And then for something very weird... Reloading the window didn't seem to release resources. As the sim was starting up, it bogged down horrible around the 150th iteration. Only restarting Chrome seemed to correct things.

pixelzoom commented 9 years ago

I was not previously setting disposed objects to null, so I did so. This exposed cases in QuantitiesNode and ChallengeNode where objects are being used after having been disposed. A likely source of memory leak...

pixelzoom commented 9 years ago

Fixed the calls to disposed objects. Also added 'memoryDebug' query parameter that prints output to the console when top-level reaction-related objects are instantiated, disposed and (in the case of challenges) undergo a state change.

pixelzoom commented 9 years ago

@samreid and I spent > 1 hour exploring this in Chrome Dev Tool via Skype (Sam driving). We added a step function to SandwichesView, see below. Then we ran the sim with 'Sandwiches' screen only.

Timeline 'memory' view seemed to indicate that memory was increasing over time, with some GC going on. CPU profiling didn't show any huge increase in time spent in each call to animationFrame. We tried removing the equation from ReactionBarNode, and almost everything from QuantitiesNode, and still (maybe) saw some memory creep. Switching between sandwiches on Chrome kept chugging along, but the same thing on iOS Safari slowed down quickly.

Not sure what we learned exactly... Maybe some evidence that there's a memory leak, but no hints about where it's occurring.

function SandwichesView( model, ... ) {
   this.model = model;
   ....
}

return inherit( RPALScreenView, SandwichesView, {
    reaction: 0,
    count: 0,
    step: function() {
      this.count++;
      if ( this.count % 20 == 0 ) {
        this.model.reaction = this.model.reactions[this.reaction % this.model.reactions.length];
        this.reaction++;
      }
    }
  } );
pixelzoom commented 9 years ago

Here's a version of SandwichesView.step that's equivalent but a little cleaner:

return inherit( RPALScreenView, SandwichesView, {
    steps: 0,
    reactionIndex: 0,
    // cycle through the sandwiches
    step: function( dt ) {
      this.steps++;
      if ( this.steps % 20 == 0 ) {
        this.model.reaction = this.model.reactions[this.reactionIndex++];
        if ( this.reactionIndex >= this.model.reactions.length ) {
          this.reactionIndex = 0;
        }
      }
    }
  } );
pixelzoom commented 9 years ago

Hmmm... I'm seeing a definite memory increase in Chrome Timeline. On startup, after cycling through the sandwiches 10 time, Timeline snapshot showed min=24MB, max=70MB. After about 5 minutes, Timeline snapshot for 10 cycles showed min=1.1GB, max=1.2GB. So a definite memory increase over time. Pressing the 'GC' button doesn't lower memory. So something must be retaining memory.

pixelzoom commented 9 years ago

Memory leak in sun.AccordionBox fixed in https://github.com/phetsims/sun/issues/144.

pixelzoom commented 9 years ago

Collaborated with @jonathanolson and memory leak appears to be resolved. After running the "sandwich iteration" test for 20 minutes, Timeline memory profile stayed in the 25-80 MB range.

Summary of the problem:

Substances are participants in a reaction, and they are model elements. In the "Sandwiches" and "Molecules" screens, they are created at startup, and persist for the lifetime of the sim. In the "Game" screen, they are created when we start playing a game, and persist until we complete the game.

Each substance has an associated "icon" (a scenery Node) that is used to visually represent that substance. Because scenery is a DAG, that icon can appear multiple times in the scenegraph. We wrap it in another node (the "wrapper"), so that (a) an icon doesn't end up being a sibling of itself (which is not allowed), and (b) we don't try to transform it (which would affect it everywhere it appears).

The memory leak occurred because scenery children keep a reference to their parent, and that reference is removed only when the child is explicitly removed form its parent. When the wrapper was removed from the scenegraph, we weren't explicitly removing its child icon. Substances persisted in the model, and their icons retained references to all of their parents, effectively preventing many many unneeded Nodes from being garbage collected.

All substance icons are now wrapped in SubstanceNode (formerly DynamicIcon) whose dispose function explicitly removes the substance's icon from the scenegraph, severing the parent-child relationship.

pixelzoom commented 9 years ago

Problem not entirely solved... It's working nicely on Chrome and Firefox (Mac) now. But Safari (iOS and Mac) both slow down soon after starting the "sandwich iteration" test. Safari iOS is especially base, slows down to ~5 seconds per iteration after only a couple of minutes. @samreid mentioned earlier today that he was seeing different behavior on Safari.

samreid commented 9 years ago

I commented on Skype:

I’m still seeing slowdown in rpal—chrome memory snapshot diff reveals thousands of SVGStopElement not getting garbage collected. Any ideas? @jonathanolson around to take a look? Also, running the RPALQueryParameters.LEAK_TEST with this.steps % 8 not showing significant slowdown after several minutes on OSX/Safari when running under scenery ohtwo.

pixelzoom commented 9 years ago

@samreid said:

I’m still seeing slowdown in rpal—chrome ...

Ah, I do see a slowdown after about 10 minutes (MacPro) with this.steps%8. Timeline memory profile seems to stay in the 25-80 MB range, and pressing the GC button drops the memory back down to 25MB.

samreid commented 9 years ago

Sorry, I got parsed wrong (my fault, not yours). I'll reformat:

I’m still seeing slowdown in rpal (on OSX/Safari) Chrome memory snapshot diff reveals thousands of SVGStopElement not getting garbage collected

pixelzoom commented 9 years ago

I changed 'leakTest' to 'leakStep', integer that specifies number of animation steps to wait before automatically switching sandwich equations. For example, '?leakStep=8' will switch sandwiches every 8th step call. Setting to 0 (the default) disables this feature.

pixelzoom commented 9 years ago

@samreid said:

Chrome memory snapshot diff reveals thousands of SVGStopElement not getting garbage collected

Can you elaborate on how you identified this?

pixelzoom commented 9 years ago

Notes from 12/16/14 developer Skype call:

@samreid is comparing heap snapshots in Chrome Dev Tools to identify the SVGStopElement leak.

SVGStopElement is a color stop that is related to a linear gradient. Scenery has the ability to cache linear gradients, and sun buttons are using that feature. Gradients are supposed to be removed from the cache when their associated nodes are removed from the scenegraph, but it looks like the cache leaks. So the ArrowButtons that are part of RPAL's spinners are leaking gradients. (Confirmed by switching to flatAppearanceStrategy, which uses no gradients.) Created https://github.com/phetsims/scenery/issues/328 to fix this.

pixelzoom commented 9 years ago

If the memory leak in Scenery gradients can't be fixed, the workaround will be to use the alternate "flat" look for the arrow buttons in spinners. The other 2 buttons in this sim are the Reset All button (which persists for the lifetime of the sim, so leak is not an issue) and the Expand/Collapse button on accordion boxes (which doesn't use 'sun', so doesn't use gradient caching).

pixelzoom commented 9 years ago

I temporarily switched the spinner arrow buttons to use RectangularButtonView.flatAppearanceStrategy,. I then ran the sim on iPad2 with ?screens=1&leakStep=8 for 1 hour. There was no perceptible slowdown in the "Sandwiches" screen, only an occasional pause, presumably for GC. So if we can work of the gradient leak problem, I think we're in good shape (for the "Sandwich" screen, anyway). If not, I'll verify with @ycarpenterphet that it's OK to work around the problem by using 'flat' look for the arrow buttons.

pixelzoom commented 9 years ago

I ran the same test on iPad2 with the "Molecules" screen (?screens=2&leakStep=8). It slowed down within 30 seconds. After 20 minutes, it was taking 8 seconds to switch reactions. So there's a different problem in the "Molecules" screen.

There are very few differences between these 2 screens. The differences are:

(1) In "Sandwiches", the icons are scenery.Images. In "Molecules", they are NITROGLYCERIN molecules, which are compositions of scenery-phet.ShadedSphere (scenery.Cirlce with radial gradient).

(2) In the "Molecules" screen, chemical symbols are displayed using scenery-phet.SubSupText.

pixelzoom commented 9 years ago

Comparing heap snapshots for the "Molecules" screen shows large numbers of SVGRadialGradientElement and SVGStopElement (gradient color stops), with "# Deleted" zero for both. So it looks like Scenery's radial gradients are also leaking. Tracking in https://github.com/phetsims/scenery/issues/328

pixelzoom commented 9 years ago

Note that a leak in radial gradients will also affect the "Game" screen, since it also displays NITROGLYCERIN molecules.

pixelzoom commented 9 years ago

A leak in radial gradients also affects the "Balancing Chemical Equations" sim, which dynamically adds and removes NITROGLYCERIN molecule nodes from the scene graph.

ycarpenterphet commented 9 years ago

It sounds like the gradient vs. flat buttons is not the only issue - but let me know if you need feedback on the flat appearance for buttons (and if so, please attach either a dev link or a screenshot so I can view the difference). Thanks!

pixelzoom commented 9 years ago

To confirm that the leak in "Molecules" screen is due to scenery.RadialGradient, I removed the RadialGradient by changing this line in scenery-phet.ShadedSphereLine:

29 options.fill = 'red';

This makes all of the atoms 'red'. Re-running the test (?screens=2&leakStep=8), there is no noticeable slowdown on iPad2 after 10 minutes. (With RadialGradient, the sim started to slow down after 30 seconds.) There are also no SVGRadialGradientElement or SVGStopElement objects in the heap comparison.

Unlike the "Sandwiches" screen, there is no easy workaround for this. RadialGradient is essential to the look of atoms. I suppose we could create 1 of each molecule and convert them to images (living with the leakage for that one set of molecules). But since this also impacts Balancing Chemical Equations, it seems like this should be fixed in Scenery 0.1.

pixelzoom commented 9 years ago

@jonathanolson said:

See phetsims/scenery#328 (comment) for information about the fix applied to Scenery.

Right now, it just looks like ArrowButton has two fillKept nodes with linear gradients that is leaking. If you want to test with fillKept off, replace the setFillKept() setter in Fillable.js (Scenery) with

this._fillKept = false;

pixelzoom commented 9 years ago

Thanks for fixing scenery. Note that setting fillKept=false for ArrowButton does not totally resolve the issue. See https://github.com/phetsims/sun/issues/146. The buttonAppearanceStrategy for all sun buttons creates something called overlayForHorizGradient, whose fillKept=true, and no reference is kept to it in the button. So this will continue to leak after fillKept is set to false for the button.

pixelzoom commented 9 years ago

I fixed sun buttons (see https://github.com/phetsims/sun/issues/146) and did heap-snapshot comparisons in Chrome. I see no memory leaks in any of the screens.

pixelzoom commented 9 years ago

I removed the leakStep query parameter and related code.

All memory leaks addressed, closing.