performance of switching between reactions

pixelzoom commented 9 years ago

@ycarpenterphet reported this via email for 1.0.0-dev.13:

When playing the game (iPad2, iOS7), loading each subsequent question (by pressing Next) seems to take a very long time, and that lag/load time seems to increase as the game is played for a longer period of time.

The lag is minimal on chrome/OSX 10.9, but it’s possibly increasing in length as well (difficult to tell with the shorter lag time), so I wonder if this is a memory issue?

Regardless, the substantial time lag on iPad is very noticeable and problematic - could you look into this?

pixelzoom commented 9 years ago

The increasing load time (i.e., potential memory leak) will be tracked separately in #18. This issue will deal with the nominal loading time (load time when you first start playing the Game.)

pixelzoom commented 9 years ago

@ycarpenterphet could you please quantity what "a very long time" is? And what would be acceptable?

On my iPad2 running iOS 8.1.1, after pressing the 'Next' button, it takes about 1 second to load the next challenge. This is just slightly longer than switching between sandwiches or reactions on the first 2 screens. The time lag is due to the large number of scene graph changes required - switching challenges requires changing everything in the user-interface except the scoreboard, 2 rectangles (the boxes), and the braces. And I don't see any way around that.

ycarpenterphet commented 9 years ago

I think a load time of 1 second is fine, and this seems to be (roughly) the performance that I see in iOS 7.0.4 , at the start of game usage. I think the problem is actually #18, where the loading time is increasing with prolonged game play.

I consider "a very long time" as the 8-10 s load time experienced per challenge mentioned in #18.

pixelzoom commented 9 years ago

Agreed that #18 is a more important issue.

I'll see what I can do to decrease the "next challenge" time. But it's unlikely to decrease by more than a couple of tenths of a second.

pixelzoom commented 9 years ago

Note to self...

Investigate creating these things on demand:

"Try Again", "Show Answer", "Next" buttons
face
NumberNodes that correspond to spinners (2 min, 4 max)

This will spread out the cost of adding to the scenegraph a tiny bit. This would be 6-8 fewer nodes (depending on challenge) that are created when "Next" button is pressed. That's such a small fraction of the total nodes created that I'd be surprised if it improves performance. And it will certainly complicate the code.

pixelzoom commented 9 years ago

Creating game buttons on demand speeds things up by ~10%. On Chrome, a typical challenge load time was ~200ms, and this saves ~20ms.

pixelzoom commented 9 years ago

Creating the face on demand speeds things up by a tiny bit, something <5%. Every bit counts I guess, so I've committed this change.

pixelzoom commented 9 years ago

Creating NumberNodes on demand would require some very invasive changes. So I started by creating them when ChallengeNode is created, but adding them to the scenegraph on demand. Looks like a very small performance increase, maybe 1%.

pixelzoom commented 9 years ago

Even though it's a bit of work, I'm going to investigate creating NumberNodes on demand. They are only needed in the game, where they replace the spinners in the 'Try Again', 'Show Answer' and 'Next' states. They are not needed at all in the first 2 screens, but are still created, so this also has the potential to increase performance slightly for the first 2 screens.

pixelzoom commented 9 years ago

NumberNodes (static numbers) are now being created on demand. Very negligible performance improvement (1%?), but this is the right thing to do, since this component is shared with the first 2 screens, where spinners are never replaced by static numbers. This eliminates creation of 2-3 unneeded scenery.Text nodes in the first 2 screens, not a big deal, but more "correct".

pixelzoom commented 9 years ago

Well, after all of those changes, challenges load is maybe 0.1 sec faster on iPad2. A lot of work for not much gain. The load time varies primarily based on the initial values of the quantities (how many molecules need to be displayed initially in the boxes) and secondarily on how many products the reaction has. On Chrome, load time varies from 100-230ms. On a tethered iPad, it varies from 700-1200ms (those numbers are probably a little high because tethering reduces overall performance.)

pixelzoom commented 9 years ago

There's really nothing that I can do to squeeze more performance out of this. Changing challenges inherently requires removing and adding a lot of nodes. Slight performance improvements are demonstrated in http://www.colorado.edu/physics/phet/dev/html/reactants-products-and-leftovers/1.0.0-dev.14/reactants-products-and-leftovers_en.html

@ycarpenterphet please review and instruct me on how to proceed. If this is sufficient, please close the issue. If it's not good enough, we'll need to discuss either design changes or ways of doing this outside of scenery (which will be significant redevelopment and difficult to maintain.)

pixelzoom commented 9 years ago

Note that #18 (memory leak) has not been addressed in 1.0.0-dev.14. So please evaluate the initial responsiveness of moving between challenges.

pixelzoom commented 9 years ago

1.0.0-dev.14 has a problem with the vertical positioning of the '?' in the game. Fix appears in 1.0.0-dev-15 at http://www.colorado.edu/physics/phet/dev/html/reactants-products-and-leftovers/1.0.0-dev.15/reactants-products-and-leftovers_en.html

ycarpenterphet commented 9 years ago

I tested loading challenges from Level 3 initially, since these are often high-numbers-of-molecules-to-draw challenges. I tested it out on iPad2/iOS7 and iPad3/iOS8.

The initial load time for a challenge is, although slower than ideal, not a dealbreaker. I would consider it critical if it were long enough that students were waiting long enough to attempt to press the button again to get it to load. Right now, it's right at the edge of that limit (I suspect some students will be impatient enough to press again, but most won't, on average).

So, basically, I think it's acceptable as-is, but I think it would be useful for us to have that conversation about other workarounds, just so that I'm aware of what you mean about 'design changes or other ways of doing this outside of scenery'.

pixelzoom commented 9 years ago

Design changes would mean coming up with a user-interface that doesn't display so many objects, and doesn't require changing so many things when a new reaction/challenge is displayed. I have no idea what such an interface would look like, or whether it would meet goals. But the idea is that the sim would be "dumbed down" to work with the performance constraints of scenery and the devices. (Not really an option, I'm guessing.)

Doing this outside of scenery would mean reimplementing pieces of the sim as either direct-to-Canvas or WebGL. Either of those approaches is currently less maintainable, and means throwing away lots of code. Stuff inside the boxes would be the first candidate for such a treatment. We could probably still use the common-code AccordionBox if we used Canvas, but probably not with WebGL - so we may lose some common-code support. This approach would push the sim release back 1-2 months, depending on how we decided to proceed. And the performance gain is probably something like 1/2 second.

That said... The performance of this sim may or may not improve when we switch over to Scenery 0.2. That's supposed to happen 1/1/2015. Perhaps @jonathanolson can comment on whether this sim will benefit from Scenery 0.2, and whether it's still on schedule.

pixelzoom commented 9 years ago

If you want me to investigate "outside of scenery" performance improvements, the lowest cost option would be to investigate rendering the molecules/sandwiches directly Canvas. I'd probably know in 3-4 hours if that was going to be significantly faster. It is certain to be more complicated and less maintainable. But we've used that approach successfully in other sims that show large collections of objects (Acid-Base Solutions, Beer's Law Lab, pH Scale,...)

pixelzoom commented 9 years ago

@ariel-phet said via email:

Just looked on iPad 3, and unfortunately the delay between challenges is a bit too slow. It is close to be acceptable but definitely falling on the side of "not up to snuff"... Definitely worth trying canvas.

samreid commented 9 years ago

Just a quick point. May be better to try to understand where the time is being spent before jumping in and trying canvas (unless you are already pretty sure that would solve it).

pixelzoom commented 9 years ago

Agreed. But I'm fairly certain that the time is spent changing the scene graph, because the model is trivial (1 Reaction), and there is very little axon wiring (connections to at most 6 "quantity" properties in the Reaction).

pixelzoom commented 9 years ago

@ycarpenterphet You assigned this back to me, but didn't comment on the options and repercussions that I outlined. So I'm inferring that:

(1) Design changes and waiting for Scenery 0.2 release are not an viable options. (Note that we have not heard a reply from @jonathanolson about Scenery 0.2.)

(2) You would like me to proceed with reimplementing portions of the sim to improve performance.

(3) I should investigate using Canvas as the first option, WebGL second.

(4) You are OK with not meeting the "November/December 2014" publication deadline, and the possibility of a 1-2 month publication delay is acceptable.

Let me know if any of this is incorrect.

ycarpenterphet commented 9 years ago

Sorry, let me clarify.

(1a) Design changes of that magnitude aren't an option. (1b) I talked to @ariel-phet about waiting to hear back on Scenery 0.2, and he suggested that canvas is worth trying regardless.

(2) yes - and as both @ariel-phet and I said, it's very close to acceptable performance, so I don't think we need a huge improvement to the initial load time.

(3) definitely

(4) this deadline isn't solely my call - if initial investigations with Canvas look promising (which you said earlier you said hoped would take 3-4 h to check if that's viable), I think it's worthwhile, but we may want to update @ariel-phet and @kathy-phet on the expected delay in publication.

pixelzoom commented 9 years ago

Notes from 12/16/14 developer Skype call:

CPU profiling shows 17% of time spent in Node.mutate. Created https://github.com/phetsims/scenery/issues/327 to investigate speeding up Node.mutate.

Another big chunk of time is being spent by Scenery creating the stroked shape for the brackets that appear below the Before/After boxes. Since this is a complicated path, I'll attempt to cache these. Their length won't be as custom-tailored to specified reactions, but they will probably look good enough. There are 5 different bracket lengths needed: 2 reactants, 1 product, 2 products, 2 leftovers with 1 product, 2 leftovers with 2 products.

Other paths that could potentially be cached: plus operators, equation arrow, arrow between boxes. These may have little affect on performance, wait until after brackets are cached.

pixelzoom commented 9 years ago

Tracking caching of brackets in http://github.com/phetsims/reactants-products-and-leftovers/issues/24.

pixelzoom commented 9 years ago

At 12/16/14 developer Skype call, @jonathanolson said that Scenery 0.2 is still on track for end-of-year release, is unlikely to offer any performance improvements for this specific case, but "you're welcome to try it for yourself". @samreid tried RPAL with Scenery 0.2, and it was hitting exceptions in Scenery 0.2, causing unacceptable visual glitches. So my opinion is that waiting for Scenery 0.2 to address performance is not an option for RPAL 1.0 release, and we should proceed with Scenery 0.1.

pixelzoom commented 9 years ago

I reimplemented HBracketNode using arc instead of quadraticCurveTo, hopefully to reduce the Scenery cost of creating the stroked shape. @jonathanolson said:

Arcs would presumably increase speed, they are easier to stroke

Doing a search through all PhET source code, I noticed that sun.AccordionBox is also using quadraticCurveTo where arc would provide the same result. I created https://github.com/phetsims/sun/issues/145 to track this optimization, which would benefit RPAL.

pixelzoom commented 9 years ago

The equations at the top of the "Sandwiches" and "Molecules" screens are relatively expensive to create, and we have a small number of reactions in these screens. So I implemented caching. The equations are created on demand, and added to the scene graph. Once added, visibility is used to show the selected equation.

Prior to making this change, switching sandwiches required ~95 ms (Chrome on MacPro). After this change, that dropped to ~65ms, a savings of ~31%. It feels marginally snappier on iPad2.

Note that because the equations are added on demand, the first time you select a reaction it takes longer than subsequent times. If that's not desirable, then all the equations can be created at startup. But on iPad, that will add 4-5 seconds to startup, which I suspect will be unacceptable.

pixelzoom commented 9 years ago

I also implemented caching of the Before/After boxes (and controls, brackets, etc below them). As with the equation, they are created on demand, then visibility is used to show the selected reaction. On Chrome+MacPro, switching between reactions dropped from ~65ms to ~1ms. There's essentially nothing happening in the scenegraph except visibility change. Very responsive on all platforms, very snappy on iPad.

Memory requirements are obviously larger with this caching. When starting the sim and exercising the first 2 screens, memory hovers around 50 MB, and that seems reasonable.

To facilitate debugging of gradient leaks (phetsims/scenery#328), this particular caching is disabled when running with the 'leakTest' query parameter.

pixelzoom commented 9 years ago

The first 2 screens are working great. Now I need to get creative about how to make the Game screen more responsive. The time spent switching between challenges is the problem. Here's a representative CPU profile (Chrome + MacPro) showing what happens when we switching challenges. In this example, it takes 86 ms.

screenshot_154

The big chunks (left to right) are as follow:

remove the previous ChallengeNode from the scene graph
instantiate the new ChallengeNode
add the new ChallengeNode to the scene graph
update the scene graph

The biggest chunk is instantiating the new ChallengeNode. And it's difficult to change how that's done without a serious amount of reimplementation. So let's see what's next biggest.

Next biggest is removing the previous ChallengeNode from the scene graph. Rather than removing it, perhaps we can just make it invisible, and defer removing the ChallengeNodes until the game ends. There are only 5 challenges per game, so the memory cost shouldn't be a big deal (unless we start GC'ing).

pixelzoom commented 9 years ago

Deferring removal of challenges make iPad feel a little snappier, but not quite snappy enough. And when the nodes need to be removed and cleaned up, it resulted in a long pause. I tried putting that pause before going to the "Results" and before going back to "Level Selection". Both felt pretty awful. So bailing on that idea.

pixelzoom commented 9 years ago

Note that pressing one of the level-selection buttons doesn't need to remove a previous ChallengeNode. But it needs to create the challenges in the model, and that looks like it takes about the same time as removing a ChallengeNode. So pressing a level-selection button is about as responsive as pressing a "Next" button.

pixelzoom commented 9 years ago

There are some components that could be reused between challenges, instantiated once in PlayNode and passed in to ChallengeNode constructor. They would need to be explicitly removed from the scene is ChallengeNode.dispose.

GameButtons & checkEnabledProperty Quation mark FaceNode HideBox (molecules & numbers)

... or perhaps make ChallengeNode mutable, able to change parts of itself when the reaction changes.

pixelzoom commented 9 years ago

@ycarpenterphet, @ariel-phet,

For the Game screen, I tried implementing the molecules in the Before/After boxes by drawing them directly to canvas. The performance gain was imperceptible (and in some cases, took longer in the CPU profile). On iOS+Safari, the molecules look a little fuzzy. (This took about 2 hours.) On Mac+Firefox, the stroked line around the molecule atoms looks crappy. (Compare the molecules in the boxes to the ones below the boxes.)

Based on this experiment, I'm reverting to Scenery.

You can evaluate the Canvas implementation in this dev version: http://www.colorado.edu/physics/phet/dev/html/reactants-products-and-leftovers/1.0.0-dev.19/reactants-products-and-leftovers_en.html

Compare to the previous version, while is entirely in Scenery: http://www.colorado.edu/physics/phet/dev/html/reactants-products-and-leftovers/1.0.0-dev.18/reactants-products-and-leftovers_en.html

pixelzoom commented 9 years ago

I also tried switching the Game screen to renderer:'canvas'. It was actually a little slower than 'svg' on Safari and Chrome (didn't try Firefox).

pixelzoom commented 9 years ago

@jonathanolson suggested "Lowering node count may help". There are a handful of intermediate scenery.Nodes (used for layout convenience) and scenery.LayoutBox. I removed these, handled my own layout in the case of removing LayoutBox, and saw no measurable performance increase. It also complicated the code, especially having to handle my own layout.

ycarpenterphet commented 9 years ago

@pixelzoom - I agree with your evaluation of dev19 and decision to revert back to Scenery (dev 18). There's no observable performance increase in use on iPad 2 or iPad 3 with Safari, and the fuzzy edged molecules look terrible.

pixelzoom commented 9 years ago

@samreid and I spent close to 2 hours looking at this today.

@jonathanolson recommended looking a layering, to make sure there wasn't some bad layer-splitting going on. The entire subtree under ChallengeNode is a single layer, so there are no layers being implicitly created. We lack to expertise to judge where explicit layers might be added, and @jonathanolson is apparently unavailable to help with this issue. So we bailed on optimizations related to scene structure, and started looking at other options.

In general, the options for performance gains are either (1) significantly reduce the number of nodes in ChallengeNode, and/or (2) hide the cost of removeChild and/or addChild by putting it somewhere that the user is less likely to notice it.

Options for reducing the number of nodes in ChallengeNode:

(1a) Investigate making ChallengeNode mutable, and have it reuse any nodes possible. I'm skeptical that this would result in a perceptible performance gain, as the number of reusable nodes is a small percentage of the total nodes in the ChallengeNode subtree.

(1b) Flatten molecules to 1 node. Temporarily replacing NITROCLYCERIN molecules nodes with scenery.Circle resulted in big performance gains, because the molecules are a relatively large percentage of the nodes in the ChallengeNode subtree. Pre-creating the molecules and flattening them to images might result in similar performance gains. But the image version of the molecules look lousy (fuzzy on iPad, jaggy and fuzzy on Chrome), similar to the Canvas experiment I tried above. Since @ycarpenterphet said that those molecules looked "terrible", I'm inclined not to pursue this option.

Options for hiding the cost of removeChild/addChild:

(2a) By deferring removeChild of a ChallengeNode until 2 animation steps after the "Next" button is pressed, we gained a measurable bit of performance, but it still feels sluggish on iPad2.

(2b) After the current ChallengeNode is made visible, we might be able to pre-create the next ChallengeNode on the next animation step. The problem with this is that ChallengeNode currently synchronizes itself with other things in the model, and that may cause problems. Investigate modifying ChallengeNode so that it could be pre-created.

pixelzoom commented 9 years ago

Half a step forward, 1 step backwards... To improve performance of level selection in the game, I created all NITROGLYCERIN molecule nodes statically, in MoleculeNodes.js. This causes molecules to render incorrectly due to a gradient bug in scenery. See https://github.com/phetsims/scenery/issues/335 for details.

pixelzoom commented 9 years ago

I implemented what was described as options (2a) and (2b) above. In a nutshell, I'm hiding the cost of adding and removing ChallengeNodes from the scene. When going to the 'Next' challenge, the old challenge is made invisible, and the new (current) challenge is made visible - no nodes are created/delete, and the scene is not changed. All work is deferred to the animation loop. While the user is distract looking at the updated screen, two things happen in the animation loop: the previous challenge is removed from the scene, and the next challenge is preloaded (built, added to the scene, but made invisible). This works very nicely on iPad2 (and all other platforms), super snappy response. The code is quite a bit trickier and will be more costly to maintain.

As soon as phetsims/scenery#335 is resolved, I'll be able to publish a dev version for evaluation.

pixelzoom commented 9 years ago

@ycarpenterphet and @ariel-phet Here's a dev version demonstrating the performance when switching between Challenges in the Game: http://www.colorado.edu/physics/phet/dev/html/reactants-products-and-leftovers/1.0.0-dev.21/reactants-products-and-leftovers_en.html

Please let me know if this performance is acceptable.

pixelzoom commented 9 years ago

@ycarpenterphet if the performance of 1.0.0-dev.21 is acceptable, please close this issue. Otherwise reassign it to me with a description of what needs to be improved. Thanks!

ariel-phet commented 9 years ago

@ycarpenterphet and @pixelzoom -- I tested the sim out and it seemed to be running well on iPad2. Nice work Chris!

ycarpenterphet commented 9 years ago

YAY! I agree that it is running beautifully, with remarkably good performance on iPad2 - I don't have the iPad3 here to check how it's running there, but suspect that it will be acceptable, and that the QA team will be able to check for sure.

phetsims / reactants-products-and-leftovers

performance of switching between reactions #17