Memory leak testing - Githubissues

pixelzoom commented 1 year ago

For https://github.com/phetsims/calculus-grapher/issues/262.

We can do this at the same time as code review, but should complete it before QA testing begins (3/20).

We'll need to do this for both phet and phet-io brands.

pixelzoom commented 1 year ago

Results of brand=phet memory test ...

Device: MacBookPro16,1 (2019 Intel), macOS 13.21. Browser: Chrome 110.0.5481.177 Build: grunt dev --minify.mangle=false --brands=phet,phet-io URL: local copy of 1.0.0-dev.22, http://localhost:8080/calculus-grapher/build/phet/calculus-grapher_en_phet.html?fuzz

Test protocol: Snaphot 1 was taken immediately after starting the sim. Snaphots 2-11 were taken at 1-minute intervals. Snapshot 12 was taken 10 minutes after Snapshot 11.

The memory footprint is very large -- perhaps that's due to --minify.mangle=false. More importantly, the sim appears to have a decent memory leak. Heap size decreased very slightly (1MB) from Snapshot 7 to 8. But it otherwise continued to increase, starting at 64MB and finishing at 142MB. I have not investigated the cause.

pixelzoom commented 1 year ago

Results of brand=phet-io memory test ...

Device: MacBookPro16,1 (2019 Intel), macOS 13.21. Browser: Chrome 110.0.5481.177 Build: grunt dev --minify.mangle=false --brands=phet,phet-io URL: https://phet-dev.colorado.edu/html/calculus-grapher/1.0.0-dev.22/phet-io/calculus-grapher_all_phet-io.html?postMessageOnError&phetioStandalone&fuzz

Test protocol: Same as above.

I'm very confused. Unlike the brand=phet test, this test shows no sign of a memory leak. Heap size seems to stabilize around 82MB and goes down 3 times.

Maybe the brand=phet test was different because I ran the build from my local calculus-grapher/build/phet/ directory, instead of from phet-colorado.edu? That makes no sense. But I'm going to re-run brand=phet from phet.colorado.edu.

pixelzoom commented 1 year ago

I started to rerun the brand=phet test. After 1 minute, heapsize went from 62MB to 140MB. So I started over.

Here are the results...

Device: MacBookPro16,1 (2019 Intel), macOS 13.21. Browser: Chrome 110.0.5481.177 Build: grunt dev --minify.mangle=false --brands=phet,phet-io URL: https://phet-dev.colorado.edu/html/calculus-grapher/1.0.0-dev.22/phet/calculus-grapher_en_phet.html?fuzz

Test protocol: Same as above, plus:

Snapshot 13, 5 minutes after Snapshot 12
Snapshot 14, after pressing GC button and waiting 1 minute

This test was looking good - memory usage seemed to level out around 80MB, and went down 3 times. Then came snapshot 12, and memory spiked to 102MB and stayed there.

I suspect that there's a big leak lurking somewhere, in something that is not hit very often by fuzzing. And when it is hit, memory footprint gets huge and stays there. And the brand=phet-io test in https://github.com/phetsims/calculus-grapher/issues/265#issuecomment-1461078212 probably got lucky and didn't hit it when fuzzing. "It" could be something in the sim, or it could be something in common code. probably got lucky and didn't hit it when fuzzing.

pixelzoom commented 1 year ago

For the above snapshots, I did a Comparison for Snapshot 12 -- which is where the spike to 102MB occurred. Then I posted to Slack#developer:

I’m doing routine memory-leak testing on Calculus Grapher in https://github.com/phetsims/calculus-grapher/issues/265. I’m seeing a huge intermittent memory leak when fuzzing — like it needs to exercise some feature that’s not easy to get to via fuzzing. When it does happen, the heap size gets massive and stays there. The very first thing identified in a heap comparison is shown below. Does this look familiar to anyone?

pixelzoom commented 1 year ago

@veillette pointer out that the above tokens appear in PhET repo fenster. It looks like @jessegreenberg created most of the files in the fenster repo, whose purpose (according to README.md) is to test haptics using Cordova. Why is this being pulled into Calculus Grapher? It has no support for haptics.

pixelzoom commented 1 year ago

I tried to reproduce in unbuilt geometric-optics, but could not. Heap size started at 54MB, then stabilized around 80MB for 10 heap snapshots. So I built a local version of geometric-optics with grunt --minify.mangle=false. The heap size when the sim starts is consistently 124MB !! geometric-optics does include a11y features. Has something really large (fenster? Cordova?) been added that’s getting loaded. This sim had a memory test recently by me and QA, and it was not behaving like this.

I’m going to pause work on this issue until I hear from @jessegreenberg or someone from the a11y team.

samreid commented 1 year ago

I see those things defined in he-1.1.1.js, it’s a sherpa dependency: https://github.com/mathiasbynens/he

jessegreenberg commented 1 year ago

fenster is not a sim dependency so it cannot be related to this. I am not aware of any new a11y features or anything that would add so much to the heap size. I did a little bit of memory testing and saw some odd things:

Without fuzzing, just changing screens and pressing some buttons.

![image](https://user-images.githubusercontent.com/6396244/223915016-d46e0b68-f3d1-487b-863f-a1513d6f5b88.png)

With fuzzing, snapshot every 30 seconds

![image](https://user-images.githubusercontent.com/6396244/223916107-94e179db-aa64-4643-8992-e39577bb2c5e.png)

I think he is used by RichText, is anything creating and never disposing a lot of RichText? Also, does this mean that there are 102 LinePlots in the sim?

veillette commented 1 year ago

Thanks @jessegreenberg .

I'll note that although 102 line plots looks like a large number, it is in line with our expectations for the number of line plots. Every graph creates two instances line plot, there are 11 graphs in the sim, so that's 22. In addition, every manipulation icon and curve manipulation display also invoke the line plot twice, so that's (7+7+11+11)*2. So that's another 68. So that's 90 that are accounted for, and I may have missed a few, so 102 is not unrealistic.

There are very few elements that are created dynamically in the sim. One notable exception is the GraphTypeLabelNode. RichText is used to create the labelnodes. I believe that we dispose of our labels properly but that may warrant further investigation.

jessegreenberg commented 1 year ago

OK, thanks. Maybe LinePlot just generates a ton of garbage when it updates. I think(?) this readout shows us the heap every second or so without pressing "Take Snapshot".

Here are some values I am seeing:

On load: 185 MB
Press Collect Garbage button: 87.7 MB
Move the graph a bit on the "Derivative" screen: 203 MB
Press Collect Garbage button: 88.2 MB

and so on, this is pretty reproducible for me.

pixelzoom commented 1 year ago

Thanks @jessegreenberg. And sorry to drag you into this.

pixelzoom commented 1 year ago

While working on #273, it occurred to me that the Undo feature might be responsible for the sudden memory spikes. Each time a curve is changed, its state is save by pushing 1251 instances this object type onto the Undo stack:

type CurvePointState = {
  y: number;
  pointType: PointType;
}

The maximum Undo stack length is 20, so there are 20 x 1251 = 25020 CurvePointState objects that can be very suddenly created by a fuzzer that is rapidly "clicking" in the graphs.

We could test this hypothesis by commenting out the body of CurvePoint save, so that it's a no-op. Then test to see if the memory spikes are still occurring.

veillette commented 1 year ago

hhmm interesting, that should be an easy check: FYI: CalculusGrapherConstants is hoisting the constant

// maximum of undo actions (See https://github.com/phetsims/calculus-grapher/issues/64)
  MAX_UNDO: 20,

veillette commented 1 year ago

I'll test your hypothesis. I will not fuzz such that I have maximum control. I'll test the predict curve, which is also saved, but does not generate additional noise by not having derivative curves, etc, attached to it. We can add 20 curves, take a snapshot, then undo them, as see if we come back to our starting memory footprint.

veillette commented 1 year ago

Tested with http://localhost:63342/phetsims/calculus-grapher/calculus-grapher_en.html?brand=phet&ea&debugger&screens=1

I start only with screen 1, so therefore I start with a smaller footprint. I turned on the predict mode.

Snapshot 1 is the baseline snapshot 2: I applied a curve 10 times snapshot 3: I applied a curve 10 times, so 20 total snapshot 4: I applied a curve 10 times, so 30 total. Notice that the memory should not increase, and it doesnt. snapshot 5: I undo the curve 10 times. Memory goes down, as anticipated snapshot 6: I undo the curve 10 times. Memory goes down, as anticipated. The stack is now empty.

I should also note that right before I collected a snapshot, I did a manual garbage collection to get rid of any noise.

The bottom line, is that the stack has minimal impact on the memory, of the order of 1 MB. Also, in principle snapshot 6 should be the same as 1, but there is a difference of 0.3MB.

Comparing snapshot 5 to 4, where the memory goes, down. I removed 1251*10=12510 objects as expected

pixelzoom commented 1 year ago

Great, thanks for testing the Undo feature.

pixelzoom commented 1 year ago

I'm seeing the same thing that @jessegreenberg observed in https://github.com/phetsims/calculus-grapher/issues/265#issuecomment-1462204007. Memory footprint varies wildly as the sim is exercised. And I cannot identify anything that's clearly a memory leak.

Can we safely conclude that this is to be expected with so many bamboo plots, with large dataSets? And alert QA to this fact before testing? @veillette what's your opinion?

pixelzoom commented 1 year ago

Discussed with @veillette. The redundant work in #278 was certainly causing a lot of unnecessary allocation. So we should recheck to see if the memory spike has become less large. @veillette will have a look.

veillette commented 1 year ago

Memory test on Master

Browser: Chrome 110.0.5481.180 ( URL: http://localhost:63342/phetsims/calculus-grapher/calculus-grapher_en.html?brand=phet&ea&debugger&fuzz

Test protocol: Same as above, plus:

All Snapshots are taken at 2 minutes intervals There is a 10 minutes interval between snapshot 9 and 10.

The memory increased for all snapshots.

Below is the difference between snapshot 9 and 2, ordered by Delta (column 3).

(number) | 844 023 | 719 638 | +124 385 | 10 128 276 | 8 635 656 | +1 492 620 | -- | -- | -- | -- | -- | -- | -- | -- Object | 215 586 | 140 666 | +74 920 | 4 658 240 | 3 113 816 | +1 544 424 | Array | 380 701 | 317 360 | +63 341 | 6 091 264 | 5 077 808 | +1 013 456 | (array) | 314 135 | 260 772 | +53 363 | 24 525 208 | 20 135 888 | +4 389 320 | Set | 217 188 | 179 943 | +37 245 | 3 475 056 | 2 879 136 | +595 920 | (closure) | 171 432 | 150 712 | +20 720 | 4 356 716 | 3 845 592 | +511 124 | TinyEmitter | 140 751 | 120 142 | +20 609 | 3 378 000 | 2 883 384 | +494 616 | Array | 380 464 | 368 856 | +11 608 | 6 087 472 | 5 901 744 | +185 728 | Bounds2 | 80 648 | 70 668 | +9 980 | 4 193 696 | 3 674 736 | +518 960 | (number) | 841 455 | 831 849 | +9 606 | 10 097 460 | 9 982 188 | +115 272 | (array) | 313 387 | 304 454 | +8 933 | 24 471 820 | 23 665 152 | +806 668 | Set | 215 815 | 209 618 | +6 197 | 3 453 076 | 3 353 924 | +99 152 | TinyForwardingProperty | 22 524 | 16 358 | +6 166 | 900 960 | 654 320 | +246 640 | TinyProperty | 36 210 | 30 369 | +5 841 | 1 158 600 | 971 688 | +186 912 | Object | 215 286 | 209 671 | +5 615 | 4 652 224 | 4 531 968 | +120 256 | system / Context | 44 543 | 39 698 | +4 845 | 950 864 | 853 176 | +97 688 | Matrix3 | 19 905 | 15 214 | +4 691 | 477 720 | 365 136 | +112 584 | TinyStaticProperty | 17 220 | 12 612 | +4 608 | 619 920 | 454 032 | +165 888 | (closure) | 171 001 | 166 538 | +4 463 | 4 345 748 | 4 237 020 | +108 728 | (compiled code) | 97 870 | 94 329 | +3 541 | 12 610 156 | 10 902 600 | +1 707 556 | TinyEmitter | 139 347 | 135 895 | +3 452 | 3 344 316 | 3 261 468 | +82 848 | Vector2 | 134 033 | 131 490 | +2 543 | 2 680 652 | 2 629 792 | +50 860 | PDOMDisplaysInfo | 4 305 | 3 153 | +1 152 | 86 100 | 63 060 | +23 040 | Picker | 4 305 | 3 153 | +1 152 | 292 740 | 214 404 | +78 3

veillette commented 1 year ago

I don't know what to think. Is it acceptable? It is a rather significant heap increase.

pixelzoom commented 1 year ago

I'm seeing the same thing - memory just keeps increasing. So we probably do have a memory leak. And with Heap Comparision, I feel like I'm looking for a needle in haystack. I think we need to request some outside help here.

pixelzoom commented 1 year ago

We asked @samreid if he'd be willing to have a look, and he accepted. Thanks!

samreid commented 1 year ago

Notes to self for testing:

Building with: grunt --brands=phet --minify.mangle=false 5 minute warm up fuzz Use phet.chipper.queryParameters.fuzz = !phet.chipper.queryParameters.fuzz; and reset all between snapshots

samreid commented 1 year ago

It does look like a leak, can you please check usages of the CalculusGrapherSymbols including but not limited to fStringProperty to see their usage sites, and make sure to dispose usages and/or remove listeners? @veillette or @pixelzoom want to schedule synchronous collaboration?

pixelzoom commented 1 year ago

CalculusGrapherSymbols has 4 fields that are derived from translated StringProperties:

  dStringProperty: MathSymbolFont.createDerivedProperty( CalculusGrapherStrings.symbol.dStringProperty ), // d
  xStringProperty: MathSymbolFont.createDerivedProperty( CalculusGrapherStrings.symbol.xStringProperty ), // x
  fStringProperty: MathSymbolFont.createDerivedProperty( CalculusGrapherStrings.symbol.fStringProperty ), // f
  tStringProperty: MathSymbolFont.createDerivedProperty( CalculusGrapherStrings.symbol.tStringProperty ), // t

They are used in these 3 places:

GraphTypeLabelNode - the labels that appear in the upper-left corner of graphs, on radio buttons, on the "Notation" and "Variable" preferences controls, and on screen icons
ReferenceLineNode - the view for the thing that appears when you click "Reference Line" checkbox
VariableControl (the "Variable" control in the Preferences dialog)

So perhaps this leak is related to disposal of the Preferences dialog. My understanding is that the Preferences dialog is instantiated on demand, and then reused. And in PhET-iO sims, it is also instantiated to create the archetype. So dispose methods in the various subcomponents of CalculusGrapherSimulationPreferencesNode need some work, because Text/RichText suubcomponents are currently not disposed (tracking in https://github.com/phetsims/calculus-grapher/issues/289).

But how could this be accounting for such a large memory leak? Or am I mistaken about the Preference dialog being reused?

pixelzoom commented 1 year ago

My understanding is that the Preferences dialog is instantiated on demand, and then reused. And in PhET-iO sims, it is also instantiated to create the archetype.

I verified that this is indeed what is happening in phet and phet-io brands.

samreid commented 1 year ago

It seems like the fStringProperty.tinyProperty has 979 listeners. Is that to be expected?

pixelzoom commented 1 year ago

No, it makes no sense that fStringProperty has 979 listeners.

fStringProperty is a dependency of 4 DerivedProperties in GraphTypeLabelNode, which is in turn used as a label in these places:

4 f(x) radio button buttons, one for each screen (turn on 'Predict' preference to see this radio button)
2 graphs labels in Derivative screen
2 graph labels in Integral screen
3 graph labels and 2 radio button labels in the Advanced screen
4 graph labels and 2 radio button labels in the Lab screen
2 radio buttons for the Notation preference control

So there should be on the order of 17 listeners. I have no explanation for why there are 979 listeners.

pixelzoom commented 1 year ago

Over in https://github.com/phetsims/calculus-grapher/issues/289, I address some minor leakage that was coming from preferencesDialogCapsule.archetype.preferencesPanels.simulationPreferencesPanel.content.

pixelzoom commented 1 year ago

So there should be on the order of 17 listeners. I have no explanation for why there are 979 listeners.

My count is a little off. I added this to the beginning of GraphTypeLabelNode constructor:

let count = 0;
...
    console.log( `GraphTypeLabelNode constructor ${++count}` );

Running brand=phet with ?fuzz and waiting for the Preference dialog to be opened, there are 25 instances created. For brand=phet-io, there are 27 instances (an additional 2 for the Preferences dialog archetype.)

pixelzoom commented 1 year ago

I added this to CalculusGrapherScreenView step:

    console.log( `CalculusGrapherStrings.symbol.fStringProperty ${CalculusGrapherStrings.symbol.fStringProperty.getListenerCount()}` );
    console.log( `CalculusGrapherSymbols.fStringProperty ${CalculusGrapherSymbols.fStringProperty.getListenerCount()}` );

After several minutes of fuzzing, the maximum listener counts that I see are:

CalculusGrapherStrings.symbol.fStringProperty 4 CalculusGrapherSymbols.fStringProperty 28

Nothing even approaching 979. So I'm not sure what https://github.com/phetsims/calculus-grapher/issues/265#issuecomment-1468688654 is telling us.

samreid commented 1 year ago

I ran the following test:

Local build via grunt --brands=phet --minify.mangle=false --lint=false
Launch built version
Fuzz for about 30 minutes. Pause fuzzing via phet.chipper.queryParameters.fuzz = !phet.chipper.queryParameters.fuzz; Reset each screen, then go to screen 1
Memory snapshot
Fuzz for 5 minutes. Pause fuzzing via phet.chipper.queryParameters.fuzz = !phet.chipper.queryParameters.fuzz;. Reset each screen then go to screen 1.

Between these snapshots, I see about the same memory:

And the memory deltas look like noise:

So I'm wondering if this was corrected in https://github.com/phetsims/calculus-grapher/issues/289. Self-unassigning.

pixelzoom commented 1 year ago

... So I'm wondering if this was corrected in https://github.com/phetsims/calculus-grapher/issues/289.

Possible, I guess. But I'm surprised that a leak in the Preferences dialog would result in such a dramatic memory spike.

Thanks for assisting @samreid, we'll take it from here.

pixelzoom commented 1 year ago

Results of memory test, after fixing leaks in https://github.com/phetsims/calculus-grapher/issues/289.

Device: MacBookPro16,1 (2019 Intel), macOS 13.21. Browser: Chrome 111.0.5563.64 Build: grunt dev --minify.mangle=false --brands=phet,phet-io URL: https://phet-dev.colorado.edu/html/calculus-grapher/1.0.0-dev.24/phet-io/calculus-grapher_all_phet-io.html?fuzz

Protocol:

Snaphot 1 was taken immediately after starting the sim.
Snaphots 2-11 were taken at 1-minute intervals.
Snapshot 12 was taken 5 minutes after Snapshot 11.

Results for brand=phet: Memory stabilizes around 81MB, then goes up and down. My conclusion is that our memory leak is resolved.

pixelzoom commented 1 year ago

I'll do a test for brand=phet-io with 1.0.0-dev.24 later today. If that looks OK, we can close this issue.

pixelzoom commented 1 year ago

Below are the results for brand=phet-io, with same details are https://github.com/phetsims/calculus-grapher/issues/265#issuecomment-1472131813.

Snapshot 2 is an outlier; I can rationalize this as a snapshot where GC had not yet occurred. Otherwise, memory again stabilizes ~81MB, then goes up and down. So our memory leak is also resolved for PhET-iO.

pixelzoom commented 1 year ago

Conclusion: https://github.com/phetsims/calculus-grapher/issues/289 resolved the memory leak.

This issue is ready to close.

phetsims / calculus-grapher

Memory leak testing #265