observablehq / feedback

Customer submitted bugs and feature requests
42 stars 3 forks source link

Javascript Class cell severe performance degradation #591

Closed dleeftink closed 10 months ago

dleeftink commented 10 months ago

Describe the bug When defining Javascript classes that use (Typed) backing arrays, rerunning the cell that defines the Class causes dependent cell performance to decrease. It seems the memory is not properly freed between reruns of the Class cell. Constructed classes may need a proper way of 'destroying' the instance that I have yet to figure out (e.g. I've tried invalidation.then() to no avail).

To Reproduce Go to this notebook for a demonstration.

Expected behavior Dependent cells to not be affected performance-wise by rerunning the Class cell.

Desktop (please complete the following information):

mootari commented 10 months ago

I cannot reproduce that. Be aware that classes in JS are merely an abstraction, there's nothing special about the objects once they've been instantiated.

The sandbox context is shared by all of your opened notebooks, so I suspect that what you're seeing is memory used by a different tab or notebook. Open Window > Task Manager from Chrome's menu to see which tabs share a context.

mootari commented 10 months ago

... also, TypedArrays are known to be slow to create since they immediately allocate memory. If you need to reuse large typed arrays frequently, consider persisting them throughout an invalidation.

dleeftink commented 10 months ago

I am reusing them; constructing a Class once and filling the TypedArray with zeroes to reset them or setting Array.length = 0 when using regular arrays.

The issue I have is that when I am frequently editing a Class definition in a cell, its dependent cells need to be manually rerun to 'reset' them to instantiate a fresh class. From time to time however, I can also observe performance degradation when using imported classes in external notebooks, difficult to reproduce. Also, note that in the example I am not using typed arrays.

It seem the behaviour disappears when instantiating a class within a module; I've included an example in the original notebook. Please see the linked video for a demonstration in a single window without any other running Observable tabs.

mootari commented 10 months ago

Also, note that in the example I am not using typed arrays.

And please note that I mentioned that I cannot reproduce what you're seeing. :)

The garbage collector kicks in every so often and reduces memory usage again, which is likely what you notice as performance degradations.

Your "NonLeaky" example is actually leaky. Even if you were to call revokeObjectURL (which you don't), the modules are still kept in memory indefinitely. Hit that start button a few times and you'll see the iframe context crash due to memory exhaustion.

mootari commented 10 months ago

I've put together a small helper that lets you monitor memory usage more easily:

import {plotMemoryUsage} from "50cb8ca0dfe9b933"
plotMemoryUsage({invalidation, interval: 100, updateInterval: 100, length: 1000, width})

The first bumps are from your "Leaky" example, the last ones from your "NonLeaky" one:

image
dleeftink commented 10 months ago

Thank you for the helper, makes it easier to see what is happening. The 'NonLeaky' one is definitely not the way to go, but as shown in the video has consistent performance compared to the 'Leaky' one before the memory builds up to the level shown in your plot.

And re: the garbage collector; on my end it doesn't seem to 'kick in' until after I rerun the dependent cell manually (e.g. rerunning the cell named instance).

It all seems a little strange, as it doesn't happen in the console with the same code, or when using a class assigned to the notebook's globalThis.

This issue an also be replicated on my phone (Android 12 + Chrome).

mootari commented 10 months ago

Afaik the GC will try to pick moments of low activity to run, so you might have to wait for a while to see a decrease (note the stretches without any change in my example). It also can't clean up any objects to which there are still references - which includes e.g. the Inspector output.

As long as there are no actual leaks (objects which are still referenced and can't be cleaned up) an increased memory usage shouldn't be a concern. The GC will kick in more frequently as memory pressure increases.

mootari commented 10 months ago

And if you're testing in private/incognito windows then be aware that multiple private windows are not isolated from each other (at least not in Chrome).

dleeftink commented 10 months ago

Yes, the video was a single window, fresh Windows boot.

It also can't clean up any objects to which there are still references - which includes e.g. the Inspector output.

This is interesting, I didn't know the Inspector held object references, but should've been obvious seeing how it displays the results.

Still, Class cells with large backing arrays do not exhibit the 'leaky' behaviour when assigned and instantiated from the globalThis, at least on my end. Similarly, when including the class definition and its dependent code in a single cell, the issue also disappears.

mootari commented 10 months ago

when assigned and instantiated from the globalThis

I'm not following, can you give an example? When you assign to globalThis you create a permanent reference on the window object, which means whatever you assigned can't get garbage collected.

I'm also increasingly unclear about the problem that is being discussed in this issue:

dleeftink commented 10 months ago

Honestly, difficult to say. Stepwise, the issue I'm encountering is as follows:

  1. Create a Class cell with a (potentially) large backing array, assign to this in the constructor. Add a method to manipulate the array (e.g. insert)
  2. Instantiate the class in a separate cell, and run some loops in this cell to insert values, potentially clearing the array using .fill(0) or .length = 1. The latter is left out of the notebook for brevity.
  3. Go back to the Class cell and make some edits, press enter to rerun.
  4. Observe the dependent cell execution time increase until dependent cell is manually rerun
    • Rerun Class cell multiple times if effect goes unnoticed

The issue here, is that the dependent cell somehow maintains the previous class instantiation/array reference when editing and rerunning the Class cell. This possibly prevents the GC from picking up the now unused array reference.

This is maybe demonstrated better by including invalidation.then(()=>console.log(this)) in the dependent cell:

  1. When rerunning the dependent cell instance multiple times, console.log(this) remains undefined.
  2. When rerunning the Leaky cell, console.log(this) is undefined the first time the instance cell invalidates, but logs the cell's previous return state the second time.

When this holds a large array in the instance cell, performance issues seem to arise when rerunning the Leaky cell multiple times.

When assigning to Leaky to the globalThis instead of naming the cell (e.g. globalThis.Leaky = class { ... } instead of Leaky = class { ... }), dataflow isn't triggered and the cell maintains its previous state. This isn't ideal, as we again have to manually rerun the dependent cell to access the updated class from (new globalThis.Leaky()). But at least, the execution time isn't increased on the first go.

I can make a follow up screen recording if things remain unclear.

mootari commented 10 months ago

Observe the dependent cell execution time increase until dependent cell is manually rerun

Do you mean the execution time is increased compared to the initial run, or that it increases gradually until you reevaluate the dependent cell? In your test notebook I can observe that clicking start logs a fairly constant time of ~200ms, no matter how often I click it.

An invalidated cell's this context holds the previous cell value (which is why you don't see it when rerunning the cell directly). You can see it being passed here. As far as I'm aware we don't keep a reference to the old value after the cell has been reevaluated.

I recommend that you remove the entire class construct from your tests and instead test with simple objects.

dleeftink commented 10 months ago

Do you mean the execution time is increased compared to the initial run, or that it increases gradually until you reevaluate the dependent cell?

Increased compared to initial run, which runs at 80ms for the current parameters, after which it gradually increases till 300-400ms and tops out there when rerunning the Class cell multiple times.

The 80ms benchmark is similar when running in the console, so that would be the target performance even after rerunning the Class cell.

mootari commented 10 months ago

Unfortunately I don't think there's anything that can be done on our end. This is just how JavaScript engines work.

You can try to pass the old array as init to your new instance:

class MyClass {
  constructor({size = 1024, init}) {
    this.data = init?.size === size
      ? (init.fill(0), init)
      : new Uint8Array(size);
  }
}
instance = new MyClass({init: this?.data})

or even just pass the buffer around. Also, please feel free to continue this conversation in https://talk.observablehq.com/t/how-to-dispose-class-cells-containing-array-constructors/8198 where it will likely be more discoverable and might help others.

Aside: I saw Array(length) in your example notebook. Try to avoid that as it will create a holey array which causes all sorts of deoptimizations.

dleeftink commented 10 months ago

For posterity, here's a screen recording demonstrating this more clearly.

mootari commented 10 months ago

I recommend to add the memory plot to your notebook. You'll likely see that the time increase coincides with a drop in memory usage as the garbage collector goes to work.

dleeftink commented 10 months ago

I recommend to add the memory plot to your notebook. You'll likely see that the time increase coincides with a drop in memory usage as the garbage collector goes to work.

Not sure how this compares to your memory profile, but it doesn't seem to go down on my end after clicking the start button, even after a while.

memory_usage

I will try out your suggestion and report back on the forum.

dleeftink commented 10 months ago

I've found a suitable workaround, documented here.

The gist of it is that the dependent cell reuses the class instance on subsequent runs, by including a reuse() on the Class and make it return this. Interestingly, performance now actually increases on subsequent runs, as the backing array is simply reused even when instantiating a 'holey' array.