observablehq / stdlib

The Observable standard library.
https://observablehq.com/@observablehq/standard-library
ISC License

Add Generators.{disposable,worker}. #28

Closed: mbostock closed this 6 years ago

mbostock commented 6 years ago

These methods make it easier to clean up resources that need explicit disposal, such as workers and GPU-backed tensors. For example, a dedicated web worker:

worker = Generators.worker(`
onmessage = function ({data}) {
  postMessage({echo: data});
};
`)

A tensor:

x = Generators.disposable(tf.tensor2d([[0.0, 2.0], [4.0, 6.0]]), x => x.dispose())
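
For readers wondering what such a helper might do, here is a minimal sketch, not necessarily the stdlib implementation. It assumes the runtime pulls the value with next() and calls return() when the cell is invalidated. The stand-in function `disposable` and the fake `resource` are illustrative only.

```javascript
// Hypothetical stand-in for Generators.disposable; the real stdlib code may differ.
function disposable(value, dispose) {
  if (typeof dispose !== "function") throw new TypeError("dispose is not a function");
  let yielded = false;
  let disposed = false;
  return {
    [Symbol.iterator]() { return this; },
    // Yield the value exactly once, then report completion.
    next() {
      if (yielded) return {done: true, value: undefined};
      yielded = true;
      return {done: false, value};
    },
    // Assumed to be called by the consumer on invalidation; disposes at most once.
    return() {
      yielded = true;
      if (!disposed) {
        disposed = true;
        dispose(value);
      }
      return {done: true, value: undefined};
    }
  };
}

// Usage with a fake resource standing in for a tensor or worker.
const log = [];
const resource = {name: "tensor", dispose() { log.push("disposed " + this.name); }};
const gen = disposable(resource, r => r.dispose());
const first = gen.next(); // yields the resource
gen.return();             // runs the dispose callback
```

Note that disposal happens only in return(), not when next() reports completion, so the value stays alive until the consumer explicitly ends the generator.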

tmcw commented 6 years ago

Not entirely in the scope of this (most likely something for userspace), but I'd like to get a handle on whether it'd be possible to do this in userspace:

What would an equivalent to util.promisify look like for "disposable objects"? Would this make it possible to do something like:

const {tensor2d} = disposable(tf.tensor2d);

let a = tensor2d([[1, 2]]); // automatically disposable fancy version of tensor2d
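
One way such a userspace wrapper could look, as a rough sketch only: the name `disposableFactory` is hypothetical, and `Generators` is passed in as a parameter so the snippet does not rely on a global. It assumes Generators.disposable(value, dispose) behaves as described in this thread.

```javascript
// Hypothetical userspace helper; `disposableFactory` is not part of the stdlib.
// `Generators` is passed in explicitly so the sketch has no hidden globals.
function disposableFactory(Generators, factory, dispose = value => value.dispose()) {
  // Return a function with the factory's signature that wraps each result.
  return (...args) => Generators.disposable(factory(...args), dispose);
}

// In a notebook this might be used as:
//   tensor2d = disposableFactory(Generators, tf.tensor2d)
//   a = tensor2d([[1, 2]])
```

This only covers values created directly by the wrapped factory. Values produced later by calling methods on the result are not wrapped.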

mbostock commented 6 years ago

I originally named this Generators.finalize, but I wasn’t sure whether people would get the analogy with Java’s finalize. Also, Java’s finalize is more difficult to understand: it is called at some arbitrary time (or not at all!) when the garbage collector runs. Observable, in contrast, explicitly terminates a generator when the cell is invalidated.
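
To illustrate the contrast in plain JavaScript: calling return() on a suspended generator runs its finally block immediately, rather than at some garbage-collection time. The sketch below assumes invalidation amounts to the runtime calling return(); the generator `cell` and the `log` array are illustrative names.

```javascript
// A generator that allocates on first pull and releases in `finally`.
function* cell(log) {
  log.push("allocate");
  try {
    yield "resource";
  } finally {
    // Runs synchronously when return() is called while suspended at the yield.
    log.push("release");
  }
}

const log = [];
const running = cell(log);
running.next();   // allocates and yields "resource"
running.return(); // deterministic cleanup, no garbage collector involved
// log is now ["allocate", "release"]
```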

I agree that Generators.dispose is perhaps too terse, and it sounds like you are disposing something now rather than on invalidation. But Generators.finalize has that same problem.

I somewhat like Generators.once? I also considered Generators.singleton. I worry that it doesn’t sufficiently convey that the purpose of the method is to allow disposal, however.

So, maybe Generators.disposable. Or maybe we should harmonize the name with “invalidation” somehow. I’ll think a bit more.

jashkenas commented 6 years ago

Makes sense. "disposable" is fairly nice, in that you can have the API call / docs be:

Generators.disposable(value, dispose)

... where "dispose" is the dispose function implementation.

There's also a nice parallel to C#: https://msdn.microsoft.com/en-us/library/system.idisposable(v=vs.110).aspx ... where "IDisposable" is the abstract interface, and "dispose" is the method.

nsthorat commented 6 years ago

In general, I love the approach, this is going to really help!

+1, I think "dispose" may not be the right name (it sounds like it's going to dispose immediately). "disposable" seems reasonable, and is a pretty common pattern. See Closure as well as the C# link above. I think "finalize" would be reasonable too: in a sense Observable is destroying the previous value of the cell and giving you a callback before it happens.

So a typical cell using tfjs might look like this (correct me if I'm missing something):

{
  const y = tf.tidy(() => {
    const x = tf.scalar(2);
    return x.square();
  });
  return Generators.disposable(y, () => y.dispose());
}

One thing we might do in our own TF.js + Observable stdlib notebook is something like this:

{
  return tflib.generateDisposable(() => {
    const x = tf.scalar(2);
    return x.square();
  });
}

Where tflib is imported from some TF.js stdlib notebook.

Obviously I need to think about the naming of "generateDisposable" in this case, but you get the drift (it calls tidy() and also disposes the last value that was generated).

You guys rock!

nsthorat commented 6 years ago

I just saw @tmcw's comment with const {tensor2d} = disposable(tf.tensor2d);.

The only issue with this is that the tensor you're returning is usually the output of some mathematical operation, which means we'd have to hijack all of our operations to make them return generators (see the snippet I posted above that returns x.square()).

I think generally speaking, the last code snippet I posted above is what a common cell will look like.

mbostock commented 6 years ago

Thanks for the feedback, @nsthorat! I’ve renamed the method to Generators.disposable.

You can shorten your cell definition slightly since the dispose function is passed the return value:

y = Generators.disposable(tf.tidy(() => {
  const x = tf.scalar(2);
  return x.square();
}), y => y.dispose())

And yeah, TensorFlow.js could provide its own helper around Generators.disposable and tf.tidy:

tf.disposable = f => Generators.disposable(tf.tidy(f), t => t.dispose());

(This assumes that you always return a Tensor from your tidy function, which is probably a bad assumption, but you get the idea.) Then you could say:

y = tf.disposable(() => {
  const x = tf.scalar(2);
  return x.square();
})

Which would be pretty great!
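
The caveat about always returning a Tensor could perhaps be relaxed by disposing through tf.dispose, which, to my understanding, accepts a tensor or a container of tensors and ignores other values. A hedged sketch, where `makeTfDisposable` is a hypothetical name and `tf` and `Generators` are passed in as parameters:

```javascript
// Hypothetical helper; `tf` and `Generators` are parameters so nothing is assumed global.
function makeTfDisposable(tf, Generators) {
  // tf.tidy frees intermediates; tf.dispose later frees whatever tidy returned,
  // whether that is a single tensor, an array, or an object of tensors.
  return f => Generators.disposable(tf.tidy(f), result => tf.dispose(result));
}

// In a notebook this might be used as:
//   disposableTidy = makeTfDisposable(tf, Generators)
//   y = disposableTidy(() => tf.scalar(2).square())
```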

Also, if you wanted, you could extend this pattern to support async by passing in a tf context object to your tidy function. This context would expose the same constructors as the static tf, but anything allocated by that context would be automatically disposed when the promise resolves or rejects. That could look like this:

y = tf.disposable(async tf => {
  const x = tf.scalar(2);
  await somePromise;
  return x.square();
})

It might be hard to remember that you need the tf argument to your tidy function, though: if you forget it, then tf is the static TensorFlow namespace, and your code continues to work but leaks tensors. 😁
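
A rough sketch of how such a tracking context might work, under stated assumptions: `asyncTidy` is a hypothetical name, and anything with a dispose method that is returned by a function on the context counts as "allocated by that context". Results of methods called on a tensor itself (such as x.square()) are not tracked here, which a real implementation would need to handle.

```javascript
// Hypothetical async tidy: tracks values created through the context and
// disposes them once the async body settles, except for the returned result.
async function asyncTidy(tf, body) {
  const tracked = new Set();
  const context = new Proxy(tf, {
    get(target, key) {
      const member = target[key];
      if (typeof member !== "function") return member;
      // Wrap each function so disposable return values are recorded.
      return (...args) => {
        const out = member.apply(target, args);
        if (out && typeof out.dispose === "function") tracked.add(out);
        return out;
      };
    }
  });
  try {
    const result = await body(context);
    tracked.delete(result); // the caller owns the result
    return result;
  } finally {
    // Runs whether the body resolved or rejected.
    for (const resource of tracked) resource.dispose();
  }
}
```

The leak described above shows up in this sketch too: a body that ignores its argument and uses the static namespace bypasses the proxy, so nothing is tracked.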