Zooming into a Var should not require an Owner?

raquo commented 9 months ago

Background

Currently, when you .zoom(...) into a Var to create a derived Var, Airstream internally maps over the parent var's signal, and adds an observer to it to make sure that it's strictly evaluated. To clean up the resulting subscription when this derived var is no longer needed, Airstream currently requires an Owner to call the zoom method.

Requiring an owner here is annoying, as explained in #112, https://github.com/raquo/Laminar/issues/130 and https://github.com/raquo/Laminar/issues/148

Owner is not required

@kitlangton suggested on Discord that we don't actually need an Owner in this scenario, and it turns out that he was right!

Suppose we have:

def zoomInFn(a: A): B = ???

def zoomOutFn(a: A, b: B): B = ???

val parentVar: Var[A] = ???

val derivedVar: Var[B] = parentVar.zoom(zoomInFn)(zoomOutFn)

We want derivedVar to be strictly evaluated, i.e. we should be able to call derivedVar.now() any time and get the value of zoomInFn(parentVar.now()). On its own, such "pulling" of a signal's current value does not require an Owner. The Owner is needed to "push" new values from parentVar to derivedVar whenever parentVar updates[1]. That's how zooming works today – derivedVar is auto-updated whenever parentVar is updated, and calling derivedVar.now() does not trigger any computation, it simply retrieves the already-computed value.

In contrast, without such auto-updates, updating parentVar would not evaluate zoomInFn(parentVar.now()) immediately – it would only be evaluated if and when we call derivedVar.now(). This way we don't need an Owner, but we need to deal with other issues.

Purity will be required

As you've likely noticed, using the Owner-less "pull" approach requires zoomInFn to be a pure function, since the number of times it's called is up to the user, as is the timing of those calls, e.g. for some updates of parentVar, zoomInFn might not be called at all, if the user does not call derivedVar.now() before the next update to parentVar.

Normally, Airstream's public API does not require user-provided callbacks to be pure, and this is by design. I think the batch version of the Var update method – Var.update(var1 -> updateFn1, var2 -> updateFn2) – is currently the only method that advises the user to provide pure callbacks – in that case it's because some of these callbacks might fail, and the whole batch update is aborted, so you wouldn't want any user-specified side effects to still happen in that case.

In practice, zoomInFn is typically used to select a val member containing an immutable value from an immutable class, so it's very likely to already be pure. So, maybe requiring purity here is a good tradeoff. Not needing an Owner is a huge advantage, after all. Or, perhaps we should have a separate method to create such "lazy" derived vars, e.g. pureZoom(...) instead of zoom(...). Not sure yet – feedback welcome.

Nesting and types

Currently we can safely assume that the type Var is strictly evaluated, so if we zoom into parentVar, zoomInFn will be evaluated whenever parentVar is updated. However, if both zoom and pureZoom return subtypes of Var, we need to take care to maintain this assumption, i.e. if we call derivedVar.zoom, derivedVar must become strictly evaluated even if it was "lazy" before (created with pureZoom). I think this will happen naturally, as derivedVar.zoom will start derivedVar.signal using the provided owner, but this is one more area to watch.

Similar concern applies to StrictSignal's proposed pureMap method (see below), make sure to also think that through.

Sync with derived var's signal

Any Var's .now() value must always be in sync with this Var's signal.now(). In practice, this should be easy to achieve – simply put the pull-based implementation of .now() in the Var's signal, rather than in the Var itself. This would also let us query the parent signal's _lastUpdatedId to check whether parentVar has updated since the last time we called derivedVar.now(), eliminating redundant evaluations of zoomInFn (but not solving the problem of potentially missing evaluations if .now() is not called, so purity is still required).

As Kit suggested, the implementation of .now() would basically fall back to current behavior if derivedVar.signal is started (i.e. is being observed by other listeners), the pulling logic would only be enabled if the signal is stopped.

Signals already re-sync their value with the parent signal when they're started after being stopped, so I think the sync between derivedVar.now() and current value seen by derivedVar.signal's observers would be already built-in, although extra care needs to be taken to review the starting process of derivedVar.signal – I need to figure out the right time in the process to switch from standard logic to pull-based logic.

Implications for other parts of the API (StrictSignal)

We restrict the Airstream public API to only expose memory-safe operations, to prevent users from accidentally creating memory leaks, or reading stale values. We can't rule out such things in principle (e.g. users can always pass the wrong owner), but we can make them exceptionally unlikely (e.g. by not requiring users to provide owners explicitly, having Laminar take care of it behind the scenes).

As mentioned before, one of the restrictions that I put on Airstream API is that it shouldn't require user-provided callbacks to be pure – but I put such restrictions for pragmatic reasons – to make Airstream more forgiving – not ideological reasons, and so such restrictions are subject to revision if the right tradeoff opportunity presents itself.

In this case, remember that Var is essentially a bundle of StrictSignal (the read channel including .now()) and Observer (the write channel). So, if we can zoom into a Var without an Owner and get another Var, we should be able to map a StrictSignal without an Owner to get another StrictSignal, no? Currently map-ing a StrictSignal does not require an Owner, but it results in a regular Signal that is lazy and does not have a public .now() (see https://github.com/raquo/Laminar/issues/130). We would then need to call .observe(owner) on that mapped signal to convert it back to a StrictSignal.

So, we could override def map on StrictSignal to return another StrictSignal without an Owner, with similar pull-based semantics as we discussed for lazy derived vars above. That way you could do e.g.:

val strictSignal: StrictSignal[A] = parentVar.signal
val mappedSignal: StrictSignal[B] = strictSignal.map(makeBfromA)

val nowB = mappedSignal.now() // no need for owners or observers

As you see, it's very much like a derived var, but without the write channel.

However, unlike derived var's zoomInFn, the project callback of signal.map has more varied use cases, and requiring it to be pure gives me more of a pause. On the other hand, the regular signal.map is not affected by it, it's only the signals created via strictsignal.map that are affected, and only when the resulting signal has no observers – a case when currently, the project callback would not evaluate at all (and calling .now() on it is not possible).

It is also not clear which of StrictSignal's methods would be "upgraded" to such purity-requiring versions that return StrictSignal instead of Signal. map – yes, but any others? recover, debugWith, it's not obvious which other ones would support this.

Perhaps we could ease into this new pattern by adding a new pureMap(pureProject) method on StrictSignal, and if we're doing that, we might as well implement the new lazy derived var functionality under pureZoom, keeping the old zoom alive.

Kit's implementation

This assumes that zoomIn is pure, and should give you a good idea of what using this kind of API would be like. You can already use this in your code, it uses extension methods and does not require any changes to Airstream internals.

class ZoomedVar[I, O](
    parent: Var[I],
    zoomIn: I => O,
    zoomOut: (I, O) => I,
) extends Var[O]:
  def setCurrentValue(
    value: Try[O], 
    transaction: Transaction
  ): Unit =
    val tryA = value.map(zoomOut(parent.now(), _))
    parent.setCurrentValue(tryA, transaction)

  val signal = new ZoomedVarSignal[I, O](parent, zoomIn)

  def getCurrentValue = signal.tryNow()

  def underlyingVar = parent.underlyingVar

object ZoomedVar:
  extension [I](parent: Var[I])
    def zoomed[O](zoomIn: I => O)(zoomOut: (I, O) => I): ZoomedVar[I, O] =
      new ZoomedVar(parent, zoomIn, zoomOut)

class ZoomedVarSignal[I, O](
    parent: Var[I],
    zoomIn: I => O,
) extends MapSignal[I, O](parent.signal, zoomIn, None)
    with StrictSignal[O]:
  override def tryNow(): Try[O] =
    if isStarted then super.tryNow() else parent.tryNow().map(zoomIn)

(I took the liberty of replacing override def now with override def tryNow to make tryNow() work too)

Summary

This approach looks very promising. At the very least, it works as "lazy derived vars" mentioned in #112, while retaining the full API of real vars (except for requiring purity). I will need to think some more how this finding affects all the other linked issues, perhaps more opportunities for improvements will be revealed.

Footnotes:

[1] because for that to happen, parentVar needs to keep a reference to derivedVar, and that creates circular references which long story short, garbage collection can easily miss in realistic use cases, so we need an Owner to break those circular references when they're no longer needed

raquo commented 9 months ago

Sorry, I've accidentally pressed Cmd+Enter and posted the half-written issue too early. I've now updated it with the full text.

kitlangton commented 9 months ago

🥳 Very exciting!

I do have some general thoughts/questions about the purity requirements.

Do users currently depend on zoomIn being called precisely as many times as the parent variable is updated? I can't easily imagine a case, even if my Var contained mutable state, in which I'd be relying on that invariant—particularly if zoomIn might be called fewer times than the parent is updated, rather than more.

Perhaps the user could perform some logging/tracking in here? Yet, if one wished to execute some logic each time the parent Var changed, wouldn't using the Var's Signal be the more natural approach?

And because the zoomIn function may be called even fewer times than before, is this so much a purity issue? I think it could be argued that it's an implementation detail.

Of course, I completely understand that it's impossible to guess how users might be relying on some particular behavior. In fact, it's nearly certain that for any library of modest user-base, someone somewhere is pinning the hopes of their entire application on each incidental bit of functionality, conceived of and maintained by the author or not.

And then, I'm sure the maintenance of such invariants is useful to you, yourself, as you develop a library with such intrinsic complication. You're clearly well versed in the multifarious gotchas of this domain, as well as the guarantees you've thus far painstakingly preserved.

So, I guess what I'm devil's advocating for here is the possibility of the wholesale replacement of zoom with this new Ownerless variant, and casting the current behavior to the wind.

The benefits would be:

I'd guess that for 99.9% of cases, the Ownerless implementation will be more pleasant to use. In fact, I'd imagine that some users are currently using zoom with unsafeWindowOwner or some other custom leaky Owner. I assume this because I know I've done it myself, before I fully understood the consequences, and really wanted to zoom into my Var.
Less code > more code. Both for users and maintainers. Users won't have to reckon with the decision between these two overrides, and maintainers won't have to reckon with question of which of the many subtypes of Var they might be dealing with.
I guess those are the only two points, but that wouldn't be aesthetically pleasing. So I'll reiterate that I'm a big fan of smaller, more opinionated APIs, especially when the alternatives would be so rarely (if my guessing is at all accurate) useful.

And the downside would be the relaxing of some previous behavior. But of course, the recommended alternative would be to simply observe the parent var's signal if you wanted to be notified exactly-once (and no less) per emission.

I do see the argument for StrictSignal not behaving this way. Perhaps mapStrict, with a docstring of the caveats, would make sense? Though perhaps it's still worth thinking through the worst negative consequences of just doing the Ownerless version.

My lesser, backup (imp's advocacy) argument would be that perhaps the current behavior could get demoted to zoomImpure, and the Ownerless variant could get top-billing. Only because I think it's more often what people would want (and to avoid folks jury-rigging zoom with an unsafeWindowOwner).

A final, orthogonal thought on API-design is that I do enjoy a post-fix pattern for method alternatives, if only because it more naturally works with autocomplete. The user would type zoom and see all of the variants listed out at the end zoomX, zoomY, zoomZ. Though, it's more important to be consistent with the library's existing patterns.

Okay! All done! I'm super happy either way, just thought I'd put those arguments out there in case they're helpful.

raquo commented 9 months ago

Those are all good points, thanks for elaborating! At the moment I can't give a specific, compelling, example, for what impure things the users could legitimately be doing in the zoomInFn callback. If nobody will be able to come up with that by the time I get to implementing it, I say perhaps it's a good indication that the problem is not worth worrying about in practice, and the lazy owner-less behaviour is better off being the default.

I may be wrong in calling the requirement on zoomInFn "purity" – FP nomenclature isn't my strong suit. At a high level, if zoomInFn has side effects, lazy evaluation will likely be undesirable, as the execution of said side effects will be unpredictable at definition site. You're right that because we can at least get rid of redundant evaluations of zoomInFn, the range of "side effects" that could cause problems is reduced. For example, if the zoomInFn callback is merely expensive to compute, that won't be a problem. If it does some logging that doesn't affect program execution, it's probably fine too. If it creates new instances that don't have structural equality like case classes, proooobably still fine, at least if nothing stateful is involved.

Naming-wise, we already use strict to indicate strict evaluation, e.g. mapToStrict vs mapTo (the latter is accepts the argument by-name, so the argument is evaluated lazily). Perhaps the existing zoom method could be renamed to zoomStrict to make room for the new lazy zoom method, but StrictSignal's map will need some other name, mapStrict would be inconsistent with other strict names, and I'm not ready to change the contract of map.

A final, orthogonal thought on API-design is that I do enjoy a post-fix pattern for method alternatives, if only because it more naturally works with autocomplete.

I must say that IntelliJ gives me all the relevant autocompletions regardless of whether the method starts with what I typed, or just contains it somewhere in the middle, so the main advantage of that strategy is kind of diluted IMO. Still, I try to follow this when the resulting method name sounds fine (e.g. zoomStrict is fine, but other times adding a suffix just doesn't read well).

kitlangton commented 9 months ago

Brilliant! Sounds like a plan 😎 I'll put my lil' demo code to work and report back if any weird side-effects crop up.

🥳🥳🥳

raquo commented 4 months ago

You can now preview this feature in Airstream v17.1.0-M1. It can be used with Laminar 17.0.0. The new method name is zoomLazy.

raquo / Airstream