michaellperry / Assisticant

MIT License
36 stars 19 forks source link

Using Assisticant with async #43

Open qwertie opened 2 years ago

qwertie commented 2 years ago

I'm interested in using Assisticant in my new async-heavy API backend, but Assisticant doesn't seem to support async at all.

So I had the idea of forcing my async code to run synchronously inside Computed<T>._update by using .Result on a task it calls. Unfortunately, this doesn't really work. My update method calls an async method that (1) asynchronously reads from the database, storing the results in a ObservableDictionary and then (2) returns ObservableDictionary<,>.Values.ToList().

Let's say my update method runs on "thread 1". It turns out that after reading from the database, the remainder of the async method runs in "thread 2" and then the .Result is returned to thread 1. Thus, the dependency is not recorded because it was accessed on the wrong thread.

Questions:

  1. Have you thought about how to support async?
  2. Can you think of a hack I could use to make this work... without having to know, in the update method, which precedents will be accessed by the async code?
  3. Do you think maybe ObservableBase might be a less confusing name than Precedent?
michaellperry commented 2 years ago

I would recommend doing this in a subscription. Define a Computed for the query parameters. It gathers values from the appropriate observables and packs them into a query parameters object. Then you can subscribe to the computed and call an async method when the parameters change. Store the results in a different observable.

Ordinarily it would be poor form to set an observable within a subscription. However, since this is async, it won't cause any issues. The observable is only modified upon completion of an async process triggered by the subscription.

I think this pattern would be best documented rather than implemented directly within the library. Computed is intended to represent a deterministic operation. What you have described here is a cache.

michaellperry commented 2 years ago

The name of the base class made more sense in Update Controls, where it was the base for both Independent and Dependent. Either of them can be a precedent for a Dependent. Given that the application developer never needs to interact with this class, I didn't feel the need to change it when moving to the Assisticant nomenclature.

qwertie commented 2 years ago

Ordinarily it would be poor form to set an observable within a subscription. However, since this is async, it won't cause any issues. The observable is only modified upon completion of an async process triggered by the subscription.

Note that the "async" parts will be deferred or synchronous depending on whether a database query was required or not, respectively. My Observables hold the results of database queries, and it may or may not be required to run a query in a given case. But I guess this could work either way, so thank you for the suggestion.

I think this pattern would be best documented rather than implemented directly within the library. Computed is intended to represent a deterministic operation. What you have described here is a cache.

I am confused by this statement, because a Computed is always a cache. That's why I'm interested in using Assisticant in the first place.

It looks to me like there is no "logical thread ID" for async tasks, as a result of which it may be impossible to extend Assisticant to support async, (Even many years ago the lack of tooling in .NET around "forking" threads was a major problem for my "Ambient Service Pattern", and with async the problem is even worse. 🙁😢)

michaellperry commented 2 years ago

The distinction I draw between a cache and Computed is that a cache does not depend solely on the input parameters. If you query a database, you provide some parameters for the where clause, and the result is the set of rows that are selected. The rows could not have been predicted based solely on the parameters. You needed to know what was in the database as well.

Computed on the other hand expects that the result is based only upon the inputs. If you pass in the same inputs, the computation will always give back the same result. It's deterministic.

Since data in the database can change, caches need to be invalidated. Computed does not. It doesn't even have that capability. As long as the inputs don't change, Computed does not call its function again.

If we needed to make async work with dependency tracking, we might be able to use AsyncLocal. That probably wouldn't help with the Ambient Service Pattern, though.

qwertie commented 2 years ago

Well in my case, the Computeds depend on the cache, which itself is a set of Observables, which are the app's source of truth (not the database). I know better than to make my Computeds depend directly on the database.

we might be able to use AsyncLocal.

Oh wow, I didn't know that existed. I am confused about how exactly it behaves, given that forking is implicit in async code. For example, I could write

var x = await Task1();
var y = await Task2();
return x + y;

or I could write

var x = Task1();
var y = Task1();
return await x + await y;

and the second one forks its control flow. I did a test based on the example on the page you linked to, and it seems like an AsyncLocal's value is duplicated if a fork occurs. I am uncomfortable that the page does not explain how this works... but if it works the way it appears to work, that's great news, as it implies that it would be possible to extend Assisticant to support async (though I probably don't have time to implement that in the near future.)

michaellperry commented 2 years ago

Sounds like it might be worth a try. Before adding it to the library, I would definitely want to see it used in a real app. And I would want to make sure we document all of the caveats of depending upon non-deterministic external sources.

When you do have some time to dig into it, let me know. And if I get to a point in Jinaga.NET where it would be useful, I'll do the same.

qwertie commented 7 months ago

Wow, I never realized that Computeds are not thread-safe! I remembered seeing the lock (this) in there long ago and assuming everything was fine. Not at all! If a thread does OnGet while another thread is in OnGet, it'll just assume there's a cyclic dependency.

It's failing in production (NullReferenceException if the second OnGet returns an uninitialized _value) so I need to come up with something quick, hmm...

qwertie commented 7 months ago

So for now I fixed it with a lock, like

    lock (_stuff) {
        var stuff = _stuff.Value;
        return stuff;
    }

I'm not sure if this is safe in general.

I also just looked at Observable<T>... it certainly seems to be intended to be thread-safe. I noticed one thing: in general, _computedArray and _computedHash (which should be encapsulated in a struct but I digress) are lock-protected, but are used directly, with loops, without locking in MakeDependentsOutOfDate. I also imagined a very slow Computed that takes five seconds to compute... what if one of the Observables that it uses is changed concurrently? Well... it looks like that would work out fine; the Computed ends up OUT_OF_DATE at the end with an empty Precedent list and _value set as expected.

Also, why in the world is HashedWeakReference a class? Shouldn't it be a struct?

michaellperry commented 7 months ago

Thanks for raising the thread safety problem. Do you think that is related to this conversation, or is that a new issue?

We might also want additional issues to convert reference types to value types where necessary. I'd like to think carefully about those.

The behavior that you describe of an Observable changing concurrently is exactly the intent. Computed throwing an exception if called from a second thread is not. Sounds like scenarios we would like to test.

Would you be able to set up a failing unit test for thread safety?

qwertie commented 5 months ago

Sorry, I just brought it up here because I lazily went back to a thread that already exists. The NullReferenceException happens in my code, since it expected the Computed<T> to return a value and it doesn't. But as I see it, the root cause is that Computed<T> does not have multithread-safe semantics. As for working on anything in the Assisticant code base, unfortunately I'm so busy that I've resorted to paying someone else to work on my own open source project.