Closed brooksmtownsend closed 3 years ago
Changing the shape of the cache is only a problem for durable caches. If we change the shape of the cache, all we really need to do is restart NATS (or purge the stream if you're using a disk-backed one).
At first glance, option 1 looks nice, but I think there are some problems with it. In the scenario where there's a Redis provider operating on the `default` link name and a Cassandra provider operating on the `default` link name, if we simply delve blindly into the claims cache, even if we have the contract ID, we won't be able to tell which of the providers is the right one. The only real source of truth here is the link definition, which is actually how 0.18 did it.
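To make the ambiguity concrete, here is a minimal Python sketch. The cache shape, the field names, and the keys `VREDIS` and `VCASSANDRA` are hypothetical stand-ins for illustration, not the actual wasmcloud-otp data structures.

```python
# Hypothetical claims cache: provider public key -> claims.
# Keys and field names are illustrative only, not the real wasmcloud-otp shapes.
claims_cache = {
    "VREDIS": {"name": "Redis KV", "contract_id": "wasmcloud:keyvalue"},
    "VCASSANDRA": {"name": "Cassandra KV", "contract_id": "wasmcloud:keyvalue"},
}


def providers_for_contract(cache, contract_id):
    """Return every provider key whose claims advertise the given contract ID."""
    return sorted(
        key for key, claims in cache.items()
        if claims["contract_id"] == contract_id
    )


candidates = providers_for_contract(claims_cache, "wasmcloud:keyvalue")
# Both providers match; the claims alone do not say which one the actor is linked to.
print(candidates)
```

With only the contract ID (and a link name that both providers share), the lookup returns more than one candidate, which is why the claims cache cannot be the deciding source.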
What I think we should do is search the link definition cache for the contract ID and link name bound to that actor, and then, if we find such a link definition, that gives us all the information we need to construct the outbound topic. If such a provider is offline, then the call will time out.
Given that all we really need to do is find a contract ID + link name + actor ID in the link definitions cache, fixing this problem should be fairly straightforward.
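A rough sketch of that lookup, in Python rather than the host's Elixir, might look like the following. The `LinkDefinition` fields, the helper names, and the topic layout `wasmbus.rpc.<lattice prefix>.<provider key>.<link name>` are assumptions made for illustration; check them against the host code before relying on them.

```python
from typing import Iterable, NamedTuple, Optional


class LinkDefinition(NamedTuple):
    # Hypothetical shape of one entry in the link definition cache.
    actor_id: str
    provider_id: str
    contract_id: str
    link_name: str


def find_linkdef(
    linkdefs: Iterable[LinkDefinition],
    actor_id: str,
    contract_id: str,
    link_name: str,
) -> Optional[LinkDefinition]:
    """Return the link definition matching actor + contract ID + link name, if any."""
    for ld in linkdefs:
        if (ld.actor_id, ld.contract_id, ld.link_name) == (actor_id, contract_id, link_name):
            return ld
    return None


def outbound_topic(lattice_prefix: str, ld: LinkDefinition) -> str:
    """Build the outbound topic for the linked provider (assumed topic layout)."""
    return f"wasmbus.rpc.{lattice_prefix}.{ld.provider_id}.{ld.link_name}"


linkdefs = [
    LinkDefinition("MKVCOUNTER", "VREDIS", "wasmcloud:keyvalue", "default"),
    LinkDefinition("MOTHERACTOR", "VCASSANDRA", "wasmcloud:keyvalue", "default"),
]

match = find_linkdef(linkdefs, "MKVCOUNTER", "wasmcloud:keyvalue", "default")
topic = outbound_topic("default", match) if match else None
print(topic)
```

Because the actor ID is part of the key, the Redis and Cassandra providers no longer collide even though they share a contract ID and link name. The lookup says nothing about whether the provider is actually running; as noted above, a call to an offline provider would simply time out.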
BTW thanks for digging into this. :1st_place_medal:
This is the report from the spike #148, where a KVCounter actor cannot invoke the `KeyValueAdd` operation with a `wasmcloud:keyvalue` provider running on a different host.
The cause of this issue is actually fairly simple. It takes place on this line: https://github.com/wasmCloud/wasmcloud-otp/blob/main/host_core/lib/host_core/web_assembly/imports.ex#L156
TL;DR: When determining the target of an invocation, if it is a provider, we query the local provider table (which only contains providers running on that host). This is because at the invocation level the namespace is the contract ID, and the binding is the link name. We don't want to hardcode ourselves into invoking a provider public key, so this is by design.
The fix, however, could be complicated. Essentially, when invoking a provider we need to know its public key so we can publish to the correct NATS topic for invocations. If it's a provider running on the local wasmcloud host, then it's a simple cache lookup. If it's running on a remote wasmcloud host, then we need a way to find the public key of a running provider based on the contract ID and the link definition.
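The failure mode can be sketched in a few lines of Python. The per-host tables keyed by (contract ID, link name) and the key `VREDIS` are hypothetical; they only mirror the behaviour described above, not the real provider table in the host.

```python
# Hypothetical per-host provider tables: (contract_id, link_name) -> provider public key.
host_a_providers = {}  # host running the KVCounter actor, with no local providers
host_b_providers = {("wasmcloud:keyvalue", "default"): "VREDIS"}  # host running the provider


def lookup_local(local_table, contract_id, link_name):
    """Resolve a provider key using only the table of the host doing the invocation."""
    return local_table.get((contract_id, link_name))


# Same host as the provider: a simple lookup succeeds.
local_hit = lookup_local(host_b_providers, "wasmcloud:keyvalue", "default")

# Actor on a different host: the local table has no entry, so no topic can be built.
remote_miss = lookup_local(host_a_providers, "wasmcloud:keyvalue", "default")

print(local_hit, remote_miss)
```

The second lookup returning nothing corresponds to the case in the spike, where the actor and the provider live on different hosts.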
I see a few possible resolutions, and I'm looking for some clarity on them (cc @autodidaddict @stevelr)
If I'm correct with #1, then it should be fairly simple, but the shape of the claims map in the cache will change so it might take a bit of care to implement.