wasmCloud / capability-providers

ARCHIVED: wasmCloud pre-1.0 capability providers. See up-to-date capability providers in the main repository, https://github.com/wasmcloud/wasmcloud

[RFC] Change Provider Interface to Support Polyglot Capability Providers #75

Closed autodidaddict closed 3 years ago

autodidaddict commented 3 years ago

Change Provider Interface to Support Polyglot Capability Providers

Summary

Today’s capability providers can only be implemented in Rust. Because of the way the traits and plugins are implemented, Rust's undefined ABI is difficult or impossible to support or mimic from a provider written in a preferred cloud native language like Go or Java. Our intent is to remove the dependency on the undefined Rust ABI, the plugin model, libloading, and the C runtime required by capability providers by changing the communication protocol between provider and host runtime from a proprietary one to gRPC.

Motivation

The developer exposure to the wasmCloud ecosystem is through either actors or capability providers, or both. If a developer is trying to build an actor that needs to do something that one of our first party providers doesn’t do, they’ll have to create their own provider.

Creating a new provider today is a very high-friction, manual process that is only possible in Rust and, further, only with specific versions of Rust crates that tightly couple the plugin to the wasmCloud ecosystem.

Our motivation is clear: enhance the developer experience and reduce friction by making it as easy to build a capability provider in Go or any other language as it is to build one in Rust.

Design Detail

Today’s capability providers must implement the following Rust trait:

pub trait CapabilityProvider: CloneProvider + Send + Sync {
    /// Gives the provider a dispatcher it can use to call back into actors.
    fn configure_dispatch(
        &self,
        dispatcher: Box<dyn Dispatcher>,
    ) -> Result<(), Box<dyn Error + Send + Sync>>;

    /// Handles an operation invoked by an actor, returning an opaque binary payload.
    fn handle_call(
        &self,
        actor: &str,
        op: &str,
        msg: &[u8],
    ) -> Result<Vec<u8>, Box<dyn Error + Send + Sync>>;

    /// Tells the provider to clean up and shut down.
    fn stop(&self);
}

There are a number of things happening in this contract that limit us to Rust: we are passing long-lived pointers between the host and the provider, we are passing complex types, and we are essentially doing everything we can to stray into the realm of non-standard FFI communication and undefined ABI.
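
A compressed illustration of the issue (the Dispatcher trait here is reduced to one method, and the warning comment paraphrases rustc's improper_ctypes_definitions lint):

// Why the current contract can't cross a stable ABI: `Box<dyn Trait>` is a
// fat pointer (data pointer + vtable pointer) with an unspecified, Rust-only
// layout, so it cannot be exposed at an `extern "C"` boundary.
pub trait Dispatcher {
    fn dispatch(&self, actor: &str, op: &str, msg: &[u8]) -> Vec<u8>;
}

#[no_mangle]
pub extern "C" fn configure_dispatch(_dispatcher: Box<dyn Dispatcher>) {
    // rustc (improper_ctypes_definitions): `extern` fn uses a type which is
    // not FFI-safe -- a non-Rust caller has no way to construct this value.
}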

We propose a new means by which capability providers communicate with the host runtime: gRPC. It took a few discussion steps to arrive at this conclusion. We started with the notion that we needed a way to communicate that would work on all operating systems in all environments, which ruled out old favorites like named pipes. We then figured that the next best thing might be to use sockets, since those are available everywhere; their ubiquity seemed ideal for our needs.

Then we looked at what we would need in order to facilitate clear, bi-directional communication between the host runtime and providers. It didn’t take long to realize that if we did this on our own on top of TCP, we would be re-inventing the "gRPC wheel" and doing a poor job at that.

The core of this new idea is that rather than capability providers being plugins required to conform to Rust's expectations, we simply communicate with the providers via gRPC using a service called CapabilityService. A capability service supports an Invoke function that accepts an invocation and returns an invocation response. These are the same data structures used today in internal dispatch, only converted to gRPC as described in the following protobuf IDL. (This is not a final rendering of the protocol; there will likely be a modeled security handshake, and specifics can change. It is shown only for illustration.)

syntax = "proto3";

package wasmcloud.capabilities;

message Invocation {
    string id                = 1;
    WasmCloudEntity origin   = 2;
    WasmCloudEntity target   = 3;
    string operation         = 4;
    bytes data               = 5;
    // JWT-encoded signed anti-forgery token
    string aft               = 6;
    string origin_host_id    = 7;
    bool async               = 8;
}

message InvocationResponse {
    string invocation_id = 1;
    string error         = 2;
    bytes data           = 3;
}

message WasmCloudEntity {
    oneof entity {
        Actor actor       = 1;
        Provider provider = 2;
    }
}

message Actor {
    string public_key = 1;
}

message Provider {
    string public_key  = 1;
    string contract_id = 2;
    string link_name   = 3;
}

service CapabilityService {
    rpc Invoke(Invocation) returns (InvocationResponse);
}
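
To make the contract concrete, here is a minimal sketch of a provider serving this interface in Rust with tonic. It assumes the IDL above has been compiled by tonic-build; the module path, operation name, and port are illustrative assumptions, not part of the proposal.

// Hypothetical provider built on the proposed gRPC contract, using tonic.
// Assumes a build.rs that runs tonic_build on the IDL shown above.
use tonic::{transport::Server, Request, Response, Status};

pub mod wasmcloud {
    pub mod capabilities {
        tonic::include_proto!("wasmcloud.capabilities");
    }
}

use wasmcloud::capabilities::capability_service_server::{
    CapabilityService, CapabilityServiceServer,
};
use wasmcloud::capabilities::{Invocation, InvocationResponse};

#[derive(Default)]
struct MyProvider;

#[tonic::async_trait]
impl CapabilityService for MyProvider {
    async fn invoke(
        &self,
        request: Request<Invocation>,
    ) -> Result<Response<InvocationResponse>, Status> {
        let inv = request.into_inner();
        // `inv.data` stays an opaque binary blob (e.g. messagepack) agreed
        // upon out of band; only the envelope is protobuf.
        let data = match inv.operation.as_str() {
            // "DoSomething" is a made-up operation name for illustration.
            "DoSomething" => inv.data,
            other => return Err(Status::unimplemented(other.to_string())),
        };
        Ok(Response::new(InvocationResponse {
            invocation_id: inv.id,
            error: String::new(),
            data,
        }))
    }
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // The host would learn this endpoint via the (not yet specified) handshake.
    Server::builder()
        .add_service(CapabilityServiceServer::new(MyProvider::default()))
        .serve("127.0.0.1:50051".parse()?)
        .await?;
    Ok(())
}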

Another key characteristic of this design is that the means by which we communicate with a capability provider need not be directly tied to the means by which the provider is started, loaded, or instantiated. As long as the provider exposes a gRPC endpoint and we have point-to-point connectivity, the host and provider can communicate.

Starting and Stopping Capability Providers

In this new polyglot-supporting gRPC model, we now have two different kinds of capability providers:

A local provider is a capability provider packaged as a native executable that the wasmCloud host spawns as a child process, giving it information on how to connect back to the host via a handshake protocol. A remote provider is essentially a set of gRPC URLs (client-side load balanced) representing a pre-existing service that is compatible with the wasmCloud capability provider protocol illustrated in the preceding code block.

For obvious reasons, wasmCloud can only stop and start the child process type of provider. It will connect to and disconnect from remote providers, and the rest of the ecosystem and code will be oblivious to whether the capability provider is a spawned child process or a remote (externally managed) process.
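
For illustration, a minimal sketch of how the host might launch a local provider, assuming connect-back details travel through environment variables (the variable name is hypothetical; the RFC does not specify the handshake):

// Hypothetical: how the host might spawn a "local" provider as a child
// process. The environment variable name is an assumption for illustration.
use std::process::{Child, Command};

fn start_local_provider(binary_path: &str, host_grpc_url: &str) -> std::io::Result<Child> {
    Command::new(binary_path)
        // Tell the spawned provider how to connect back to the host.
        .env("WASMCLOUD_HOST_GRPC_URL", host_grpc_url)
        .spawn()
}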

Drawbacks

The first and most obvious drawback to this option is that we would have to refactor, recompile, and re-release all of our first party capability providers. If there are third party providers in the wild, those too would have to be upgraded in order to work with the new host runtime.

Another drawback is that the use of gRPC could be confusing. wasmCloud actors and providers communicate with each other using their own binary payload format, agreed upon out of band. Today this format is messagepack, but that's a fungible implementation detail. Developers could be confused by the fact that they are now using protobuf “envelopes” over gRPC to deliver the opaque binary blobs that represent the raw data needed by providers. This lets some of the envelope strategy “escape” onto the wire whereas today those envelopes are mostly internal implementation details.

Rationale and Alternatives

This design insulates and decouples capability providers from the internal implementation details of the wasmCloud host, no longer requiring tight coupling, shared data, a common programming language, or shared async patterns. In particular, it frees capability providers from the actix runtime and from the burdens of Rust async. Developers are free to use their own threading models, executors, data structures, memory models, and so on.

Another design that we considered was the "stdout service" model used by the protocol buffer compiler (protoc). In this scenario, a "plugin" is actually an executable that is started; the input parameters are written to the new process's stdin and the responses are read from its stdout. This approach felt very brittle, and there is also a significant performance degradation in moving from localhost sockets to stdout/stdin piping.

If we do not make this change to future-proof our capability provider system, we will be doing the ecosystem and developers a disservice: we will keep making it difficult to create new providers at all, and difficult or outright impossible to create them in languages like Go. In short, we'll actively be discouraging the community from writing their own capability providers, even though capability providers are designed to be community-built.

Prior Art

We have already discussed plugin mechanisms that work like protoc, where executables are started and their stdout/stdin pipes are used for communication.

Another option that is used quite a bit is named pipes/Unix domain sockets. These are specifically designed for interprocess communication (IPC); they are fast and ubiquitous. However, they still require both sides of the contract to agree on a protocol for bi-directional communication, and support for them varies quite a bit from operating system to operating system.

Go has a native plugin package that is used quite a bit for work like this, but it only supports Go code loading other Go code, and as such doesn't help us at all in terms of polyglot support and giving developers freedom of choice.

Unresolved questions

autodidaddict commented 3 years ago

The information contained here on how we might implement this is now officially out of date. We're no longer thinking that gRPC is the way to go (for a number of reasons which can be explained in a subsequent comment), and are instead opting for NATS.

The new strategy is quite simply that capability providers subscribe to their relevant topics and respond to them directly, allowing the host core to step aside rather than act as a proxy. This dramatically reduces the number of conditional code paths, simplifies interactions, and makes it easy to write capability providers in any language, so long as that language has a NATS client.
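
For illustration, a minimal sketch of what this looks like from a provider's point of view in Rust, using the synchronous nats crate; the subject name and payload handling are assumptions, since the real subject hierarchy and encoding were settled later:

// Hypothetical NATS-based provider loop. The subject is an assumption for
// illustration; the payload remains an opaque serialized invocation.
fn main() -> std::io::Result<()> {
    let nc = nats::connect("127.0.0.1:4222")?;
    // Subscribe to this provider's invocation subject (assumed shape).
    let sub = nc.subscribe("wasmcloud.rpc.provider.invoke")?;
    for msg in sub.messages() {
        // Decode the invocation, perform the operation, encode the response.
        let response = handle_invocation(&msg.data);
        // Reply directly on the message's reply inbox; no host proxy involved.
        msg.respond(response)?;
    }
    Ok(())
}

// Placeholder for the provider's actual dispatch logic.
fn handle_invocation(raw: &[u8]) -> Vec<u8> {
    raw.to_vec()
}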

brooksmtownsend commented 3 years ago

@autodidaddict @stevelr We've had a few discussions about this in Slack, and we have decided for our MVP release that standalone capability provider binaries and direct NATS messages are the way we are going to support polyglot capability providers. This work is going to be done over multiple issues/PRs to upgrade our capability providers; we should also document the upgrade process in our documentation to help others do the same with any existing providers.

I'm going to close this because I feel the RFC has served its purpose and the work is going to be tracked elsewhere, but feel free to reopen if you disagree.