solid / specification

Solid Technical Reports
https://solidproject.org/TR/
MIT License

Solid-Plugins: pod WASM runtime interface for plugins, running jobs unfit for browser client #192

Open · joepio opened this issue 4 years ago

joepio commented 4 years ago

One of the core concepts of Solid is having all application logic in the client (browser), while the Pod mostly serves as a database plus authentication server. This model works fine for many types of apps, but it leaves some use cases unsolved: those where the back end needs to do jobs on its own.

For example, assume I want to build a chat app (think Slack) that works with Solid Pods. It stores not just my own chat messages, but also the messages of my team. If I want to link it to other services (say, GitHub issues), I might need some webhook functionality, which requires a server-side endpoint that is reachable even when no browser is open.

It would be really nice if I could write plugins and add them to my Solid Pod, where they function as background processes or perhaps as middleware. This would give developers an extremely powerful API for building any type of app.

However, it is essential that these plugins run in a sandboxed runtime: they should not (at least not by default) have access to all user data, the disk, or even the internet.

Luckily for us, two new technologies are emerging that help achieve this: WASM (WebAssembly, a portable binary instruction format that runs in browsers as well as on servers) and WASI (the WebAssembly System Interface, a standardized way for WASM modules to access system features such as files and clocks, limited to what the host explicitly grants). Several runtimes already exist (Wasmtime, Wasmer), and they provide embedding APIs for various programming languages, so a pod written in any of those languages could act as the host.
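To make the sandboxing point concrete: a WASM module can only call the functions its host explicitly passes in as imports. The TypeScript sketch below uses the standard WebAssembly JavaScript API (available in browsers and Node.js) with a tiny hand-assembled module. The module bytes, the `host.count` import, and the `runPlugin` helper are illustrative assumptions standing in for a compiled plugin and a pod-provided function.

```typescript
// A hand-assembled WASM module (illustrative stand-in for a compiled plugin).
// It imports one function, host.count: () => i32, and exports run(): i32,
// which returns host.count() + 1. It has no other way to reach the outside world.
const pluginBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic "\0asm", version 1
  0x01, 0x05, 0x01, 0x60, 0x00, 0x01, 0x7f, // type section: one type, () -> i32
  0x02, 0x0e, 0x01, // import section, 1 import
  0x04, 0x68, 0x6f, 0x73, 0x74, // module name "host"
  0x05, 0x63, 0x6f, 0x75, 0x6e, 0x74, // field name "count"
  0x00, 0x00, // kind: function, type index 0
  0x03, 0x02, 0x01, 0x00, // function section: 1 function of type 0
  0x07, 0x07, 0x01, 0x03, 0x72, 0x75, 0x6e, 0x00, 0x01, // export "run" = function 1
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section, 1 body of 7 bytes, no locals
  0x10, 0x00, 0x41, 0x01, 0x6a, 0x0b, // call 0; i32.const 1; i32.add; end
]);

const pluginModule = new WebAssembly.Module(pluginBytes);

// The pod decides exactly which capabilities to hand over. Here: one function
// that (hypothetically) reports how many resources the plugin may see.
function runPlugin(grantedCount: number): number {
  const instance = new WebAssembly.Instance(pluginModule, {
    host: { count: () => grantedCount },
  });
  const run = instance.exports.run as () => number;
  return run();
}

// Without the import, instantiation fails: the plugin cannot reach anything
// the host did not explicitly provide.
function instantiateWithoutGrants(): string {
  try {
    new WebAssembly.Instance(pluginModule, { host: {} });
    return "instantiated";
  } catch (err) {
    return err instanceof WebAssembly.LinkError ? "LinkError" : "other error";
  }
}

console.log(runPlugin(41)); // 42
console.log(instantiateWithoutGrants()); // LinkError
```

WASI applies the same principle to system resources: a WASI module can only open files under directories the host has explicitly preopened for it.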

How could this work? Well, the plugin itself can be written in any language that compiles to WASM (a rapidly growing list). The plugin module might use some functions exposed by the host (your pod), for example a Triple Pattern Fragment function that returns matching statements. It might also implement (export) some callbacks, e.g. onResourceChange, which the host calls every time a resource changes. This enables middleware-like patterns. You could use this to add functionality to your pod, such as a custom notification system or a full-text search interface. A plugin might also register a couple of routes on your pod, to which the Pod then needs to forward requests.
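A host-exposed, Triple-Pattern-Fragment-style function could be as simple as matching triples against a pattern in which any position may be left open. The sketch below is a hypothetical in-memory version; the `Triple`, `Pattern`, and `matchPattern` names are illustrative assumptions, not part of any Solid or TPF API, and a real pod would also enforce the plugin's access permissions before answering.

```typescript
// Hypothetical in-memory triple store illustrating a TPF-like host function.
type Term = string;
interface Triple { subject: Term; predicate: Term; object: Term }
// Any position left undefined acts as a wildcard.
type Pattern = Partial<Triple>;

function matchPattern(store: readonly Triple[], pattern: Pattern): Triple[] {
  return store.filter(
    (t) =>
      (pattern.subject === undefined || t.subject === pattern.subject) &&
      (pattern.predicate === undefined || t.predicate === pattern.predicate) &&
      (pattern.object === undefined || t.object === pattern.object),
  );
}

const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const store: Triple[] = [
  { subject: "https://pod.example/posts/1", predicate: RDF_TYPE, object: "https://schema.org/BlogPosting" },
  { subject: "https://pod.example/posts/2", predicate: RDF_TYPE, object: "https://schema.org/BlogPosting" },
  { subject: "https://pod.example/profile#me", predicate: RDF_TYPE, object: "https://schema.org/Person" },
];

// "Give me everything that is a BlogPosting", with the subject left open.
const posts = matchPattern(store, { predicate: RDF_TYPE, object: "https://schema.org/BlogPosting" });
console.log(posts.map((t) => t.subject)); // logs the subjects of posts/1 and posts/2
```

A real TPF interface also pages its results and reports count metadata; this sketch leaves both out.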

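Some code that sketches what a plugin might look like (TypeScript; the types, registration functions, and `fetchSomeAPI` are hypothetical placeholders, not an existing API):

```typescript
// Hypothetical plugin API: these names are illustrative, not an existing spec.
type Resource = { id: string; type: string; body: string };
type Store = { add(resource: Resource): void };
type Verdict = "Ok" | "Error";
const willChangeHooks: ((store: Store, resource: Resource) => Verdict)[] = [];
const periodicJobs: { period: "day"; job: (store: Store) => Promise<void> }[] = [];
const onResourceWillChange = (hook: (store: Store, resource: Resource) => Verdict) => willChangeHooks.push(hook);
const onRunPeriodically = (period: "day", job: (store: Store) => Promise<void>) => periodicJobs.push({ period, job });
// Placeholder for a call to some external service.
const fetchSomeAPI = async (): Promise<Resource> => ({ id: "/external/1", type: "Thing", body: "scraped" });

// This hook runs custom validations for new BlogPosts; other resources pass through.
onResourceWillChange((store, resource) => {
  if (resource.type !== "BlogPost") return "Ok";
  // Hypothetical validity rule: a BlogPost needs a non-empty body.
  return resource.body.trim().length > 0 ? "Ok" : "Error";
});

// Or a callback that runs periodically, to scrape content or build some index for something external
onRunPeriodically("day", async (store) => {
  const newThing = await fetchSomeAPI();
  store.add(newThing);
});
```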
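On the host side, the key difference between a "will change" hook and a "did change" callback is that the former can veto a write. A minimal sketch, assuming hypothetical `PluginHost`, `WillChangeHook`, and `DidChangeHook` names and an in-memory `Map` standing in for the pod's storage: the host commits a write only if every registered validation hook returns "Ok", and notifies change listeners only after a successful commit.

```typescript
// Hypothetical host-side write pipeline; names are illustrative only.
type Verdict = "Ok" | "Error";
interface Resource { id: string; type: string; body: string }
type WillChangeHook = (resource: Resource) => Verdict;
type DidChangeHook = (resource: Resource) => void;

class PluginHost {
  private readonly resources = new Map<string, Resource>();
  private readonly willChange: WillChangeHook[] = [];
  private readonly didChange: DidChangeHook[] = [];

  onResourceWillChange(hook: WillChangeHook): void { this.willChange.push(hook); }
  onResourceChange(hook: DidChangeHook): void { this.didChange.push(hook); }

  // Returns true if the write was committed, false if a plugin vetoed it.
  write(resource: Resource): boolean {
    if (this.willChange.some((hook) => hook(resource) === "Error")) return false;
    this.resources.set(resource.id, resource);
    this.didChange.forEach((hook) => hook(resource));
    return true;
  }

  get(id: string): Resource | undefined { return this.resources.get(id); }
}

const host = new PluginHost();
// Validation plugin: empty BlogPosts are rejected, other resources pass.
host.onResourceWillChange((r) => (r.type === "BlogPost" && r.body.trim() === "" ? "Error" : "Ok"));
// Notification plugin: a toy change log (e.g. for versioning or audit logging).
const changeLog: string[] = [];
host.onResourceChange((r) => changeLog.push(r.id));

console.log(host.write({ id: "/posts/1", type: "BlogPost", body: "Hello" })); // true
console.log(host.write({ id: "/posts/2", type: "BlogPost", body: "   " })); // false
console.log(changeLog); // [ '/posts/1' ]
```

Whether hooks run sequentially, in parallel, or under a time limit is a design decision such an interface would still need to settle.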

The plugin will be compiled to .wasm and referenced from some RDF resource, for which we will need an ontology, for example:

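```turtle
@prefix runtime: <https://w3c.org/solid-runtime/>. # hypothetical namespace, not yet defined
@prefix schema: <https://schema.org/>.
@prefix : <#>.

:myplugin a runtime:Plugin;
  schema:name "CoolioPlugin";
  runtime:path "/coolioplugin";
  runtime:wasmBinary <https://example.com/plugin.wasm>.
```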
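The `runtime:path` property implies that the pod must route incoming requests to the right plugin. One hypothetical way to do that is a longest-prefix match on path segments, so that `/coolioplugin/settings` reaches the plugin mounted at `/coolioplugin`, but `/coolioplugin2` does not. The `PluginDescriptor` type, the `routeToPlugin` function, and the second "SearchPlugin" entry are illustrative assumptions that mirror the properties in the Turtle description; none of them is a defined ontology or API.

```typescript
// Hypothetical descriptor mirroring schema:name, runtime:path, runtime:wasmBinary.
interface PluginDescriptor { name: string; path: string; wasmBinary: string }

// Returns the plugin whose mount path is the longest segment-wise prefix of requestPath.
function routeToPlugin(plugins: readonly PluginDescriptor[], requestPath: string): PluginDescriptor | undefined {
  let best: PluginDescriptor | undefined;
  let bestLength = -1;
  for (const plugin of plugins) {
    const mount = plugin.path.replace(/\/+$/, ""); // ignore trailing slashes
    const matches = requestPath === mount || requestPath.startsWith(`${mount}/`);
    if (matches && mount.length > bestLength) {
      best = plugin;
      bestLength = mount.length;
    }
  }
  return best;
}

const plugins: PluginDescriptor[] = [
  { name: "CoolioPlugin", path: "/coolioplugin", wasmBinary: "https://example.com/plugin.wasm" },
  { name: "SearchPlugin", path: "/coolioplugin/search/", wasmBinary: "https://example.com/search.wasm" },
];

console.log(routeToPlugin(plugins, "/coolioplugin/settings")?.name); // CoolioPlugin
console.log(routeToPlugin(plugins, "/coolioplugin/search/q")?.name); // SearchPlugin
console.log(routeToPlugin(plugins, "/coolioplugin2")?.name); // undefined
```

Matching on whole segments rather than raw string prefixes keeps one plugin from accidentally capturing another plugin's routes.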

Perhaps this is a bit unconventional, and perhaps it is not the Solid way of doing things, but I think it has some interesting merits:

I think it might make sense to standardize an interface like this, although it might need to be a separate spec from this one. I'm wondering what you think of this idea.

acoburn commented 4 years ago

Hi @joepio! If I am not mistaken, it is possible to do what you propose right now, without modifying the Solid specification (or any other specification). What is required is an identity provider that can issue refresh tokens. Here is how that works:

  1. A user delegates access to a bot via an OIDC authorization_code flow, permitting the "offline_access" scope. That interaction may be a mix of CLI and Web-based, or it may be entirely Web-based. (This is exactly what happens when you grant a third party access to a GitHub repository.) The bot now holds an access token and a refresh token, though as a user you can revoke that refresh token at any point.
  2. The bot can now use that access token to interact with your Solid Pod. That interaction can happen at any time, on any schedule: hence the name "offline_access", which refers to the user being offline. When the access token expires, the bot uses the refresh token to request a new access token. As long as the user does not revoke access, the bot can continue in this way indefinitely.
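The refresh step in point 2 is the standard OAuth 2.0 `refresh_token` grant (RFC 6749, section 6). A minimal sketch, assuming a bot client that sends `client_id`/`client_secret` in the form body and a hypothetical token endpoint; the HTTP call is injected as a function so the logic can be shown without a network. The `BotSession` and `PostForm` names are illustrative, and Solid-OIDC additionally binds tokens with DPoP proofs, which this sketch omits.

```typescript
// Keeps a bot's access token fresh via the OAuth 2.0 refresh_token grant.
// Endpoint, credentials, and the injected `post` function are assumptions; DPoP is omitted.
interface TokenResponse { access_token: string; expires_in: number; refresh_token?: string }
type PostForm = (url: string, body: URLSearchParams) => Promise<TokenResponse>;

class BotSession {
  private accessToken = "";
  private expiresAt = 0; // epoch milliseconds; 0 forces a refresh on first use
  private readonly tokenEndpoint: string;
  private readonly clientId: string;
  private readonly clientSecret: string;
  private refreshToken: string;
  private readonly post: PostForm;

  constructor(tokenEndpoint: string, clientId: string, clientSecret: string, refreshToken: string, post: PostForm) {
    this.tokenEndpoint = tokenEndpoint;
    this.clientId = clientId;
    this.clientSecret = clientSecret;
    this.refreshToken = refreshToken;
    this.post = post;
  }

  // Returns a valid access token, refreshing it (with a 30 s safety margin) when needed.
  async getAccessToken(now: number = Date.now()): Promise<string> {
    if (now < this.expiresAt - 30_000) return this.accessToken;
    const body = new URLSearchParams({
      grant_type: "refresh_token",
      refresh_token: this.refreshToken,
      client_id: this.clientId,
      client_secret: this.clientSecret,
    });
    const res = await this.post(this.tokenEndpoint, body);
    this.accessToken = res.access_token;
    this.expiresAt = now + res.expires_in * 1000;
    // Servers may rotate refresh tokens; keep the newest one.
    if (res.refresh_token) this.refreshToken = res.refresh_token;
    return this.accessToken;
  }
}

// Usage with a fake token endpoint (no network), for illustration only.
const fakePost: PostForm = async (_url, body) => ({
  access_token: `access-for-${body.get("refresh_token")}`,
  expires_in: 3600,
});
const session = new BotSession("https://idp.example/token", "my-bot", "s3cret", "rt-0", fakePost);
session.getAccessToken().then((token) => console.log(token)); // access-for-rt-0
```

In production, concurrent callers should share one in-flight refresh, so that a rotated refresh token is not used twice.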

There are certainly security considerations for any implementation: the bot is responsible for storing these tokens, especially the long-lived refresh token, securely.

In other words, there are standard protocols that already support non-browser-based interactions, which means that this would be entirely language agnostic. I have written code that does exactly this in Python and which runs on the command line, but if you prefer WASM or Haskell, that would be fine, too.

joepio commented 4 years ago

Thanks for the reply @acoburn!

That is certainly possible, but I do think that having an interface running inside your own server is more privacy-friendly and performant. I'd like to think of my Pod as a VM that I as a user control, and which I can modify from a simple GUI in my browser by visiting my pod. When plugins run externally, I either have to trust other parties with my data, or I have to set it up myself. I'm also a bit skeptical about whether the OIDC approach is realistic for middleware-like behaviors (e.g. performing server-side validations).

Perhaps some more concrete use cases will need to be presented before it makes sense to define a new API.

joepio commented 4 years ago

Being able to execute arbitrary code in some sandbox with a standardized API on a pod instance could address many other problems at once, and could contribute to the overall modularity of the Solid project.

I think the true question we're asking is this: is a Solid Pod more like a database, or more like an operating system? If it's a database, we'll need external services to provide all sorts of features. I think I want my pod to feel like a personal computer that can be accessed remotely, running the plugins (apps) that provide me features that I already have on my desktop.

Here's a list of use cases / features that require server-side logic to work. These could be implemented in specific Pod server implementations, but then they would probably only be usable on that specific server. Ideally, they would be implemented as plugins against some standardized interface, so that they can run on any pod.

  1. Versioning / history. I want to save every single change that I make to a resource, and be able to revert every single change. This could be implemented using some onResourceUpdated middleware. Relates to standardizing state changes #161.
  2. Audit logging. I want to be able to see when someone viewed a piece of data. So I'll need to create Audit Log resources on perhaps every request.
  3. Full-text search. Implementing this means rebuilding the search index when a resource changes, and it also means providing some sort of endpoint with a convenient API.
  4. Privacy-friendly on-pod querying. Let's say a researcher wants to know how far people travel. They could import all location history from pods, but they're only interested in the distance travelled. Let's assume people only store timestamped geocoordinates on their pod. How can this researcher get the answer to his distance query? He could ask for all location resources and execute his logic on his own server, but imagine he has this WASM runtime... He could send the algorithm to the pod, and the pod could respond with just the result. That is not only far more performant (it saves a ton of bandwidth), but also more privacy-friendly, because the raw location history never leaves the pod.
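Use case 4 can be made concrete: the researcher's "algorithm" might be nothing more than summing great-circle distances between consecutive location points, with only the total leaving the pod. A sketch, assuming points carry a numeric timestamp and decimal-degree coordinates, using the haversine formula with a mean Earth radius of 6,371 km; the `LocationPoint` type and the function names are hypothetical.

```typescript
// Hypothetical shape of a timestamped geocoordinate stored on a pod.
interface LocationPoint { timestamp: number; lat: number; lon: number }

const EARTH_RADIUS_KM = 6371; // mean Earth radius

// Great-circle distance between two points (haversine formula).
function haversineKm(a: LocationPoint, b: LocationPoint): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h = Math.sin(dLat / 2) ** 2 + Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Runs inside the pod: sorts by time, sums consecutive legs, returns one number.
function totalDistanceKm(points: readonly LocationPoint[]): number {
  const sorted = [...points].sort((p, q) => p.timestamp - q.timestamp);
  let total = 0;
  for (let i = 1; i < sorted.length; i++) total += haversineKm(sorted[i - 1], sorted[i]);
  return total;
}

// Two one-degree steps east along the equator.
const track: LocationPoint[] = [
  { timestamp: 1, lat: 0, lon: 0 },
  { timestamp: 2, lat: 0, lon: 1 },
  { timestamp: 3, lat: 0, lon: 2 },
];
console.log(totalDistanceKm(track).toFixed(1)); // 222.4
```

Only the single number returned by `totalDistanceKm` would be sent back to the researcher.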

joepio commented 4 years ago

Just had a nice discussion with @RubenVerborgh on this topic. We both agreed that the runtime API should mimic the existing Solid HTTP interfaces, even if the runtime itself uses some highly optimized non-HTTP API under the hood. By mimicking them, the interface remains transparent and consistent.

In practice, this might mean that an onResourceUpdated handler (or middleware) receives the same options as a Linked Data Notification.
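If an onResourceUpdated handler receives the same payload a Linked Data Notification would carry, a plugin can treat in-process events and HTTP-delivered notifications identically. LDN payloads are RDF and are commonly expressed with the Activity Streams 2.0 vocabulary. The sketch assumes an AS2-style activity as the payload; the exact field set, the `objectId` helper, and the versioning-style `history` map are illustrative assumptions.

```typescript
// Hypothetical notification payload modelled on an Activity Streams 2.0 activity,
// as might be delivered to an LDN inbox. The field set is an assumption.
interface Notification {
  "@context": string | string[];
  id: string;
  type: "Create" | "Update" | "Delete";
  object: string | { id: string };
  published?: string;
}

const objectId = (n: Notification): string => (typeof n.object === "string" ? n.object : n.object.id);

// A versioning-style handler: records which resource changed, how, and when.
const history = new Map<string, { type: Notification["type"]; published?: string }[]>();

function onResourceUpdated(notification: Notification): void {
  const target = objectId(notification);
  const entries = history.get(target) ?? [];
  entries.push({ type: notification.type, published: notification.published });
  history.set(target, entries);
}

onResourceUpdated({
  "@context": "https://www.w3.org/ns/activitystreams",
  id: "urn:example:notification-1",
  type: "Update",
  object: "https://pod.example/notes/1",
  published: "2021-01-01T12:00:00Z",
});
console.log(history.get("https://pod.example/notes/1")?.length); // 1
```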

This also means that any app running inside a Solid-Plugin runtime could also run on any other system: it would use the same interface, only over HTTP rather than through locally optimized calls.
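One way to express "same interface, different transport" in code: the plugin depends only on an abstract resource API, and the host decides whether to back it with direct in-process access or with plain HTTP requests to any Solid server. A sketch with hypothetical `ResourceApi`, `InMemoryResourceApi`, and `HttpResourceApi` names; the HTTP variant uses only standard `fetch` GET and PUT and leaves out authentication.

```typescript
// The only surface a plugin sees (hypothetical interface).
interface ResourceApi {
  read(url: string): Promise<string | undefined>;
  write(url: string, body: string, contentType: string): Promise<void>;
}

// Locally optimized variant: direct in-process access to the pod's storage.
class InMemoryResourceApi implements ResourceApi {
  private readonly data = new Map<string, string>();
  async read(url: string) { return this.data.get(url); }
  async write(url: string, body: string) { this.data.set(url, body); }
}

// Portable variant: the same operations as plain HTTP requests (no auth shown).
class HttpResourceApi implements ResourceApi {
  async read(url: string) {
    const res = await fetch(url);
    return res.ok ? await res.text() : undefined;
  }
  async write(url: string, body: string, contentType: string) {
    const res = await fetch(url, { method: "PUT", body, headers: { "Content-Type": contentType } });
    if (!res.ok) throw new Error(`PUT ${url} failed with ${res.status}`);
  }
}

// Plugin logic written once against the interface, runnable on either transport.
async function appendLine(api: ResourceApi, url: string, line: string): Promise<void> {
  const current = (await api.read(url)) ?? "";
  await api.write(url, current + line + "\n", "text/plain");
}

const local = new InMemoryResourceApi();
appendLine(local, "https://pod.example/log.txt", "first")
  .then(() => appendLine(local, "https://pod.example/log.txt", "second"))
  .then(() => local.read("https://pod.example/log.txt"))
  .then((text) => console.log(JSON.stringify(text))); // "first\nsecond\n"
```

Swapping `new InMemoryResourceApi()` for `new HttpResourceApi()` changes only where the data lives, not the plugin code.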