solid / specification

Solid Technical Reports
https://solidproject.org/TR/
MIT License

Server-side scripts registration #390

Open NoelDeMartin opened 2 years ago

NoelDeMartin commented 2 years ago

I know this is a long shot, but I thought it'd be interesting to get the conversation started about running scripts in a POD.

So far, I've been building apps entirely on the client, and that's worked for my use-case. But I can see a point where I'll need to run some code on the server. The current solution for doing something like that is either to use a purpose-built POD with additional functionality, or to use a non-Solid server that performs the computation and communicates with the POD like any other client. Whilst both solutions can be viable, I think it would be great if there was something in the protocol to handle these use-cases without relying on special PODs or a centralized server. One way to achieve this would be to allow registering scripts on the POD.

For example, imagine that I'm building an RSS reader in Solid. I could upload a file /scripts/rss-update.js to my POD and configure it to run once a day. This script would fetch all the RSS sources and update the related documents. It could be configured using something similar to Unix cron jobs:

```turtle
@prefix solid: <http://www.w3.org/ns/solid/terms#> .

<#job-1>
    a solid:CronJob ;
    solid:schedule "0 0 * * *" ;
    solid:script </scripts/rss-update.js> .
```
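To honour `solid:schedule`, a pod would have to evaluate the cron expression itself. A minimal sketch in TypeScript (chosen because the example scripts are JavaScript) follows. It assumes each of the five fields is either `*` or a single integer, evaluates times in UTC, and uses illustrative function names that are not part of any Solid specification.

```typescript
// Sketch: deciding whether a solid:CronJob is due in the current minute.
// Assumptions (not part of any spec): each field is either "*" or a single
// integer (no ranges, lists or steps), and all times are evaluated in UTC.

type CronFields = [string, string, string, string, string];

function parseSchedule(schedule: string): CronFields {
  const fields = schedule.trim().split(/\s+/);
  if (fields.length !== 5) {
    throw new Error(`Expected 5 cron fields, got ${fields.length}`);
  }
  return fields as CronFields;
}

function fieldMatches(field: string, value: number): boolean {
  return field === "*" || Number(field) === value;
}

// True if the job should run during the minute containing `date`.
// Simplification: real cron ORs day-of-month and day-of-week when both are
// restricted; this sketch requires both to match.
function isDue(schedule: string, date: Date): boolean {
  const [minute, hour, dayOfMonth, month, dayOfWeek] = parseSchedule(schedule);
  return (
    fieldMatches(minute, date.getUTCMinutes()) &&
    fieldMatches(hour, date.getUTCHours()) &&
    fieldMatches(dayOfMonth, date.getUTCDate()) &&
    fieldMatches(month, date.getUTCMonth() + 1) && // cron months are 1-12
    fieldMatches(dayOfWeek, date.getUTCDay()) // 0 = Sunday
  );
}

// "0 0 * * *" from the example above means "every day at 00:00".
console.log(isDue("0 0 * * *", new Date(Date.UTC(2022, 0, 15, 0, 0)))); // true
console.log(isDue("0 0 * * *", new Date(Date.UTC(2022, 0, 15, 12, 30)))); // false
```

A real implementation would more likely reuse an existing cron parser, and it would also need a policy for runs missed while the server was down.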

Another idea would be to run scripts whenever some resources change. I started thinking about this after watching @srosset81's presentation at last month's Solid World. This is what ActivityPods' bots could look like:

```turtle
@prefix solid: <http://www.w3.org/ns/solid/terms#> .
@prefix as: <https://www.w3.org/ns/activitystreams#> .

<#events-bot-1>
    a solid:Hook ;
    solid:on solid:ResourceCreated ;
    solid:resourceType as:Invite ;
    solid:script </scripts/update-event-acls.js> .

<#events-bot-2>
    a solid:Hook ;
    solid:on solid:ResourceCreated ;
    solid:resourceType as:Join ;
    solid:script </scripts/update-participants.js> .

<#events-bot-3>
    a solid:Hook ;
    solid:on solid:ResourceCreated ;
    solid:resourceType as:Update ;
    solid:script </scripts/notify-participants.js> .

<#mailer-bot-1>
    a solid:Hook ;
    solid:on solid:ResourceCreated ;
    solid:resourceType as:Invite ;
    solid:script </scripts/email-invitation.js> .

<#mailer-bot-2>
    a solid:Hook ;
    solid:on solid:ResourceCreated ;
    solid:resourceType as:Join ;
    solid:script </scripts/email-participant-joined.js> .
```
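Dispatching these hooks amounts to matching each event against the registrations. The sketch below assumes the Turtle has already been parsed into plain objects, treats `solid:Hook`, `solid:on` and `solid:ResourceCreated` as the proposed (not standardized) terms from this issue, and uses a made-up resource path `/inbox/invite-1`.

```typescript
// Sketch: dispatching solid:Hook registrations when a resource event occurs.
// Assumptions: the Turtle above is already parsed into plain objects, and the
// solid:Hook / solid:on / solid:ResourceCreated terms are only proposals.

const SOLID = "http://www.w3.org/ns/solid/terms#";
const AS = "https://www.w3.org/ns/activitystreams#";

interface Hook {
  id: string;
  on: string; // event kind, e.g. solid:ResourceCreated
  resourceType: string; // rdf:type the resource must have
  script: string; // script to run
}

interface ResourceEvent {
  kind: string; // e.g. solid:ResourceCreated
  resource: string; // URL of the affected resource
  types: string[]; // rdf:type values of that resource
}

// Scripts to run for an event, in registration order.
function scriptsFor(hooks: Hook[], event: ResourceEvent): string[] {
  return hooks
    .filter((h) => h.on === event.kind && event.types.includes(h.resourceType))
    .map((h) => h.script);
}

const created = SOLID + "ResourceCreated";
const registeredHooks: Hook[] = [
  { id: "#events-bot-1", on: created, resourceType: AS + "Invite", script: "/scripts/update-event-acls.js" },
  { id: "#events-bot-2", on: created, resourceType: AS + "Join", script: "/scripts/update-participants.js" },
  { id: "#events-bot-3", on: created, resourceType: AS + "Update", script: "/scripts/notify-participants.js" },
  { id: "#mailer-bot-1", on: created, resourceType: AS + "Invite", script: "/scripts/email-invitation.js" },
  { id: "#mailer-bot-2", on: created, resourceType: AS + "Join", script: "/scripts/email-participant-joined.js" },
];

// A new Invite (hypothetical path) triggers both the events bot and the mailer bot.
const inviteEvent: ResourceEvent = { kind: created, resource: "/inbox/invite-1", types: [AS + "Invite"] };
console.log(scriptsFor(registeredHooks, inviteEvent).join(", "));
// /scripts/update-event-acls.js, /scripts/email-invitation.js
```

Note that several independent bots can react to the same event, which is what lets the events bot and the mailer bot stay decoupled.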

In any case, I know there are many questions and concerns that arise from doing something like this. But I also believe it is in line with Solid's vision, and even if this isn't going to happen any time soon, I thought it'd be interesting to discuss.

Here are some open questions I have:

renyuneyun commented 2 years ago

That's a nice proposal! I have been thinking about something related, and I will probably make a separate proposal, which could be framed as delegated computation. Still, the two have a lot in common.

When it comes to scripts, I can think of two types, though I feel you mostly mean the second?

  1. Plain scripts -- programs written in scripting languages, with some help to retrieve data from the user's pod. This is basically free-form code.
  2. API-based scripts -- programs written in supported languages, which must make API calls to request server resources (incl. data, communication, etc). I was thinking of something like Google App Engine (GAE), but AWS Lambda is probably a better example (sorry, I have no prior experience with AWS Lambda).
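To make the second type concrete: an API-based script receives an object from the server and performs all of its I/O through it, which gives the server a single place to enforce permissions and limits. The `ScriptContext` interface, the `/feeds/...` paths, the example.org URLs and the in-memory stand-in below are all hypothetical and not part of any Solid specification.

```typescript
// Hypothetical API a server could expose to API-based scripts. The script
// never touches the filesystem or network directly; all I/O goes through ctx.
interface ScriptContext {
  readResource(path: string): Promise<string | undefined>;
  writeResource(path: string, body: string): Promise<void>;
  fetchExternal(url: string): Promise<string>;
}

// Hypothetical rss-update script: read feed URLs (one per line) from the pod,
// fetch each feed, and cache the raw response back into the pod.
async function rssUpdate(ctx: ScriptContext): Promise<number> {
  const list = (await ctx.readResource("/feeds/sources.txt")) ?? "";
  const urls = list
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  for (let i = 0; i < urls.length; i++) {
    const body = await ctx.fetchExternal(urls[i]);
    await ctx.writeResource(`/feeds/cache/${i}.xml`, body);
  }
  return urls.length;
}

// In-memory stand-in for the server, for illustration only.
class InMemoryContext implements ScriptContext {
  readonly resources: Map<string, string>;
  private readonly external: Map<string, string>;

  constructor(resources: Map<string, string>, external: Map<string, string>) {
    this.resources = resources;
    this.external = external;
  }

  async readResource(path: string): Promise<string | undefined> {
    return this.resources.get(path);
  }

  async writeResource(path: string, body: string): Promise<void> {
    this.resources.set(path, body);
  }

  async fetchExternal(url: string): Promise<string> {
    const body = this.external.get(url);
    if (body === undefined) throw new Error(`No stub for ${url}`);
    return body;
  }
}

const demo = new InMemoryContext(
  new Map([["/feeds/sources.txt", "https://example.org/a.xml\nhttps://example.org/b.xml\n"]]),
  new Map([
    ["https://example.org/a.xml", "<rss>a</rss>"],
    ["https://example.org/b.xml", "<rss>b</rss>"],
  ]),
);
rssUpdate(demo).then((count) => console.log(count, demo.resources.get("/feeds/cache/1.xml"))); // 2 <rss>b</rss>
```

Because the script only depends on the interface, the same code could run on the pod's server or on a separate compute service that implements `ScriptContext` over HTTP.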

Some thoughts regarding the questions you raised:

  1. I guess "abusing" means abusing the server's computational resources (etc). I feel this is a challenge for all these types of work. Using a resource monitoring mechanism similar to GAE or AWS Lambda, or borrowing some ideas from smart contracts (e.g. using gas as Ethereum does), could do the trick.
    • I guess the Solid server does not want to support heavy computation, so resource limits will be very tight.
    • This is where the delegated computation can help -- not only the Solid server could do the computation, but also other compatible servers. Pod owners/Users (or Apps?) choose which compatible server to perform the computation. Thus, the Solid server can choose to not act as a compatible server of delegated computation and require the user to use other servers. Of course, that involves the problem of trust and privacy.
  2. The server must implement correct isolation / permission management of users' pods' data, otherwise the script may (un)expectedly access/change them. A better approach would be to isolate the script completely, otherwise the server's own (system) files may be leaked. But apart from them, there should not be security concerns (but may be privacy concerns; related to item 5).
  3. If it's plain scripts, then maybe any language that the server has interpreters of? If it's limited to API-based scripts, then maybe any language is in an equivalent position, because the relevant APIs/libraries all have to be implemented? JS has a slight advantage -- not requiring data transformation across languages; but I feel this is not a very large difference?
  4. I do not have enough experience to compare in this aspect.
  5. I guess a user should have this information in their pod settings (e.g. /setting/prefs), or alongside the App authorization. The configuration could be similar to WAC/ACL terms, with additional terms for resource constraints (e.g. timeout). Basically, a script is not very different to an App if it only runs periodically, and does not interact with other things (e.g. Apps).
    • But this is not the complete story once scripts begin to interact with other things. For example, Solid Calendar Store provides additional (HTTP) API endpoints for a Solid server, allowing users to retrieve calendar information from various sources; it also has a companion Solid App (KNoodle) that uses these APIs. "Fetching calendars from various sources" is very similar to the "RSS aggregation" you illustrated, so one may naturally expect just one additional ability: exposing an API for the resource. (Note one difference: the API endpoint is not just a piece of data stored in the pod; it provides real-time, dynamically generated data.) Then comes the problem of authorizing the API endpoint (or any other data) generated by the computation. The authorization could be done in WAC/ACL, but then additional mechanisms are needed to identify which resources need WAC/ACL rules. I would imagine a dynamic policy mechanism would work better (my research is related to this).
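Points 2 and 5 above could be combined into a per-script policy: which containers a script may read or write (WAC-style), plus resource constraints such as a timeout. The `ScriptPolicy` shape and helper names below are hypothetical; a real server would express such a policy in RDF and would still need process- or isolate-level sandboxing, which this sketch does not attempt.

```typescript
// Hypothetical per-script policy: WAC-like read/write scopes plus a timeout.
interface ScriptPolicy {
  script: string;
  read: string[]; // container prefixes the script may read, e.g. "/feeds/"
  write: string[]; // container prefixes the script may write
  timeoutMs: number;
}

// Assumes paths are already normalized (no ".." segments) and that prefixes
// end with "/", so "/feeds/" does not also match "/feeds-private/".
function checkAccess(policy: ScriptPolicy, mode: "read" | "write", path: string): void {
  const scopes = mode === "read" ? policy.read : policy.write;
  if (!scopes.some((prefix) => path.startsWith(prefix))) {
    throw new Error(`${policy.script} may not ${mode} ${path}`);
  }
}

// Rejects if the script does not settle within policy.timeoutMs. This only
// stops waiting for the result; killing runaway code needs real isolation.
function withTimeout<T>(policy: ScriptPolicy, run: () => Promise<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${policy.script} timed out after ${policy.timeoutMs} ms`)),
      policy.timeoutMs,
    );
    Promise.resolve()
      .then(run)
      .then(
        (value) => {
          clearTimeout(timer);
          resolve(value);
        },
        (error) => {
          clearTimeout(timer);
          reject(error);
        },
      );
  });
}

const rssPolicy: ScriptPolicy = {
  script: "/scripts/rss-update.js",
  read: ["/feeds/"],
  write: ["/feeds/cache/"],
  timeoutMs: 5000,
};
checkAccess(rssPolicy, "write", "/feeds/cache/0.xml"); // allowed, no error
try {
  checkAccess(rssPolicy, "read", "/private/diary.ttl");
} catch (e) {
  console.log((e as Error).message); // /scripts/rss-update.js may not read /private/diary.ttl
}
```

Static prefixes like these do not solve the problem raised in the last bullet (authorizing data that the computation itself generates); that is where a dynamic policy mechanism would come in.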
NoelDeMartin commented 2 years ago

Nice points @renyuneyun :).

> Using a resource monitoring mechanism similar to GAE or AWS Lambda, or borrowing some ideas from smart contracts (e.g. using gas as Ethereum does), could do the trick.

Yeah, I think that type of thing would be nice. GAE and AWS Lambda are inherently protected against abuse because computation has to be paid for. Something like Ethereum's gas or a quota would be nice, and maybe this could even be a viable business model for POD providers: starter accounts would offer a small quota, and you'd be able to increase it by paying more.
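A quota like this could be enforced by charging every operation a script performs against its account's allowance, in the spirit of Ethereum's gas. The operation names, costs and plan sizes below are invented for illustration. The design choice worth noting is that the meter refuses an operation before it runs, so a script can never overdraw its quota.

```typescript
// Hypothetical per-operation costs, in "gas" units (numbers are made up).
const COSTS = { read: 1, write: 5, externalFetch: 20 } as const;
type Operation = keyof typeof COSTS;

// Hypothetical plans: starter accounts get a small allowance, paid ones more.
const PLANS = { starter: 1000, paid: 100000 } as const;
type Plan = keyof typeof PLANS;

class GasMeter {
  private used = 0;
  private readonly limit: number;

  constructor(plan: Plan) {
    this.limit = PLANS[plan];
  }

  // Charges an operation before it runs; throws instead of going over quota.
  charge(op: Operation): void {
    const cost = COSTS[op];
    if (this.used + cost > this.limit) {
      throw new Error(`Out of gas: ${op} costs ${cost}, ${this.remaining()} left`);
    }
    this.used += cost;
  }

  remaining(): number {
    return this.limit - this.used;
  }
}

const meter = new GasMeter("starter");
meter.charge("read");
meter.charge("externalFetch");
console.log(meter.remaining()); // 979
```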

> If it's plain scripts, then maybe any language that the server has interpreters of?

I think this is very tricky, because the whole idea of adding this to the spec is that all PODs support this. And the spec doesn't require PODs to be written in any particular language, that's an implementation detail.

> The configuration could be similar to WAC/ACL terms.

Yeah, I agree something like that would make sense. I was worried about over-complicating permissions for users, but if it can be simplified to which data the scripts can access (which is already what authorization is supposed to do) plus how much computation quota they are allowed, it should be pretty easy to understand (all things considered).

> a script is not very different to an App if it only runs periodically, and does not interact with other things (e.g. Apps)

I guess this is where the concept of "app" becomes a bit blurry. In my mind, a Solid App is something 100% client-side. In my opinion, the only reason apps need their own server is the limitations we're discussing here. In the end, there is ideally a single source of truth, and that is the data in the user's POD. So a script should never need to interact with another "app", because an app is only an interface to the data for people, not machines.

However, I know things are not as easy in practice. But I think it should be fine as long as the current mechanisms (like web sockets) keep working when scripts modify the data. After all, it should already be possible for two apps to work on the same data. Whether the data is modified by another app or by a server-side script should make no difference.
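The argument that a script's writes should be indistinguishable from an app's writes can be sketched with a toy change feed: subscribers (for example, apps holding a WebSocket open) are told that a resource changed, not who changed it. `ChangeFeed` and `PodStorage` are hypothetical names, and this is not the Solid Notifications protocol; it only illustrates the design point.

```typescript
// Hypothetical change feed (an illustration, not the Solid Notifications protocol).
type Listener = (resource: string) => void;

class ChangeFeed {
  private readonly listeners = new Map<string, Set<Listener>>();

  // Registers a listener for one resource; returns an unsubscribe function.
  subscribe(resource: string, listener: Listener): () => void {
    const set = this.listeners.get(resource) ?? new Set<Listener>();
    this.listeners.set(resource, set);
    set.add(listener);
    return () => {
      set.delete(listener);
    };
  }

  publish(resource: string): void {
    this.listeners.get(resource)?.forEach((listener) => listener(resource));
  }
}

// Storage publishes after every successful write. Who wrote (an app or a
// server-side script) is not part of the notification, so subscribers
// cannot, and need not, tell the difference.
class PodStorage {
  private readonly data = new Map<string, string>();
  private readonly feed: ChangeFeed;

  constructor(feed: ChangeFeed) {
    this.feed = feed;
  }

  read(resource: string): string | undefined {
    return this.data.get(resource);
  }

  write(resource: string, body: string): void {
    this.data.set(resource, body);
    this.feed.publish(resource);
  }
}

const feed = new ChangeFeed();
const storage = new PodStorage(feed);
const seen: string[] = [];
feed.subscribe("/events/party.ttl", (resource) => seen.push(resource));
storage.write("/events/party.ttl", "<> a <https://www.w3.org/ns/activitystreams#Event> ."); // e.g. written by an app
storage.write("/events/party.ttl", "<> a <https://www.w3.org/ns/activitystreams#Event> ."); // e.g. written by update-participants.js
console.log(seen.length); // 2
```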

acoburn commented 2 years ago

> There exists a special case where that service runs on the same server as the one that hosts your pod ("running on the pod"), but there is no inherent necessity to tie those two together.

This is a very important point. When designing systems, one can optimize for computation and one can optimize for data I/O (among other things). By coupling these two elements, you force a system to optimize for both, when in fact it is often better to be able to partition the two. The large cloud vendors illustrate this: storage is almost always separated from compute.

renyuneyun commented 2 years ago

Nice points @RubenVerborgh and @acoburn :) I was wondering whether I had been heading in the wrong direction earlier. It now seems I generally didn't go wrong in #393, which (hopefully) addresses the flexibility suggestions here while preserving the main ideas in this proposal. Warning: it's quite lengthy for a GitHub issue.

NoelDeMartin commented 2 years ago

re: @RubenVerborgh

> it's not either client or server

Can you elaborate a bit on that? I'm not sure I understand.

For me there is a clear distinction between client and server. The client is the device where users are running the app, and the server is the computer (or computers) hosting the POD. I know there are Solid apps running on servers, but those don't rely entirely on the Solid protocol and have a centralized dependency.

When I say "client" I mean persons using the app, not bots or server-side scripts. Maybe that's not the technical definition of a client then, in which case I guess that's what you meant? That a "client" could be a server-side script or a bot for example?

> The case you describe, can run perfectly on a service. We could imagine that there are many such services in the network that allow you to register such scripts, and those services can access pods.

I agree it would be possible to accomplish what I'm describing with services, but this also falls under the umbrella of using specialized servers.

For example, if I make an app that only relies on the Solid protocol, users can log in using their account from any POD. That could be a POD hosted in solidcommunity.net or it could be a POD running in a local network (in which case, they don't even need to have internet access to use the app). However, if the app relies on some service, this creates another dependency that is outside of the app/POD combo.

I think that's a problem because this new dependency either becomes a centralized dependency (if the developer hard-codes the service URL), or yet another choice for users to configure (it's hard enough to make them choose a POD provider).

NoelDeMartin commented 2 years ago

Thanks for clarifying @RubenVerborgh.

In that case, I guess the term "client" does not capture what I'm trying to say; I'm not sure what the technical term for that is (if there is one). End-user device? The scenario I'm trying to describe is any person who's not technically savvy, doesn't own a server, and has neither the ability nor the inclination to install anything other than an app (in this case, what I was calling the "client": the Solid app).

What I would like is the ability to handle use-cases such as the ones I described initially, relying only on the Solid protocol. If I have to rely on anything other than the user having a pod and a device to run my app, that's a bummer.

But yes, I agree that Solid apps don't have to be like that, and those limitations don't really exist. That's just what I would like to see in an ideal world. Although I know how difficult it would be to integrate something like this into the protocol, I just want to get the conversation started.

In any case, I think we agree in spirit, I'm just not expressing the technicalities correctly :).