Open dsebastien opened 5 years ago
Interesting point! Currently, this is not fully possible with the Solid spec. An app can receive notifications when it has a WebSocket open, but it's not possible to configure a pod so that it will, for instance, make an active outgoing webhook call when its content changes.
There are multiple ways to think about event streams and how they could be used:
I think that all those scenarios have benefits and are enablers.
There are of course many side questions, such as how clients select the events they're interested in and how end users manage permissions for accessing those events/data.
@dsebastien Yep - supporting events is critical in any system. Quick nit-pick - this user-story has no '...so that...' section (I notice that quite a few user-stories are missing that important statement, so maybe we need to send a broader message to the group!).
Also, at the highest level of abstraction (i.e. architecturally) I'd like (if possible) to keep a discussion on events centered on Linked Data Notifications (LDN). If the current specification doesn't support what you'd like (e.g. super-hot-streams!), then we could discuss that (as an extension to the spec, or a possible LDN v1.1 suggestion).
@pmcb55: Indeed, my story is a bit open ended; the thing is that this feature request is an enabler for the imagination of developers :)
@michielbdejong @pmcb55 : I think this might be interesting to consider in the context of that discussion: https://github.com/inrupt/wac-ldp/pull/36 and https://github.com/inrupt/wac-ldp/issues/31.
I'm not familiar with Linked Data Notifications, so I have some reading to do, but from what I understand from the introduction, I think that it only covers a small subset of the idea that I'm trying to describe/propose here.
What I'm actually proposing is to integrate event sourcing right into the core of Solid, ideally right into the data storage of the pods.
By doing so, we can systematically generate events to track/store anything happening in and around pods (e.g., folder created, file contents changed, pod created, permissions changed, etc.). Event sourcing enables reconstructing the whole history of the pod by reapplying the events in the order in which they happened.
If we think about classic RDBMSs, which usually only store the current data set, we can see the limitation: any change we make to the data makes us lose its previous state.
EDA and event sourcing make it possible to keep track of the whole history of the data and to easily go back in time if and when needed. They also make it easy to create reactive systems around the event store by letting apps/agents access and process those events, either live (hot) or after the fact (cold).
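To make the replay idea concrete, here is a minimal sketch in Python. The event names (`created`, `changed`, `deleted`), the `PodEvent` type and the `replay` function are assumptions made up for illustration; nothing like this is defined by the Solid spec.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PodEvent:
    # Hypothetical event shape: kind is "created", "changed" or "deleted".
    kind: str
    path: str
    content: Optional[str] = None


def replay(events: List[PodEvent]) -> Dict[str, str]:
    """Rebuild the pod contents by applying events in the order they happened."""
    state: Dict[str, str] = {}
    for event in events:
        if event.kind in ("created", "changed"):
            state[event.path] = event.content or ""
        elif event.kind == "deleted":
            state.pop(event.path, None)
    return state


log = [
    PodEvent("created", "/notes/todo.ttl", "v1"),
    PodEvent("changed", "/notes/todo.ttl", "v2"),
    PodEvent("created", "/notes/old.ttl", "x"),
    PodEvent("deleted", "/notes/old.ttl"),
]

current = replay(log)        # state after all events
earlier = replay(log[:1])    # "going back in time": state after the first event only
```

Replaying a prefix of the log is what gives the time-travel property: the state at any earlier point is just the result of applying the events up to that point.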
I think that Linked Data Notifications are just one interface that can use the event log as input to notify the outside world, but it might actually not be the only channel over which we can expose the event log.
If it isn't clear, I'll try to create a sort of visualization of this idea; I wrote this up quickly :)
One thing that we'd need to consider is the storage impact. Keeping the history of changes will increase the storage required for pods. Maybe we can look at this aspect as follows:
By default we could store all events regarding the life of the pod (ACL changes, data access, keys added/removed, ...) along with all pod data changes (e.g., file/folder added/changed, linked-data changes, etc.).
We could also provide end users with a way to control the data versioning and auditing schemes, which they might want for privacy reasons (e.g., I want to be sure that my actions are not traced inside my pod) or to limit their storage use.
Great ideas! One side-remark: even if we make sure the change events contain all the necessary information to replay those events from the log (a bit like the MySQL binlog is used for master/slave replication and indeed for restore-from-backup), we would also need a rewrite service that rewrites history to a shorter form (basically, only the INSERTs that are needed to get to a snapshot of the contents). I don't think we should make this part of the spec, even if we would make outgoing webhook support part of it. The idea of journal-style persistence works well with the request for Memento support, but I think it's really a per-server implementation decision.
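As a rough illustration of such a rewrite step, the sketch below compacts a log down to the inserts needed to reach the current snapshot. The tuple layout `(op, path, content)`, the operation names and the `compact` function are assumptions for this example, not how any Solid server actually does it.

```python
from typing import Dict, List, Tuple

# Hypothetical event layout: (operation, resource path, content).
Event = Tuple[str, str, str]


def compact(events: List[Event]) -> List[Event]:
    """Rewrite a full history into the shortest list of inserts that
    reproduces the same final contents."""
    snapshot: Dict[str, str] = {}
    for op, path, content in events:
        if op == "delete":
            snapshot.pop(path, None)
        else:  # "insert" and "update" both set the latest content
            snapshot[path] = content
    return [("insert", path, content) for path, content in snapshot.items()]


history = [
    ("insert", "/a", "1"),
    ("insert", "/b", "x"),
    ("update", "/a", "2"),
    ("delete", "/b", ""),
    ("insert", "/c", "3"),
]

short_form = compact(history)
```

The compacted log loses the intermediate states, which is exactly the trade-off: it is enough for restore or replication, but not for auditing or going back in time.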
Yep @dsebastien - I totally agree with baking EDA into the very core of Solid. In fact, I've written very extensively on this point (not published, just internally for inrupt). But just to clarify a little, LDN is founded on Linked Data/RDF (as is Solid of course), meaning an LDN 'event' can basically record 'anyone saying anything about anything' - so that's why I suggest it as a lowest-common-denominator for eventing for Solid.
added to readme
Opening this as part of moving all user stories into issues again.
If we design the Solid server as an event-driven system using event sourcing, then we could enable a wide range of scenarios. Event-driven architectures are very powerful in that they allow us to create loosely coupled, reactive systems.
For example, agents that are mandated by end users to manage parts of their data (e.g., organize documents, send automated notifications for upcoming calendar events, etc) and authorized to access a specific stream or event type could watch the user's pod for relevant events and react to those.
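A minimal sketch of what this could look like, assuming a hypothetical in-process `PodEventBus` where the user grants an agent access to specific event types. The class, its methods, the agent identifier and the event type names are all invented for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set, Tuple

Handler = Callable[[dict], None]


class PodEventBus:
    """Hypothetical event bus: agents may only subscribe to granted event types."""

    def __init__(self) -> None:
        self._grants: DefaultDict[str, Set[str]] = defaultdict(set)
        self._handlers: List[Tuple[str, str, Handler]] = []

    def grant(self, agent: str, event_type: str) -> None:
        # The end user authorizes an agent for one event type.
        self._grants[agent].add(event_type)

    def subscribe(self, agent: str, event_type: str, handler: Handler) -> None:
        if event_type not in self._grants[agent]:
            raise PermissionError(f"{agent} may not read {event_type} events")
        self._handlers.append((agent, event_type, handler))

    def publish(self, event_type: str, payload: dict) -> None:
        for _agent, subscribed_type, handler in self._handlers:
            if subscribed_type == event_type:
                handler(payload)


bus = PodEventBus()
bus.grant("calendar-agent", "calendar-event-upcoming")

reminders: List[str] = []
bus.subscribe(
    "calendar-agent",
    "calendar-event-upcoming",
    lambda event: reminders.append(f"Reminder: {event['title']}"),
)

bus.publish("calendar-event-upcoming", {"title": "Dentist"})
bus.publish("acl-changed", {"path": "/private/"})  # not delivered to the agent
```

In a real pod the grant would presumably be expressed through the same access control mechanism as the data itself; the set of strings here only stands in for that.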
If we get that in from the start, then we can construct a registry of commands/events and expand that over time.
Event streams can be cold or hot. Hot streams are live ones: you only get the events if you are listening when they are published. Cold ones are replayable, meaning that you can get events published earlier on even if you were not listening at the time they were published.
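The difference can be sketched in a few lines of Python. The `EventStream` class and its `subscribe_hot` / `subscribe_cold` methods are hypothetical names used only to illustrate the two behaviours.

```python
from typing import Callable, List

Callback = Callable[[str], None]


class EventStream:
    """Hypothetical stream that keeps a log so that cold subscribers can replay it."""

    def __init__(self) -> None:
        self._log: List[str] = []
        self._subscribers: List[Callback] = []

    def publish(self, event: str) -> None:
        self._log.append(event)
        for callback in list(self._subscribers):
            callback(event)

    def subscribe_hot(self, callback: Callback) -> None:
        # Hot: only events published from now on are delivered.
        self._subscribers.append(callback)

    def subscribe_cold(self, callback: Callback, from_offset: int = 0) -> None:
        # Cold: replay stored events first, then keep receiving live ones.
        for event in self._log[from_offset:]:
            callback(event)
        self._subscribers.append(callback)


stream = EventStream()
stream.publish("file-created")      # nobody is listening yet

hot: List[str] = []
cold: List[str] = []
stream.subscribe_hot(hot.append)
stream.subscribe_cold(cold.append)

stream.publish("file-changed")
```

After this runs, the hot subscriber has only seen `file-changed`, while the cold subscriber has seen both events, including the one published before it subscribed.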
There are a ton of resources around event-driven architectures (EDA), event sourcing and CQRS. Here are some of my notes on the subject: https://dsebastien.gitbooks.io/software-architecture-notes/content/event-driven.html.