When a cluster is set up to run an application, there's typically also some shared infrastructure such as databases or message brokers that allows transferring data between different application server nodes in the cluster. It might be easier to start using Collaboration Engine for a clustered application if it would be possible to reuse this existing infrastructure instead of having to deploy a separate standalone Collaboration Engine server specifically for transferring collaboration data between nodes.

Rather than trying to find a universal data model that would be directly supported by a wide range of potential technologies, we would instead implement our own data model and use the underlying backend only for distributing change messages. A log of old messages would also be kept in the shared infrastructure so that new nodes can reconstruct the current state. Regular snapshots could be created to reduce the number of messages that need to be processed to catch up, and old messages beyond the most recent snapshot could optionally be discarded to reduce the footprint. Collaboration Engine would basically be an event sourced system that can use a wide range of technologies as an event store.
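As a rough illustration of the event-sourced model described above, the following sketch rebuilds a topic's state from an optional snapshot plus the messages logged after it. The `TopicState`, `Change`, and `Snapshot` names, the numeric sequence numbers starting at 1, and the simple key-value state are all assumptions made for this example; none of them are part of Collaboration Engine.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: topic state rebuilt from a snapshot plus later change messages.
class TopicState {
    // Illustrative change message: "set key to value" at a given position in the log.
    static final class Change {
        final long sequence;
        final String key;
        final String value;

        Change(long sequence, String key, String value) {
            this.sequence = sequence;
            this.key = key;
            this.value = value;
        }
    }

    // Illustrative snapshot: the full state after applying all changes up to upToSequence.
    static final class Snapshot {
        final long upToSequence;
        final Map<String, String> values;

        Snapshot(long upToSequence, Map<String, String> values) {
            this.upToSequence = upToSequence;
            this.values = new HashMap<>(values); // defensive copy
        }
    }

    private final Map<String, String> values = new HashMap<>();
    private long lastApplied = 0;

    // Rebuilds the state for a new node; snapshot may be null if none exists yet.
    static TopicState restore(Snapshot snapshot, List<Change> log) {
        TopicState state = new TopicState();
        if (snapshot != null) {
            state.values.putAll(snapshot.values);
            state.lastApplied = snapshot.upToSequence;
        }
        for (Change change : log) {
            state.apply(change);
        }
        return state;
    }

    // Applies one change; changes already covered by the snapshot are skipped.
    void apply(Change change) {
        if (change.sequence <= lastApplied) {
            return;
        }
        values.put(change.key, change.value);
        lastApplied = change.sequence;
    }

    // Captures the current state so that older log entries could be discarded.
    Snapshot snapshot() {
        return new Snapshot(lastApplied, values);
    }

    String get(String key) {
        return values.get(key);
    }
}
```

A new node would call `restore` with the latest snapshot and the messages that follow it, and then keep calling `apply` for live messages.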

With this model, the requirements on the backend would be:

Messages are distributed to all subscribing application server nodes.
Total ordering is ensured for messages belonging to the same topic, whereas no synchronization is needed between messages belonging to different topics. Total ordering means that all nodes that receive two messages will receive them in the same order so that all nodes will have a shared view of which message is the "winner" in case of a conflict. The alternative would be to base the data model on CRDT or OT, but this would require compromises that are not appropriate for Collaboration Engine.
Old messages are retained for new nodes to catch up. Nothing special is needed for snapshots since they can be retained as messages posted to a helper topic.
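To make the total-ordering requirement more concrete, here is a minimal sketch of how a shared message order lets every node pick the same "winner" without further coordination. The `OrderedReplica` and `ConditionalSet` names and the compare-and-set style message are assumptions for illustration only, not the actual Collaboration Engine data model.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical replica that applies conditional updates in the order delivered by the backend.
class OrderedReplica {
    // "Set key to value, but only if the current value is still expected."
    static final class ConditionalSet {
        final String key;
        final String expected;
        final String value;

        ConditionalSet(String key, String expected, String value) {
            this.key = key;
            this.expected = expected;
            this.value = value;
        }
    }

    private final Map<String, String> state = new HashMap<>();

    // Returns true if the message won, false if it lost against an earlier message.
    boolean apply(ConditionalSet message) {
        if (!Objects.equals(state.get(message.key), message.expected)) {
            return false;
        }
        state.put(message.key, message.value);
        return true;
    }

    void applyAll(List<ConditionalSet> orderedMessages) {
        for (ConditionalSet message : orderedMessages) {
            apply(message);
        }
    }

    String get(String key) {
        return state.get(key);
    }
}
```

Two replicas that receive the same two conflicting messages in the same order end up with the same value. If the backend delivered them in different orders to different nodes, the replicas would diverge, which is why the ordering guarantee matters.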

This model would make it possible to use a wide range of different solutions. The most demanding requirement might be to ensure total ordering since that requires either a single point of synchronization (and thus also a single point of failure) or a more complex architecture based on e.g. quorum. Most infrastructure candidates do still provide some kind of transactional mechanism that can be used for this purpose, with the main exception being JMS that doesn't make guarantees about ordering of messages from different senders. JMS in itself is however also unsuitable since it doesn't offer any direct way of preserving old messages for new subscribers to catch up (durable messages are only preserved for pre-existing subscribers), but JMS can be combined with any transactional database to get both ordering and retention for messages.

We could thus enable using Collaboration Engine with a backend based on any of these types of technologies:

Embeddable cluster solutions such as Akka or Hazelcast IMDG.
Event-firing transactional databases and in-memory data stores/caches such as MongoDB or Redis.
- In particular, anything implementing JSR-107 could be used since the event log can be built up as a linked list of cache entries by using Cache::replace to atomically update a head pointer.
Event stores or event streaming tools with event retention, such as Kafka or EventStoreDB.
Any transactional database that doesn't fire events, in combination with JMS to send change messages.
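The linked-list idea mentioned for JSR-107 can be sketched roughly as follows. To keep the example self-contained, a `ConcurrentHashMap` stands in for the cache: its `replace(key, oldValue, newValue)` has the same compare-and-swap semantics as the three-argument `Cache::replace`. The `LinkedCacheLog` class, the key layout, and the entry encoding (previous id, newline, payload) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical append-only log stored as a linked list of cache entries.
class LinkedCacheLog {
    private static final String NONE = ""; // marks "no previous entry"

    // Stand-in for a shared JSR-107 cache.
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final String topic;
    private final String headKey;

    LinkedCacheLog(String topic) {
        this.topic = topic;
        this.headKey = topic + "/head";
        cache.putIfAbsent(headKey, NONE);
    }

    private String entryKey(String id) {
        return topic + "/entry/" + id;
    }

    // Appends a payload and returns its id. The head pointer update is the
    // single point of synchronization that defines the total order.
    String append(String payload) {
        String id = UUID.randomUUID().toString();
        while (true) {
            String previous = cache.get(headKey);
            // The entry is invisible to readers until the head points at it.
            cache.put(entryKey(id), previous + "\n" + payload);
            if (cache.replace(headKey, previous, id)) {
                return id;
            }
            // Another node moved the head first; retry against the new head.
        }
    }

    // Walks back from the head and returns all payloads, oldest first.
    List<String> readAll() {
        List<String> payloads = new ArrayList<>();
        String id = cache.get(headKey);
        while (!NONE.equals(id)) {
            String entry = cache.get(entryKey(id));
            int split = entry.indexOf('\n');
            payloads.add(entry.substring(split + 1));
            id = entry.substring(0, split);
        }
        Collections.reverse(payloads);
        return payloads;
    }
}
```

A real integration would additionally need cache entry listeners (or a similar event mechanism) to notify subscribers about a new head, which this sketch leaves out.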

This is an initial draft of what the backend integration API might look like:

public class CollaborationEngineConfiguration {
  public void setBackend(CollaborationEngineBackend backend);
}

public interface CollaborationEngineBackend {
  // internalTopic is set for topics used internally by CE to prevent naming conflicts with the application's own topic names
  TopicMessageLog openMessageLog(String topic, boolean internalTopic);
}

public interface TopicMessageLog {
  void submitMessage(String payload);
  // Implementing catch-up might be complicated with some backends. We could provide a helper that separately subscribes and loads old messages while automatically taking care of ordering and possible duplicates while catching up
  Registration subscribe(Consumer<Message> callback, String messageIdToCatchUpFromExclusive);
  // loadMessages(null, null, 1) would load the most recent message, which is useful for finding the latest snapshot
  // loadMessages(null, oldestFromSnapshot, snapshotInterval * 2) could be used for manually catching up
  // A non-null first parameter would be used only for lazy loading further into history, e.g. if we in the future use historical data for implementing a lazy loading list or for offline synchronization
  CompletableFuture<List<Message>> loadMessages(String newestMessageIdOrNull, String oldestMessageIdOrNull, int limit);
  void close();

  // Optional operations - can be no-ops at the expense of infinitely accumulating data
  void discardOlderThan(String newestMessageIdToKeep);
  void discardTopic();
}

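public class Message {
  // Message id is an opaque identifier that the backend can define in whatever way is appropriate as long as it's unique within each topic.
  private final String id;
  private final String payload;

  public Message(String id, String payload) {
    this.id = id;
    this.payload = payload;
  }
  public String getId() { return id; }
  public String getPayload() { return payload; }
}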
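To get a feeling for the subscribe-with-catch-up semantics in the draft above, here is a minimal in-memory sketch. It is not a real backend: `InMemoryTopicLog` is a hypothetical name, a `Runnable` stands in for `Registration`, message ids are assumed to be log positions, and a `null` catch-up id is assumed to mean "replay everything".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory stand-in for the proposed TopicMessageLog.
class InMemoryTopicLog {
    static final class Message {
        final String id;
        final String payload;

        Message(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    private final List<Message> messages = new ArrayList<>();
    private final List<Consumer<Message>> subscribers = new ArrayList<>();

    // Appends and notifies under one lock so that all subscribers see the same order.
    synchronized void submitMessage(String payload) {
        // Assumption: the id is simply the position in the log.
        Message message = new Message(Integer.toString(messages.size()), payload);
        messages.add(message);
        for (Consumer<Message> subscriber : new ArrayList<>(subscribers)) {
            subscriber.accept(message);
        }
    }

    // Replays everything after the given id (or the whole log for null) and then
    // registers for live messages. Doing both under the same lock avoids gaps and
    // duplicates between catch-up and live delivery.
    synchronized Runnable subscribe(Consumer<Message> callback,
            String messageIdToCatchUpFromExclusive) {
        int start = messageIdToCatchUpFromExclusive == null
                ? 0
                : Integer.parseInt(messageIdToCatchUpFromExclusive) + 1;
        for (int i = start; i < messages.size(); i++) {
            callback.accept(messages.get(i));
        }
        subscribers.add(callback);
        // The returned Runnable removes the subscription again.
        return () -> {
            synchronized (this) {
                subscribers.remove(callback);
            }
        };
    }
}
```

With a distributed backend, replay and live subscription usually cannot happen under one lock, which is where the helper mentioned in the `subscribe` comment would come in.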

There are also some opportunities for some more optional features in the API:

Store snapshots in a more efficient way. Without this feature, Collaboration Engine would use internal topics for managing snapshots which means that change notification messages would be sent even though nobody would be listening.
Notify when other clients join or leave the backend. This is necessary to clear stale presence entries for users if the node that they're connected to leaves the cluster abruptly. Without events from the backend, Collaboration Engine would store heartbeat timestamps in an internal topic and rely on other nodes to check for expired heartbeats.

From the application developer's point of view, usage would be as simple as this, using the Jedis client for Redis as an example.

ceConfig.setBackend(new JedisBackend(redisServerHostname));

Regarding access-control restrictions being applied to the topic, instead of the domain objects:

Security

Consider the following scenario:

Deployment: Multitenanted environment, deployed on single VPC. Multitenanted strategy is tenant discriminator column (no tenant database/schema isolation)--common for SaaS services that seek to minimize cost/tenant.

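import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.Id;
import javax.persistence.PersistenceContext;
import org.springframework.security.access.prepost.PreAuthorize;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
// Identifier and @CurrentTenant are project-specific types; their imports are omitted here.

@Entity
public class Person {

  @Id
  private Identifier id;

  @Column(name = "first_name")
  private String firstName;

  public Identifier getId() {
    return id;
  }

  public void setFirstName(String firstName) {
    this.firstName = firstName;
  }
}

@Service
@Transactional
@PreAuthorize("isFullyAuthenticated()")
public class PersonService {

  @CurrentTenant // internal persistence context selector
  @PersistenceContext
  private EntityManager entityManager;

  @Transactional
  @PreAuthorize("hasPermission(#userId, 'io.sunshower.core.security.Person', 'WRITE')")
  public void updateFirstName(Identifier userId, String firstName) {
    Person person = entityManager.find(Person.class, userId);
    if (person != null) {
      person.setFirstName(firstName);
    }
  }

  @Transactional
  @PreAuthorize("hasPermission(#person.id, 'io.sunshower.core.security.Person', 'WRITE')")
  public void save(Person person) {
    entityManager.merge(person);
  }

}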

Suppose that security policies are applied at the PersonService level, which is typical. If the existing CollaborationBinder API is generalized to multiple nodes via an external mechanism such as Ignite, Hazelcast, or Redis, then it may be possible to send edit operations to the distributed topic that modify the state between the calls to updateFirstName and save (the points where access control policies are applied).

For instance:

Alice (id: 1) (authorized, authenticated) has firstName legitimately set to Alice. The update is published on /person/1.
Bob (intruder) publishes (op: setFirstName, type: Person, payload: {id: 1, firstName: 'Bob'}) to /person/.
Alice calls save(), and the value firstName=Bob is saved to Alice's user entity (last write wins).

This scenario holds for any situation in which Bob can legitimately access the messaging system used, which is typically much less rigorously restricted than the database (here on a separate subnet, all access controlled through Spring Security and DB-level security configuration). This has the practical effect of allowing users to promote themselves within the system to any conceivable role in any tenant by breaching one of the typically least-important and secured systems in a SaaS environment.

To rectify this situation, each distribution mechanism (Redis, Ignite, etc.) must support the totality of security features of the final durable store (uncommon), and the access control policy declarations must (probably) be replicated to the distribution mechanism.

Alternate Solution 1:

Instead of attaching the binder to a domain entity, extend it to accept a service with the appropriate signature/convention so that modifications must pass through the existing security stack:


CollaborationBinder binder = bindingContext.bind(Person.class).via(PersonService.class); //updates to person are made via PersonService, created by Injector instance

There will be a performance penalty here as each operation is validated against the security context, but secure and correct are more important than fast.

I suspect a multi-layer approach might be necessary for access control.

A solution based on CollaborationBinder has the obvious limitation that it only applies when using CollaborationBinder but not when using e.g. CollaborationMessageList or some low-level data-centered API. The binder case is on the other hand the most dangerous one since it offers a quite unexpected route straight into the business data that is at the core of most applications.

Specifically for the multitenant case, I believe one quite useful protection boundary could be the CollaborationEngineBackend implementation. Just like the EntityManager is supplemented to be aware of the current tenant, you would also have a CE backend that is aware of it. Any invocation of openMessageLog would record the current tenant as part of the TopicMessageLog instance that it creates, and this instance would subsequently reject any messages from the backend if they aren't sent by a user belonging to the corresponding tenant. Something like this might also be needed to avoid namespace conflicts so that multiple tenants could use e.g. person/1 as the topic id in the Collaboration Engine API and they would then be separated (e.g. as tenant1/person/1) in the backend (they would also need separate CollaborationEngine instances with the way it's currently implemented, but that's a different discussion). And this does of course not stop someone trying to elevate their own permissions within the tenant that they belong to.
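A very small sketch of that idea, under the assumption that the backend integration can somehow learn the authenticated tenant of a message's sender (how that would be established is exactly the open question discussed here). The `TenantScopedTopics` class and its methods are hypothetical names.

```java
// Hypothetical helper that a tenant-aware CollaborationEngineBackend could use.
class TenantScopedTopics {
    private final String tenantId;

    TenantScopedTopics(String tenantId) {
        // Reject ids that could escape the tenant's namespace.
        if (tenantId == null || tenantId.isEmpty() || tenantId.contains("/")) {
            throw new IllegalArgumentException("Invalid tenant id");
        }
        this.tenantId = tenantId;
    }

    // Maps an application-level topic id such as "person/1" to "tenant1/person/1".
    String toBackendTopic(String topic) {
        return tenantId + "/" + topic;
    }

    // Assumption: senderTenantId has been authenticated by the backend integration.
    boolean accepts(String senderTenantId) {
        return tenantId.equals(senderTenantId);
    }
}
```

The tenant would be recorded when `openMessageLog` is invoked, and the resulting log would drop any incoming message for which `accepts` returns false.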

The binder example does actually also have another problem that doesn't even need anyone with malicious intent to potentially cause problems (though malicious intent could certainly make it worse). The problem is Alice's colleague Charlie who has the same permissions as Alice. Charlie also opens the same form and changes streetAddress to 1 Charliestreet and then takes a moment to check his notes to find the right value to enter for zipCode. In the meantime, Alice triggers save() because she has updated firstName and didn't notice that Charlie had an incomplete edit in another part of the form.

One solution for this problem is that if there are pending changes from other users, then the Save button wouldn't immediately do save() but instead show a dialog asking the user to review a snapshot of the data that is about to be saved. Alice would then have a decent chance of noticing the changes by both Charlie and Bob and could take appropriate action. This is of course far from perfect since it relies on a human noticing the intrusion, but it's still yet another layer of protection.

Access control is actually even more tricky. A backend centered around messaging might apply the sender's security context when delivering messages, but the same doesn't really make sense for backends centered around storage, and especially not when a new node replays old messages to catch up. If I request a list of old messages from e.g. Redis, then the response will have my security context rather than one varying depending on which message I'm looking at from the received list.

This could be compensated for by embedding the authentication in the payload of each message (in a tamper-proof way). This approach would still make it impossible to use snapshots to speed up catch-up for new nodes. Whose authority should be assigned to the snapshot payload?

It seems like access control would have to be applied for entering data into Collaboration Engine rather than when using data received from Collaboration Engine. If an untrusted transport or storage mechanism is used, then cryptography must be used to ensure integrity around the boundaries of the untrusted system, similarly to how TLS is used to bridge over untrusted networks.

The service layer would thus be involved when Alice makes her own edits. The rogue message from Bob would be rejected based on cryptography in the backend integration and never even reach the service layer.

Per-edit encryption may be feasible and not introduce much overhead if it's performed on the client-side, but per-edit decryption on the server-side is likely to result in unacceptable CPU load. If the list of topics on a given transportation technology can be secured, then it's possible that generating a topic ID that is, say, the SHA-256 digest of the document in its original state could provide adequate protection against this sort of attack.

For instance:

class Document {
  private String text = "Hello world!";
}

Results in /topic/29FF147B43F0217B711E8EE3BF12508BFB8638DB3242F78E211EDD2E29CDB64F Adding a salt would prevent attacks if any of the document's historical states were known.
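A minimal sketch of deriving such a topic ID with the JDK's `MessageDigest`. The `HashedTopicId` class is a hypothetical name, and hashing the salt followed by the document text is just one simple way to combine them; how the document would actually be serialized before hashing is not specified above, so the output of this sketch is not expected to match the example digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that derives a topic ID from a salt and the initial document state.
class HashedTopicId {
    static String topicFor(byte[] salt, String documentText) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(salt);
            digest.update(documentText.getBytes(StandardCharsets.UTF_8));
            return "/topic/" + toHex(digest.digest());
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Upper-case hex encoding, matching the style of the digest shown above.
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02X", b));
        }
        return hex.toString();
    }
}
```

The salt would have to be kept secret and shared by all nodes, for example generated once per deployment with `SecureRandom`.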

If attackers are unable to retrieve a list of topics and the client-server channel is encrypted, then attackers would need access to the transportation technology's physical network in order to mount an attack--"guessing" or computing this salted, securely-hashed value is infeasible. Additionally encrypting or signing the payload would (I believe) not be required.

However, this solution does have a downside: the topic ID would need to be included in each message, which adds quite a bit of overhead (a full SHA-256 digest adds 256 bits per message). Truncating a SHA-256 digest (or the output of another cryptographically secure hash) to a smaller subset should preserve good security (entropy) characteristics. Using a prefix or suffix tree to map a truncated hash topic address to its actual value could provide a fast and memory-efficient way to select the smallest secure topic ID. For example:

ID: 29FF147B43F0217B711E8EE3BF12508BFB8638DB3242F78E211EDD2E29CDB64F
Select suffix: 29CDB64F--with 8 hex digits the topic ID fits into a 32-bit integer.

It may be possible to encode editor identity in the topic ID securely as well--or you could XOR a salted SHA-256 hash of the user ID into the hash of the document to generate a user-specific topic ID. The receiver could then XOR the user-specific topic ID with the hash of the user ID to recover the canonical topic ID, and use the embedded user portion to populate and verify user identity and access controls.

Variation

Another scheme which may provide superior protection (or at least equivalent) would be to send the user's computed hash to their client when the session is established, so that the user's security context is:

Their native security context (JWT, etc.)
Their document-topic suffix taken from the XOR of the document's salted hash (DID) with the salted hash of their identity (UID)
Their user-topic suffix (suffix(UID))

Upon sending their message with (suffix(UID), suffix(DID)) the system locates the full UID, then looks up the user-specific document ID (USDID) via XOR(UID, USDID) to retrieve the canonical topic ID (DID). Inspecting client-server traffic could never reveal the canonical topic ID, which would be challenging to correlate even if the transport technology's network were compromised.
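The XOR relationship between DID, UID, and USDID can be sketched as below. This only demonstrates the round trip USDID = DID XOR UID and DID = USDID XOR UID on full-length hashes; the suffix lookup and session handling are left out, and the `XorTopicIds` class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the XOR-based user-specific topic ID scheme.
class XorTopicIds {
    // Salted SHA-256, used here for both the document hash (DID) and the user hash (UID).
    static byte[] saltedHash(byte[] salt, String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            digest.update(salt);
            return digest.digest(value.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    static byte[] xor(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Inputs must have the same length");
        }
        byte[] result = new byte[a.length];
        for (int i = 0; i < a.length; i++) {
            result[i] = (byte) (a[i] ^ b[i]);
        }
        return result;
    }

    // USDID = DID XOR UID: what the client is given for a document.
    static byte[] userSpecificTopicId(byte[] did, byte[] uid) {
        return xor(did, uid);
    }

    // DID = USDID XOR UID: recovered on the server once the full UID is located.
    static byte[] canonicalTopicId(byte[] usdid, byte[] uid) {
        return xor(usdid, uid);
    }
}
```

Because XOR is its own inverse, anyone who learns both the USDID and the UID can compute the DID, so the scheme depends on the full UID hash never leaving the server.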

Conclusion

In both schemes, the topic should be infeasible to locate if listing actual physical topics is restricted. Given that the user's identity can be obtained within the system, it should be possible to populate the correct application security context, allowing application security policies to be retained.

I don't think it's feasible to base anything on a hash of document contents. There isn't always a 1:1 mapping between documents and Collaboration Engine topics - one trivial example of this is a topic used only for an avatar group. There can also be intermediate saves to the underlying document which would mean that new collaborators trying to join would end up with different coordinates.

I don't think encryption would cause too much overhead. Collaboration in this context is always centered around multiple UIs displaying something based on the shared data. Servers are not subscribing to changes unless they have at least one websocket open to a client that expects updates, and that websocket should also be encrypted and thus have the same order of magnitude of overhead per message. In case of Flow, the server would also end up running a non-trivial amount of UI logic to create updated rendering instructions for the client. In the case of Fusion, it wouldn't even be necessary for the server to unpack the change message unless some custom access control or such is used.

We might on the other hand not need to encrypt anything if we only try to protect against injected messages. One step down from full encryption would be to only sign the message with an HMAC. If we instead think it's sufficient to prevent guessing backend topic names, then we can compute the name with a KDF of the original name and a shared secret.
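Both options can be sketched with the JDK's `javax.crypto.Mac`. In this sketch, HMAC-SHA256 keyed with the shared secret is used both for message tags and, as a simple stand-in for a proper KDF, for deriving backend topic names. The `SignedMessages` class, the labels, and the `tag.payload` wire format are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for a backend integration that shares one secret between all nodes.
class SignedMessages {
    private final byte[] secret;

    SignedMessages(byte[] sharedSecret) {
        this.secret = sharedSecret.clone();
    }

    // HMAC-SHA256 over a label and the given parts, separated by zero bytes.
    private String tag(String label, String... parts) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            mac.update(label.getBytes(StandardCharsets.UTF_8));
            for (String part : parts) {
                mac.update((byte) 0);
                mac.update(part.getBytes(StandardCharsets.UTF_8));
            }
            return Base64.getUrlEncoder().withoutPadding().encodeToString(mac.doFinal());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Unguessable backend topic name derived from the application-level name.
    String backendTopicName(String topic) {
        return tag("topic-name", topic);
    }

    // Prefixes the payload with a tag that is bound to the topic.
    String sign(String topic, String payload) {
        return tag("message", topic, payload) + "." + payload;
    }

    // Returns the payload if the tag is valid for this topic, otherwise null.
    String verify(String topic, String signed) {
        int dot = signed.indexOf('.'); // the URL-safe Base64 tag never contains '.'
        if (dot < 0) {
            return null;
        }
        String payload = signed.substring(dot + 1);
        byte[] expected = tag("message", topic, payload).getBytes(StandardCharsets.UTF_8);
        byte[] actual = signed.substring(0, dot).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison to avoid leaking how many characters matched.
        return MessageDigest.isEqual(expected, actual) ? payload : null;
    }
}
```

A message injected by someone without the shared secret fails `verify` and can be dropped before it reaches any application logic. The tag does not identify the sender, and it does not protect against replaying old, validly signed messages.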

"it should be possible to populate the correct application security context"

This becomes meaningless if the system is also using snapshots to speed up catch-up since the security context of the snapshot would be a combination of all security contexts that contributed to values in that snapshot.

Not necessarily--a document is {known state} + {list of changes since last known state}--being able to describe a document as a replayable, commutative sequence of commands is a desirable property of CRDTs, although my understanding is that Vaadin is not pursuing that and may lose that desirable property. My observation of Figma's multiplayer technology is that it's more suitable for the fewer and larger operations that are present in visual design systems than the rapid, small operations that textual systems require. They describe it thus:

I think that the end solution must handle both cases, which is admittedly a hard challenge.

Encrypting the client-server channel is already assumed for any base-level security and in that context provides little benefit for preventing unintended edits within the system: encryption is a high price to pay for single-word or character edits happening potentially many thousands of times per minute, but I acknowledge that there may be good benefits to the approach. However, the message validation provided by a MAC-type scheme is relatively unimportant here as I see the problem--I think that we can assume that if a user can access a topic, then the messages that they send will be trusted and intact: users who can edit a document can certainly corrupt it as they see fit, thus the challenges are:

1. Preventing unauthorized access to the document
2. Preventing unauthorized access to the topic that can modify the document and delegate actual modifications to (1)

Given that there are robust, existing solutions to (1), CE must only take responsibility for (2) with the additional caveat that it must not circumvent (1). MAC/HMAC solutions do not necessarily prevent pathological messages from being published to the topic, and it's not hard to imagine unvalidatable messages published to the topic that can cause a constellation of problems (not the least of which is dropping all topics/accessing authentication caches/etc., which could be a problem with at least Redis, for instance). Encoding the relevant encryption within the actual transportation layer has manifest performance, security, and simplicity benefits if done correctly.

It's also not the case that replaying messages will corrupt the associated security contexts: each security context must be effectively applied per edit command. It may be possible to coalesce various operations on the basis of their associated contexts, but that does require operations to commute.

The granularity difference between text editing and most other UI actions does indeed make the tradeoffs more interesting. At the same time, I'm starting to get the feeling that it would be useful to zoom out a little and have a look at different use cases and scenarios. That topic is, however, much broader than this proposed way of using existing shared infrastructure as a backend to support coordination between multiple application server nodes. I plan to write up some kind of architectural summary based on the latest ideas and open that one up for discussion.

To still briefly look at the question of backend integration and access control, I would say that we have identified a wide range of possibilities specifically for the multitenant scenario where an attacker might inject rogue messages into the system. It feels like the best option might indeed be some scheme for keeping the underlying topic ids secret. The key question is still whether the division of responsibilities laid out by the proposed API would be flexible enough for a backend integration implementation to be configured for trade-offs that are appropriate for a given situation.

There are also other threat models than the discussed multitenant case. One case in particular that might be important for us is when the backend is hosted by a third party. In that case, the user would have to put some trust in the availability of the service, but they might want to employ end-to-end encryption so that they wouldn't have to trust the vendor when it comes to confidentiality or integrity. This is a quite different situation compared to reusing existing off-the-shelf shared infrastructure since it's much more practical to have custom logic on the receiving end in the backend.

An architecture overview is now published as a discussion topic at https://github.com/vaadin/collaboration-engine/discussions/49.

vaadin / collaboration-engine

Support arbitrary cluster backends #48
