[Paper] Concurrency control in groupware systems

imnotteixeira commented 3 years ago

https://dl.acm.org/doi/10.1145/67544.66963

Groupware systems are computer-based systems that support two or more users engaged in a common task, and that provide an interface to a shared environment. These systems frequently require fine-granularity sharing of data and fast response times. This paper distinguishes real-time groupware systems from other multi-user systems and discusses their concurrency control requirements. An algorithm for concurrency control in real-time groupware systems is then presented. The advantages of this algorithm are its simplicity of use and its responsiveness: users can operate directly on the data without obtaining locks. The algorithm must know some semantics of the operations. However the algorithm's overall structure is independent of the semantic information, allowing the algorithm to be adapted to many situations. An example application of the algorithm to group text editing is given, along with a sketch of its proof of correctness in this particular case. We note that the behavior desired in many of these systems is non-serializable.

imnotteixeira commented 3 years ago

See Also:

A groupware system is defined as a multi-user (two or more users) computer system that supports work on a common task and provides an interface to a shared environment. Ellis and Gibbs (TODO cite) present an algorithm called Operational Transformation (OT) to solve the real-time groupware concurrency problem. It allows concurrent editing without the need for locks, which improves responsiveness.

Response time is defined as the time required for a user's action to be reflected on their own screen. Notification time is defined as the time required for a user's action to be propagated to all other participants.

Real-time groupware systems have the following characteristics:

Highly interactive: Response times must be short
Real-time: Notification times should be close to the response times
Distributed: They should work even if the participants are connected from different machines and networks on the internet
Volatile: Participants may enter and leave the session at any time
Ad hoc: Participants don't follow a script, so it is impossible to know beforehand which information they will try to access
Focused: Users will generally be trying to access the same data, generating a high degree of access conflicts
External channel: Participants are often connected to each other via an external channel such as an audio or video communication tool

Groupware system model: Formed by a set of sites and operators. Sites consist of a site process (i.e. a user's unique session), a site object (i.e. the data being read and modified), and a unique site identifier. Operators are the set of operations available for users to apply to the site objects. The goal is to maintain consistency among all the site objects at all times.

The site process performs three kinds of activities: operation generation, where the user generates an operation to be applied to the site objects and the site encapsulates it in an operation request that is broadcast to all other sites; operation reception, where an operation is received from another site; and operation execution, where an operation is executed on the local site object.

The model further assumes that the number of sites is constant, messages are received exactly once, without error, and that it is impossible to execute an action before it is generated.

The paper further specifies the following definitions regarding the groupware system:

Given two operations a and b, generated at sites x and y, respectively, a precedes b iff:
- x = y and the generation of a happened before the generation of b, or
- x != y and the execution of a at site y happened before the generation of b
The Precedence Property states that if an operation a precedes another operation b, then at every site the execution of a happens before the execution of b
A groupware session is quiescent iff all generated operations have been executed at all sites
The Convergence Property states that site objects are identical at all sites at quiescence
A groupware system is correct iff the Convergence Property and the Precedence Property are always satisfied

imnotteixeira commented 3 years ago

The algorithm uses five auxiliary data structures: State Vector, Request, Request Queue, Request Log and Transformation Matrix.

State Vectors are based on the partial ordering definition in (cite [Lamp78] Lamport, L. Time, Clocks, and the Ordering of Events in a Distributed System) and the concept of vector clocks in (cite Barbara Liskov, Rivka Ladin (1986). "Highly-Available Distributed Services and Fault-Tolerant Distributed Garbage Collection") and (cite Colin J. Fidge (February 1988). "Timestamps in Message-Passing Systems That Preserve the Partial Ordering"). A state vector stores the number of operations executed per site, i.e. the i-th component of the vector represents how many operations from site i have been executed at the current site. It is therefore possible to compare two state vectors s_i and s_j (TODO: make sure i and j are indexes):

s_i = s_j if each component of s_i is equal to the corresponding component in s_j
s_i < s_j if each component of s_i is less than or equal to the corresponding component in s_j and at least one component of s_i is less than the corresponding component in s_j
s_i > s_j if at least one component of s_i is greater than the corresponding component in s_j
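
The three rules above can be sketched in a few lines of Python. This is only an illustration of the comparison as described in these notes. The function name `compare_state_vectors` and the use of plain lists of integers are assumptions, not something taken from the paper.

```python
from typing import Sequence


def compare_state_vectors(s_i: Sequence[int], s_j: Sequence[int]) -> str:
    """Return '=', '<' or '>' for s_i relative to s_j (hypothetical helper)."""
    if len(s_i) != len(s_j):
        raise ValueError("state vectors must have one component per site")
    # '>' : at least one component of s_i is greater than its counterpart.
    if any(a > b for a, b in zip(s_i, s_j)):
        return ">"
    # '=' : every component matches.
    if all(a == b for a, b in zip(s_i, s_j)):
        return "="
    # Otherwise every component is <= and at least one is strictly smaller.
    return "<"


if __name__ == "__main__":
    print(compare_state_vectors([1, 0], [1, 0]))
    print(compare_state_vectors([1, 0], [1, 1]))
    print(compare_state_vectors([2, 0], [1, 1]))
```

Note that with this definition ">" is not the mirror image of "<": for [2, 0] and [1, 1], each vector is ">" the other, which corresponds to two sites that have each executed something the other has not.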

Requests are tuples of the form <i,s,o,p> where i is the originating site's identifier, s is the originating site's state vector, o is the operation and p is the priority associated with o. From the request's state vector, a site can determine whether the operation can be executed immediately or must wait for updates from other sites, which enforces the Precedence Property.

The Request Queue is a list of requests pending execution. Even though the term "queue" is used, it does not imply first-in-first-out order.

The Request Log at site i stores the requests executed at that site, in insertion order.

The Transformation Matrix defines, for every pair of operation types, a function T that transforms operations. Given two operations o_i and o_j, with priorities p_i and p_j, instances of operators O_u and O_v, respectively, let

o'_j = T_uv(o_j, o_i, p_j, p_i)
o'_i = T_vu(o_i, o_j, p_i, p_j)

Then T is such that o'_j => o_i = o'_i => o_j, where => means composition of operations. In other words, executing o_i followed by o'_j must produce the same result as executing o_j followed by o'_i.
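
To make the property concrete, here is a minimal sketch of one entry of such a matrix, for two concurrent character insertions in a text buffer. It is written in the spirit of the paper's group text editing example but is not a transcription of it. The names `Insert`, `apply_op` and `transform_insert` are hypothetical, positions are 0-based, and priorities are assumed to be distinct per site.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insert:
    char: str
    pos: int  # 0-based position in the text (assumption of this sketch)


def apply_op(text: str, op: Optional[Insert]) -> str:
    """Apply an insert to the text; None stands for a no-op."""
    if op is None:
        return text
    return text[:op.pos] + op.char + text[op.pos:]


def transform_insert(o_a: Insert, o_b: Insert, p_a: int, p_b: int) -> Optional[Insert]:
    """Rewrite o_a so it can be executed after o_b has already been executed."""
    if o_a.pos < o_b.pos:
        return o_a                             # o_b landed to the right: unaffected
    if o_a.pos > o_b.pos:
        return Insert(o_a.char, o_a.pos + 1)   # o_b shifted our target right
    if o_a.char == o_b.char:
        return None                            # same insert at same spot: nothing left to do
    if p_a > p_b:
        return Insert(o_a.char, o_a.pos + 1)   # tie broken by priority
    return o_a


if __name__ == "__main__":
    text = "abc"
    o_i, p_i = Insert("x", 1), 1
    o_j, p_j = Insert("y", 1), 2
    site_i = apply_op(apply_op(text, o_i), transform_insert(o_j, o_i, p_j, p_i))
    site_j = apply_op(apply_op(text, o_j), transform_insert(o_i, o_j, p_i, p_j))
    print(site_i, site_j)  # both sites end with "axybc"
```

The argument order follows the notation above: the first operation is the one being transformed, the second is the one already executed locally.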

The algorithm has an initialization section, a generation section, a receive section, and an execution section. In the initialization section, the site's log and request queue are set to empty, and the state vector is initialized with all values being 0, since no operations have been executed yet. The generation section specifies that whenever a local operation is generated, a request is formed, added to the local queue and broadcast to the other sites. In the receive section, when a request is received, it is simply added to the request queue. Finally, the execution section specifies how to apply the operations and handle conflicts. It first checks the request queue for any request r_j (with state vector s_j) that can be executed. With s_i being the state vector of the local site i, there are three possibilities:

s_j > s_i: The request cannot be executed, since there are changes done at site j that have not yet been executed at site i. The request must therefore be left in the queue for later execution;
s_j = s_i: The two states are equal, so the request can be executed immediately without operation transformation;
s_j < s_i: The request can be executed, but the operation must be transformed, since site i has executed requests that site j had not seen when it generated request r_j. Site i's log L_i is examined for requests that were not accounted for by site j (i.e. the requests that were executed at i but not at j prior to the generation of r_j). Each such request is then used to transform o_j into o'_j, according to the Transformation Matrix. o'_j is then executed and the state vector is incremented.
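
The queue scan and the three cases above can be sketched as follows. This is a simplified illustration, not the paper's pseudocode: requests are modelled as plain tuples (site, state vector, operation, priority), the names `classify` and `next_executable` are hypothetical, and the actual transformation against the log is left out.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical request shape for this sketch: (site id, state vector, operation, priority)
Request = Tuple[int, Sequence[int], str, int]


def classify(s_j: Sequence[int], s_i: Sequence[int]) -> str:
    """Decide what local site i should do with a request generated at state s_j."""
    if any(a > b for a, b in zip(s_j, s_i)):
        return "wait"        # s_j > s_i: site i is missing earlier operations
    if all(a == b for a, b in zip(s_j, s_i)):
        return "execute"     # s_j = s_i: apply the operation as is
    return "transform"       # s_j < s_i: transform against the log first


def next_executable(queue: List[Request], s_i: Sequence[int]) -> Optional[Tuple[Request, str]]:
    """Remove and return the first request that does not have to wait, if any."""
    for idx, request in enumerate(queue):
        decision = classify(request[1], s_i)
        if decision != "wait":
            return queue.pop(idx), decision
    return None


if __name__ == "__main__":
    queue: List[Request] = [(1, [0, 1], "op_a", 1), (1, [0, 0], "op_b", 1)]
    print(next_executable(queue, [0, 0]))  # op_b is picked; op_a stays queued
```

The scan deliberately skips over requests that must wait, which matches the earlier remark that the request queue is not first-in-first-out.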

(cite Dynamic Vector Clocks for Consistent Ordering of Events in Dynamic Distributed Applications) and (cite Almeida, Paulo; Baquero, Carlos; Fonte, Victor (2008), "Interval Tree Clocks: A Logical Clock for Dynamic Systems") propose some changes to the state vector technique to allow dynamic entries, instead of a constant number of concurrent participants. (cite Ellis and Gibbs - current paper) address this issue by noting that participants can enter and leave whenever the system is quiescent, since in that case the Request Logs can be reset, which works like a checkpoint at each site.

imnotteixeira commented 3 years ago

Compare with #62 and also with CRDTs (https://www.infoq.com/presentations/crdt-distributed-consistency/ and #63)

imnotteixeira commented 3 years ago

(cite High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System) builds another algorithm on top of the existing OT presented in (cite ellis). It uses a server that manages the collaboration, instead of being peer-to-peer like the original. This reduces the need for the priority field in requests for tie-breaking, since the server can use a different strategy such as a reputation system, which the next section will develop. Because the server orchestrates the process and communication happens in server-client pairs only, multicasting is no longer needed and neither is message reordering logic: a transport protocol such as TCP (cite tcp) can be used instead, ensuring that messages are delivered in the correct order before they reach the application layer and reducing the clients' workload.

imnotteixeira commented 3 years ago

Mention ShareDB as an implementation of OT for Node.js

imnotteixeira / dissertation

[Paper] Concurrency control in groupware systems #61