imnotteixeira / dissertation


[Paper] Concurrency control in groupware systems #61

Closed imnotteixeira closed 3 years ago

imnotteixeira commented 3 years ago

https://dl.acm.org/doi/10.1145/67544.66963

Groupware systems are computer-based systems that support two or more users engaged in a common task, and that provide an interface to a shared environment. These systems frequently require fine-granularity sharing of data and fast response times. This paper distinguishes real-time groupware systems from other multi-user systems and discusses their concurrency control requirements. An algorithm for concurrency control in real-time groupware systems is then presented. The advantages of this algorithm are its simplicity of use and its responsiveness: users can operate directly on the data without obtaining locks. The algorithm must know some semantics of the operations. However the algorithm's overall structure is independent of the semantic information, allowing the algorithm to be adapted to many situations. An example application of the algorithm to group text editing is given, along with a sketch of its proof of correctness in this particular case. We note that the behavior desired in many of these systems is non-serializable.

imnotteixeira commented 3 years ago

See Also:

A groupware system is defined as a multi-user (two or more users) computer system that supports work on a common task and provides an interface to a shared environment. Ellis and Gibbs (TODO cite) present an algorithm, called Operational Transformation (OT), that addresses the real-time groupware concurrency problem: it allows concurrent editing without locks, which improves responsiveness.

Response time is defined as the time required for a user's action to be reflected on their own screen. Notification time is defined as the time required for that action to be propagated to all other participants.

Real-time groupware systems have the following characteristics:

Groupware system model: a groupware system is formed by a set of sites and a set of operators. Each site consists of a site process (i.e. a user's unique session), a site object (i.e. the data being read and modified), and a unique site identifier. Operators are the operations available for users to apply to the site objects. The goal is to maintain consistency among all the site objects at all times.

The site process performs three kinds of activities: operation generation, where the user generates an operation to be applied to the site object and the site encapsulates it in an operation request that is broadcast to all other sites; operation reception, where an operation request is received from another site; and operation execution, where an operation is executed on the local site object.

The model further assumes that the number of sites is constant, messages are received exactly once, without error, and that it is impossible to execute an action before it is generated.

The paper further specifies the following definitions regarding the groupware system:

imnotteixeira commented 3 years ago

The algorithm uses five auxiliary data structures: State Vector, Request, Request Queue, Request Log, and Transformation Matrix.

State Vectors are based on the partial ordering definition in (cite [Lamp78] Lamport, L. "Time, Clocks, and the Ordering of Events in a Distributed System") and on the concept of vector clocks in (cite Barbara Liskov, Rivka Ladin (1986). "Highly-Available Distributed Services and Fault-Tolerant Distributed Garbage Collection") and (cite Colin J. Fidge (February 1988). "Timestamps in Message-Passing Systems That Preserve the Partial Ordering"). A state vector stores the number of operations executed per site, i.e. the i-th component of the vector represents how many operations from site i have been executed at the current site. It is therefore possible to compare two state vectors s_i and s_j (TODO: make sure i and j are indexes):
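The comparison itself is not spelled out in these notes, so here is a minimal sketch of how it could look. It assumes the usual component-wise ordering of vector clocks, and it assumes that ">" means "at least one component is larger", which also covers vectors that are concurrent rather than strictly ahead. The names `Order` and `compare` are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Order(Enum):
    EQUAL = "="
    LESS = "<"
    GREATER = ">"


def compare(s_i: Sequence[int], s_j: Sequence[int]) -> Order:
    """Compare state vector s_i against s_j component by component."""
    if len(s_i) != len(s_j):
        raise ValueError("state vectors must have one component per site")
    if all(a == b for a, b in zip(s_i, s_j)):
        return Order.EQUAL
    if all(a <= b for a, b in zip(s_i, s_j)):
        # No component is larger and at least one is smaller.
        return Order.LESS
    # Assumption: any component of s_i exceeding its counterpart counts as ">".
    return Order.GREATER
```

For example, `compare((1, 0), (1, 1))` yields `Order.LESS`, while `compare((2, 0), (1, 1))` yields `Order.GREATER` even though the second vector is ahead in its own second component.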

Requests are tuples of the form <i, s, o, p>, where i is the originating site's identifier, s is the originating site's state vector, o is the operation, and p is the priority associated with o. From the request's state vector, a receiving site can determine whether the operation can be executed immediately or must wait for pending updates from other sites, which enforces the precedence property.

The request queue is a list of requests pending execution. Even though the term "queue" is used, it does not imply first-in-first-out order.

The Request Log at site i stores the requests executed at that site, in insertion order.

The Transformation Matrix defines, for every pair of operator types, a function T that transforms operations. Given two operations o_i and o_j, with priorities p_i and p_j, that are instances of operators O_u and O_v respectively, let

o'_j = T_{uv}(o_j, o_i, p_j, p_i)
o'_i = T_{vu}(o_i, o_j, p_i, p_j)

Then T must be such that o'_j ∘ o_i = o'_i ∘ o_j, where ∘ denotes composition of operations.
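To make the property concrete, here is a sketch of what a single matrix entry could look like for character insertion in a text string (insert versus insert). It is an illustration under assumptions, not the paper's full matrix: the `Insert` class, `apply`, and `transform_insert` are hypothetical names, priorities are assumed to be distinct per site, and two identical inserts at the same position are collapsed into one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Insert:
    char: str
    pos: int


def apply(text: str, op: Optional[Insert]) -> str:
    """Execute an insert on a string; None stands for the identity operation."""
    if op is None:
        return text
    return text[:op.pos] + op.char + text[op.pos:]


def transform_insert(o_i: Insert, o_j: Insert, p_i: int, p_j: int) -> Optional[Insert]:
    """Return o'_i: o_i rewritten so it can be executed after o_j."""
    if o_i.pos < o_j.pos:
        return o_i                            # o_j landed to the right, no shift
    if o_i.pos > o_j.pos:
        return Insert(o_i.char, o_i.pos + 1)  # o_j landed to the left, shift right
    if o_i.char == o_j.char:
        return None                           # same insert at both sites, keep one
    if p_i > p_j:
        return Insert(o_i.char, o_i.pos + 1)  # tie broken by priority (assumed distinct)
    return o_i


# Two concurrent inserts at the same position of "abc".
o_i, p_i = Insert("X", 1), 1
o_j, p_j = Insert("Y", 1), 2
at_site_i = apply(apply("abc", o_i), transform_insert(o_j, o_i, p_j, p_i))
at_site_j = apply(apply("abc", o_j), transform_insert(o_i, o_j, p_i, p_j))
print(at_site_i, at_site_j)  # both print as aXYbc
```

Both execution orders end in the same string, which is what o'_j ∘ o_i = o'_i ∘ o_j requires for this pair.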

The algorithm has an initialization section, a generation section, a receive section, and an execution section. In the initialization section, the site's log and request queue are set to empty, and the state vector is initialized with all components set to 0, since no operations have been executed yet. The generation section specifies that whenever a local operation is generated, a request is formed, added to the local request queue, and broadcast to the other sites. In the receive section, when a request is received, it is simply added to the request queue. Finally, the execution section specifies how to apply the operations and handle conflicts. It first checks the request queue for any request r_j (with state vector s_j) that can be executed. With s_i being the state vector of the local site i, there are three possibilities:

  1. s_j > s_i: The request cannot be executed, since there are changes made at site j that have not yet been executed at site i. The request must therefore be left in the queue for later execution;
  2. s_j = s_i: The two states are equal, therefore the request can be executed immediately without operation transformation;
  3. s_j < s_i: The request can be executed, but the operation must be transformed, since site i has executed requests that site j had not seen when it generated request r_j. Site i's log L_i is examined for requests that were not accounted for by site j (i.e. the requests that were executed at i but not at j prior to the generation of r_j). Each such request is then used to transform o_j into o'_j, according to the Transformation Matrix. o'_j is then executed and the state vector is incremented.
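The four sections and the three cases above can be sketched roughly as follows. This is a simplified reading of these notes, not the paper's exact pseudocode: the `Site` and `Request` classes are hypothetical, operations are restricted to character inserts written as `(char, pos)` tuples, priorities are assumed distinct, broadcasting is left to the caller, and the sketch has only been checked on small two-site scenarios.

```python
from dataclasses import dataclass
from typing import List, Tuple

Op = Tuple[str, int]  # hypothetical operation: insert `char` at `pos`


@dataclass(frozen=True)
class Request:
    site: int               # i: originating site
    state: Tuple[int, ...]  # s: originating site's state vector at generation time
    op: Op                  # o
    priority: int           # p


def apply_insert(text: str, op: Op) -> str:
    char, pos = op
    return text[:pos] + char + text[pos:]


def transform_insert(o_a: Op, o_b: Op, p_a: int, p_b: int) -> Op:
    """Rewrite o_a so it can run after o_b (insert-only, distinct priorities)."""
    (char, pos), (_, other_pos) = o_a, o_b
    if pos > other_pos or (pos == other_pos and p_a > p_b):
        return (char, pos + 1)
    return o_a


class Site:
    def __init__(self, site_id: int, n_sites: int, text: str = "") -> None:
        # Initialization section: empty queue and log, all-zero state vector.
        self.site_id = site_id
        self.state: List[int] = [0] * n_sites
        self.queue: List[Request] = []
        self.log: List[Request] = []
        self.text = text

    def generate(self, op: Op, priority: int) -> Request:
        # Generation section: queue the request locally; the caller broadcasts it.
        request = Request(self.site_id, tuple(self.state), op, priority)
        self.queue.append(request)
        return request

    def receive(self, request: Request) -> None:
        # Receive section: the request is simply queued.
        self.queue.append(request)

    def execute(self) -> None:
        # Execution section: rescan until no queued request is executable.
        progressed = True
        while progressed:
            progressed = False
            for req in list(self.queue):
                if any(a > b for a, b in zip(req.state, self.state)):
                    continue  # case 1: s_j > s_i, leave it queued
                self.queue.remove(req)
                op = req.op
                if req.state != tuple(self.state):  # case 3: s_j < s_i
                    for done in self.log:
                        # `done` had not been seen by the sender when it generated req.
                        if req.state[done.site] <= done.state[done.site]:
                            op = transform_insert(op, done.op, req.priority, done.priority)
                self.text = apply_insert(self.text, op)
                self.log.append(Request(req.site, req.state, op, req.priority))
                self.state[req.site] += 1
                progressed = True


a, b = Site(0, 2, "abc"), Site(1, 2, "abc")
r_a = a.generate(("X", 1), priority=1)
r_b = b.generate(("Y", 1), priority=2)
a.execute()      # each site applies its own operation first (case 2)
b.execute()
a.receive(r_b)   # then the concurrent remote operation arrives (case 3)
b.receive(r_a)
a.execute()
b.execute()
print(a.text, b.text)  # both sites end with aXYbc
```

Storing the transformed operation in the log, together with the request's original state vector, is one possible design choice here; it lets later requests be transformed against what was actually executed locally.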

(cite Dynamic Vector Clocks for Consistent Ordering of Events in Dynamic Distributed Applications) and (cite Almeida, Paulo; Baquero, Carlos; Fonte, Victor (2008), "Interval Tree Clocks: A Logical Clock for Dynamic Systems") propose changes to the state vector technique that allow dynamic entries, instead of a constant number of concurrent participants. (cite Ellis and Gibbs, the current paper) address this issue by noting that participants can enter and leave whenever the system is quiescent, since in that case the Request Logs can be reset, which acts as a checkpoint on each site.

imnotteixeira commented 3 years ago

Compare with #62 and also with CRDTs (https://www.infoq.com/presentations/crdt-distributed-consistency/) and #63

imnotteixeira commented 3 years ago

(cite High-Latency, Low-Bandwidth Windowing in the Jupiter Collaboration System) builds another algorithm on top of the existing OT presented in (cite ellis). It uses a server that manages the collaboration, instead of being peer-to-peer like the former. This reduces the need for the priority field in requests for tie-breaking, since the server can use a different strategy such as a reputation system, which the next section develops further. Multicasting is no longer needed either: the server orchestrates the process and communication happens in server-client pairs only. As a result, no message reordering logic is required, since a transport protocol such as TCP (cite tcp) can be used, which delivers messages in the correct order before they reach the application layer and reduces the clients' workload.

imnotteixeira commented 3 years ago

Mention ShareDB as an implementation of OT for Node.js