streamich / json-joy

JSON CRDT, JSON CRDT Patch, JSON Patch+, JSON Predicate, JSON Pointer, JSON Expression, JSON Type
https://jsonjoy.com/libs/json-joy-js
Apache License 2.0
736 stars 14 forks source link

Peritext for JSON CRDT #222

Open streamich opened 1 year ago

streamich commented 1 year ago

Block element implementation is described here.

streamich commented 1 year ago

Peritext proposes multiple new operations:

Consider if less operations can be used.

Possibly both scenarios can be expressed by a single "slice" concept (analogous to Peritext markers, but which link to another node in the document). If the slice starts and ends at the same spot, it degrades to a "marker"? A block splitBlock could be represented by an invisible or deleted character to which a slice is attached (start end end of the slice map to the same character).

streamich commented 1 year ago

Come up with a good name for the "slices":

Come up with a good name for the "payload" of a "slice":

A deletion of a slice could be supported. Deletion is marking the slice with a tombstone. Rather than creating a separate operation for it, the tombstone could be contained in the mark/payload. If payload is set to undefined, that means the slice has been deleted? More complications arise from the type of the mark:

In general, likely there is no necessity to support "deletion", i.e. ability to add tombstones on the RGA level. Instead, maybe the rich-text layer (Peritext) can mark ranges/slices with tombstones, if needed.

Some text annotations are very small, so it might not be beneficial to have the ability to delete them. For example, making text bold could be achieved by a slice, which has a single byte payload (say, some integer, which represents bold formatting)—in that case, deleting the slice will not be much more efficient that adding another overlay slice which reverses the bold formatting. However, it probably still would be useful to make the RGA level aware of the deleted slices, so, once a slice is deleted, it does not reveal it anymore to the higher level.

Represent the slices as a Grow-only-Set of the following tuples:

Text annotations set:
slice1 = (id1, start1, end1, value1)
slice2 = (id2, start2, end2, value2)
...

Where each start and end element is composed of the range boundary ID as well as a flag, which specifies if the if the edge is inclusive or exclusive:

start = (id, isInclusive)
streamich commented 1 year ago

We also need to be able to represent the virtual text start and end elements.

streamich commented 1 year ago

Taxonomy:

Edge = (CharacterId, IsInclusive)
Interval = (StartEdge, EndEdge)
Slice = (SliceId, Interval, Value)

Marker (e.g. a splitBlock boundary) is when interval is empty (collapsed), e.g. StartEdge = EndEdge:

Marker = (SliceId, Interval = StartEdge, Value)

The contents of the element of the marker can be anything.

streamich commented 1 year ago

Interval consists of two "boundaries":

Boundaries are linked to an:

One way to represent an edge is by a 2-tuple [element: ID, isInclusive: boolean], however then isInclusive does not describe well whether it is the "before" or "after" anchor, as defined in Peritext. Basically a boolean is not sufficient to describe the anchor point, if the boundary is used standalone (without the context of the interval).

image

Hence, it is better to call "boundaries" as "endpoints", and define them as:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
streamich commented 1 year ago

Taxonomy 2.0:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, value: ID];
type SliceSet = Slice[];

How to represent the "start" and "end of the RGA sequence?

const startEndpoint = [strNodeId, Anchor.After];
const endEndpoint = [strNodeId, Anchor.Before];

alternatively

const startEndpoint = [undefinedId, Anchor.After];
const endEndpoint = [undefinedId, Anchor.Before];
streamich commented 1 year ago

Efficient rich text representation:

type LocalSlice = [id: ID, start: number, length: number, value: unknown];

interface Peritext {
  text: Rope;
  slices: LocalSlice[];
}

interface Rope {
  str(start: number, length: number): string;
  interval(start: number, length: number): Rope;
}
streamich commented 1 year ago

Taxonomy 3.0:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];
type SliceSet = Slice[];
streamich commented 1 year ago

Taxonomy 4.0:

Local operation:

enum Anchor { Before, After }
type Endpoint = [index: number, anchor?: Anchor]; // -1 represents the string root element.
type Interval = [start: Endpoint, end?: Endpoint];
type Slice = [interval: Interval, tag?: ID];
type SliceSet = Slice[];

Remote operation:

enum Anchor { Before, After }
type Endpoint = [element: ID, anchor: Anchor];
type Interval = [start: Endpoint, end: Endpoint];
type Slice = [id: ID, interval: Interval, tag?: ID];
type SliceSet = Slice[];