tradle / why-hypercore

Exploration of Hypercore's breakthrough designs and capabilities, uncovering its gems that may be scattered across different github accounts (official and community-led), and learning to think from the "first principles" of P2P, while using the best Cloud, AI and blockchain have to offer.
MIT License
81 stars 7 forks source link

transacting in-time #9

Open urbien opened 3 years ago

urbien commented 3 years ago

Purpose

Distributed transactions need time to control ordering across multiple peers. In other words, time is needed to achieve the same state of a particular set of objects on each peer and to take a consistent state snapshot across peers.

Context

Hypercore does not have a notion of distributed time. Each hypercore keeps its own logical time, as an index / seq / version in the hypercore append-only log. It works great for single-writer, where other peers just get the same logical time in a given hypercore. But once we allow other peers to write into the same hypercore, this time breaks down (in reality we never write into the same hypercore, we create a parallel in-sync hypercore, and together they logically form one hypercore, as you see in multi-hyperbee. And this logical hypercore, otherwise called multi-writer (see #7) needs its own logical time.

Distributed Clocks

A number of distributed clocks were created since the 1970s:

  1. Lamport clock. Lamport defined the first causal clock, aimed to create a single transaction sequence across multiple peers. In this algorithm each transaction (tx) carries a number. Before sending tx to other peers (or locally ?), peer would take a maximum tx number received from peers, increment it and send.

  2. Vector clock. Same but storing the maximum tx number seen from each peer in a vector [1:3, 2:5, 3:2]. Vector clocks allow to find a state for consistent snapshot more often. But they grow with the number of peers sharing state and require assigning a unique ID to peers, and manage peers joining and leaving. To prevent vector clock from growing garbage collection for peers that left needs to be used. Vector clock is ok for small clusters.

  3. Bloom clock. TBD.

  4. HLC. HLC is available since 2014. At the time when Lamport invented his clock, time was not well managed. Now we gave decent time synchronization with NTP and the drift, when it occurs, is fairly quickly corrected. Hybrid logical clock takes advantage of this, using normal clocks and adding Lamport's logical time. The genius of HLC is that it is self-healing, it quickly drops the logical time, resetting back to normal time. Another genius property of HLC is that it is compact, it uses 64-bit format of NTP (re-purposing 16 bits in NTP time).

HLC analysis

Applying to Hypercore

Hypertrie and Hyperbee allow us to use JSON as a value for the key. Here is what we propose to add to every JSON (we do not attempt to do this for Hypercore log as it is has no notion of edits, only appends. Hyperdrive uses Hypertrie, so same approach will apply, with the exception of diff object as it could be quite big for edited files).

  1. HLC timestamp
  2. previous seq. If JSON was edited, this is the seq / index / version of this hypercore where the previous version of this JSON sits.
  3. diff seq. Reference to the diff object in this hypercore for the above change. The diff object provides CRDT accounting.

Here is how diff looks in automerge (we will have to condense it somehow):

// original:   { a: 1, b: 2, c: 3, d: [ 3, 7, 9 ] }
// updated: { a: 1, b: 2, c: 4, d: [ 3, 7, 9, 5 ], f: 10 }

// DIFF
[
  {
    "ops": [
      {
        "action": "set",
        "obj": "00000000-0000-0000-0000-000000000000",
        "key": "c",
        "value": 4
      },
      {
        "action": "set",
        "obj": "00000000-0000-0000-0000-000000000000",
        "key": "f",
        "value": 10
      },
      {
        "action": "ins",
        "obj": "54ab5576-cfda-4781-aa7e-9399dc12ad7b",
        "key": "a1:3",
        "elem": 4
      },
      {
        "action": "set",
        "obj": "54ab5576-cfda-4781-aa7e-9399dc12ad7b",
        "key": "a1:4",
        "value": 5
      }
    ],
    "actor": "a1",
    "seq": 2,
    "deps": {}
  }
]