ipfs / specs

Technical specifications for the IPFS protocol stack
https://specs.ipfs.tech

Record Validity System Notes #4

Open jbenet opened 9 years ago

jbenet commented 9 years ago

Record Validity Systems

This is a discussion on why IPFS supports pluggable validity schemes. (Warning: very rough draft!)

(Note: the applications of these records are things like routing systems (dht records) or name systems (dns/ipns records).)

Different applications call for different validity schemes, depending on hard constraints. Some examples are discussed below. Since IPFS seeks to support a wide variety of use cases (e.g. fast, in-datacenter replication; slow, trusted blockchain-consensus replication; dns; ...) we must allow users to pick their own record validity schemes. Since these records are authored by a user, and record consumers have some relation to the author, it is reasonable to expect that the validity scheme to use can be agreed upon in advance.

(It is worth exploring ways to express the validity verification computation generically, so that new validity schemes can be used without requiring a code update to older clients. This is out of scope for this discussion.)

So far, this Validity System record flexibility has been used mainly for IPNS, but it is perhaps the case that all routing records should be equally flexible.
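To make the pluggability concrete, here is a minimal sketch (in Go, with hypothetical names -- not the actual go-ipfs API) of what a pluggable validity scheme could look like as an interface: each scheme knows how to check a single record, and how to pick the preferred record among several valid ones, and author and consumers agree up front on which scheme applies to a key.

```go
// Package recordvalidity sketches a hypothetical pluggable validity scheme.
package recordvalidity

// Record is a hypothetical signed routing/name record. The Validity field
// carries scheme-specific data (an epoch range, an expiry, a sequence
// number, ...) that only the agreed-upon scheme knows how to interpret.
type Record struct {
	Key       string
	Value     []byte
	Signature []byte
	Validity  []byte
}

// Validator is a hypothetical pluggable validity scheme: it can check a
// single record, and pick the preferred record among several valid ones.
type Validator interface {
	// Validate returns an error if the record is not (or no longer) valid.
	Validate(rec Record) error
	// Select returns the index of the preferred record (e.g. latest
	// sequence number, latest expiry) among several valid candidates.
	Select(recs []Record) (int, error)
}
```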

Vector Clocks or Log Time or Blockchain Time

Distributed systems are often organized with discrete logical events, or epoch intervals. These are correlated with physical time, but not dependent on the problem of measuring and agreeing on physical time, as that would itself require distributed consensus. These discrete time steps can be either relative (as in Lamport's vector clocks) or pseudo-global (as in blockchain time). Which to use depends largely on the use case constraints (latency / synchrony tradeoffs).

Regardless, these systems share in common that there is an append-only log of events, where each event represents a different "time epoch" and a timestamp can be made to refer to this epoch. Records may be given validity in terms of these logical clocks, for example as a range of epochs in the log:

[ 1 ]<---[ 2 ]<---[ 3 ]<---[ 4 ]<---[ 5 ]

Examples:

These validity constraints can even be expressed in terms of variables, where the variables are decided when the record actually enters into effect. An example of this: a bitcoin transaction is created and signed at time T1, enters the blockchain at time T2, and is valid from T2+1000 to T2+1000+10.
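As a minimal sketch (in Go, with hypothetical types -- nothing here is an actual IPFS record format), a record whose validity is an epoch range can be checked against the current height of the shared log:

```go
package main

import "fmt"

// EpochRecord is a hypothetical record whose validity is expressed as a
// closed range of logical epochs in a shared append-only log.
type EpochRecord struct {
	Value      []byte
	ValidFrom  uint64 // first epoch at which the record is valid
	ValidUntil uint64 // last epoch at which the record is valid
}

// ValidAt reports whether the record is valid at the given epoch
// (e.g. the current blockchain height).
func (r EpochRecord) ValidAt(epoch uint64) bool {
	return epoch >= r.ValidFrom && epoch <= r.ValidUntil
}

func main() {
	// Mirrors the example above: a transaction enters the log at T2 = 2000
	// and is valid from T2+1000 to T2+1000+10.
	rec := EpochRecord{ValidFrom: 3000, ValidUntil: 3010}
	fmt.Println(rec.ValidAt(3005)) // true
	fmt.Println(rec.ValidAt(4000)) // false
}
```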

(Good old) NTP Timestamps

Fast applications where clock skew is non-critical and security is not paramount (or where trust is fully delegated) may opt to use NTP Timestamps. For example, most humans operate on NTP Timestamps. This works well in practice.

(There are cases -- e.g. multi-party distributed consensus protocols -- where a "physical timestamp" is not available, not precise enough (skew), not trusted, or not in the application model. Such systems opt for logical timestamps of other sorts, as discussed above.)

In practice these systems function like the generalized vector clocks above, but the timestamps are derived from a physical measurement and are meant to convey meaning outside of the system itself (hence the special distinction).
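A minimal sketch (Go, hypothetical names; the skew allowance is an assumption, not anything specified here) of checking such a record against the local wall clock:

```go
package main

import (
	"fmt"
	"time"
)

// TimestampRecord is a hypothetical record carrying an absolute expiry
// derived from a (roughly NTP-synchronized) wall clock.
type TimestampRecord struct {
	Value   []byte
	Expires time.Time
}

// Valid reports whether the record is still valid according to the local
// clock, tolerating up to skew of clock disagreement between parties.
func (r TimestampRecord) Valid(now time.Time, skew time.Duration) bool {
	return now.Before(r.Expires.Add(skew))
}

func main() {
	rec := TimestampRecord{Expires: time.Now().Add(24 * time.Hour)}
	fmt.Println(rec.Valid(time.Now(), 30*time.Second)) // true
}
```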

Examples:

TTLs (Time To Live)

A slightly different way to use discrete, incrementing time is to express the constraint as a time to live. I describe this separately because it is a substantially different use case, and the constraint expressed is necessarily relative. The record enters operation or "becomes valid" under well-defined conditions (e.g. a record is served, media begins to play, a transaction enters into effect).

It is worth noting that TTL validity is expressed relatively, and thus it can be signed without exact knowledge of the time at which it will enter validity.
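A minimal sketch (Go, hypothetical names) of the relative form: the signed record only carries a duration, and the absolute expiry is computed by the consumer from whenever the record enters operation, so the author never needs to know that time when signing.

```go
package main

import (
	"fmt"
	"time"
)

// TTLRecord is a hypothetical record carrying only a relative time-to-live.
type TTLRecord struct {
	Value []byte
	TTL   time.Duration
}

// ValidAt reports whether the record is still valid at time now, given the
// time activated at which it entered operation (e.g. when it was served).
func (r TTLRecord) ValidAt(activated, now time.Time) bool {
	return now.Before(activated.Add(r.TTL))
}

func main() {
	activated := time.Now()
	rec := TTLRecord{TTL: 10 * time.Minute}
	fmt.Println(rec.ValidAt(activated, activated.Add(5*time.Minute)))  // true
	fmt.Println(rec.ValidAt(activated, activated.Add(15*time.Minute))) // false
}
```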

Examples:

Latest Entry of an Append-only Log

Another validity scheme for append-only log data structures is to use the "last" log entry as the most valid. This is divorced from a concept of regular intervals of time, and depends instead on state distribution, meaning that the state of the log needs to be replicated to the program deciding upon the validity of a record. When built as signed merkle-dags (logs using cryptographic hashes to reference previous entries, and digital signatures for authorship), the records may be sent over any (untrusted, asynchronous) channel.

It is worth noting that this scheme suffers from an important attack vector: an attacker may not be able to forge records, but may be able to serve old records and suppress communication of new records. A MITM can then ensure victims perceive an outdated version of reality (invalid as far as the record author is concerned). In git, this means preserving exploits and silencing bugfixes; in bitcoin, it means serving alternate or outdated (as far as the work-majority is concerned) blockchain heads.
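A minimal sketch (Go, hypothetical names; signature and hash-chain verification omitted) of the "latest entry wins" selection -- note it does nothing by itself to defend against the stale-record attack just described:

```go
package main

import "fmt"

// LogRecord is a hypothetical signed log entry: Prev would hold the hash of
// the previous entry in the merkle-dag log, Seq its position in the log.
type LogRecord struct {
	Value []byte
	Prev  string
	Seq   uint64
}

// Latest returns the record with the highest sequence number, i.e. the one
// this scheme treats as the currently valid head.
func Latest(records []LogRecord) (LogRecord, bool) {
	if len(records) == 0 {
		return LogRecord{}, false
	}
	best := records[0]
	for _, r := range records[1:] {
		if r.Seq > best.Seq {
			best = r
		}
	}
	return best, true
}

func main() {
	recs := []LogRecord{{Seq: 3}, {Seq: 7}, {Seq: 5}}
	head, _ := Latest(recs)
	fmt.Println(head.Seq) // 7
}
```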

Examples:

jbenet commented 9 years ago

@whyrusleeping this is a preamble to the discussion at hand: where + how should we account for validity in routing. will follow that up in another discussion, as it's different than this one.

jbenet commented 9 years ago

@amiller this (expressing validity constraint verifiers) may be an interesting application of your ADG stuff. I worry about allowing arbitrary computation to be shipped around and executed, but it may be possible to have a simple language expressing pure functions that refer only to ipfs objects and fields in them. Could be non-turing complete or cycle bounded.