ipvm-wg / spec

High Level IPVM Spec

Scheduler Guarantees #12

Open expede opened 1 year ago

expede commented 1 year ago

In the state of #8 at time of writing, tasks are classified as ipvm/wasm and ipvm/effect. This is almost certainly wrong.

From chatting with @lukemarsden & @simonwo earlier, what we probably actually care about is:

  1. Signalling the kind of thing to be run (Wasm, Docker, HTTP, etc)
  2. Under which assumptions (e.g. deterministic Wasm subset, has direct disk access, etc)
  3. Scheduler guarantees (can be safely retried, needs oracle attestation, needs a job lock, must be reproducible for verification, etc)

I think that it's possible to do this by classification rather than by writing a config file, which could be complex and self-contradictory (e.g. both fully deterministic and with direct disk access).

The easy one is the delineation between pure computation and anything stateful. Docker itself falls into the stateful bucket: we cannot isolate its effects, so oracle attestation is the level of reproducibility we get (low). But Bacalhau is "safe" to run in the sense that it doesn't produce destructive effects (it's "nondestructive" in the current WIP classification). It does depend on the external world for randomness, time, and so on, but you could "safely" schedule these jobs in sequence or in parallel without breaking that contract.

expede commented 1 year ago

Actually, the more that I think about this, the more that the current task spec's classification of deterministic/nondestructive/destructive seems possibly correct from the distributed scheduler's POV 🤔

It's possible that the main problem with the current version is that it privileges ipvm/wasm over all else (almost with an attitude of "the gross nondeterministic stuff that we can't trust over in ipvm/effect"). ipvm/wasm has guarantees beyond Wasm: it's the deterministic subset, with gas accounting, etc. But if this spec intends to be a general framework that we can all align on, it's probably more inviting as something along the lines of:

{
  "type": "ipvm/task",
  "version": "0.1.0",
  "safety": "deterministic",
  "on": "ipvm/wasm",
  "using": "Qm12345", // Wasm module
  "run": {
    "args": [/*...*/]
  }
}

{
  "type": "ipvm/task",
  "version": "0.1.0",
  "safety": "nondestructive",
  "on": "bacalhau/docker",
  "using": "Qm12345", // Docker container
  "run": {
    // "docker": "stuff here",
    "args": [/*...*/]
  }
}

Though the above is of course not BFT, since I can mismatch:

{
  "type": "ipvm/task",
  "version": "0.1.0",
  "safety": "deterministic", // Asserted deterministic, but...
  "on": "web",
  "using": "https://google.com", // ...not deterministic!
  "run": {
    "crud": "get"
  }
}
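
One way a scheduler could catch a mismatch like this is to compare the asserted safety against the strongest level the target can plausibly offer. The following is a minimal sketch, not part of the spec: it assumes the three safety levels form an ordering from strongest to weakest, and the `STRONGEST_GUARANTEE` table and the function name are hypothetical. It only rejects obviously inconsistent claims and does not make the format BFT.

```python
# Safety levels from the current task spec, assumed here to be ordered
# from strongest guarantee to weakest.
SAFETY_ORDER = ["deterministic", "nondestructive", "destructive"]

# Hypothetical table: the strongest level each "on" target could ever offer.
STRONGEST_GUARANTEE = {
    "ipvm/wasm": "deterministic",
    "bacalhau/docker": "nondestructive",
    "web": "nondestructive",
}


def safety_claim_is_plausible(task: dict) -> bool:
    """Return True if the asserted safety is no stronger than the target allows."""
    strongest = STRONGEST_GUARANTEE.get(task["on"])
    if strongest is None:
        # Unknown target: reject rather than trust the assertion.
        return False
    return SAFETY_ORDER.index(task["safety"]) >= SAFETY_ORDER.index(strongest)
```

With this table, the mismatched task above (`"safety": "deterministic"` on `"web"`) is rejected, while the two earlier examples pass.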

simonwo commented 1 year ago

I had some thoughts on this here: https://docs.google.com/document/d/1Byo5Daw1q7OxgR-945n3F3yrZuC29UmiNpMdjBv_CBs/edit#heading=h.66mcybtbgf3

Broadly: instead of trying to define a single flag which represents the level of risk (according to some definition), allow users to detail the actual unsafe capabilities they require to run the task. Although now that I've written it out... it sounds a lot like Effects...

simonwo commented 1 year ago

...although that's at odds with what you're writing above about not wanting complex config.

It feels like the type and the safety form a tuple that different networks will support. IPVM will support ("ipvm/wasm", "deterministic") and so will Bacalhau, which will also support ("docker", "nondestructive") and maybe one day ("docker", "deterministic").
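
As a rough illustration of the tuple idea, each network could advertise the set of (type, safety) pairs it accepts, and routing becomes a membership check. This is only a sketch: the `SUPPORTED` registry and `networks_for` are hypothetical names, and the pairs are the ones mentioned in this comment.

```python
# Hypothetical registry: the (type, safety) pairs each network accepts.
SUPPORTED = {
    "ipvm": {("ipvm/wasm", "deterministic")},
    "bacalhau": {
        ("ipvm/wasm", "deterministic"),
        ("docker", "nondestructive"),
    },
}


def networks_for(task_type: str, safety: str) -> list[str]:
    """Return the names of the networks that accept this (type, safety) pair."""
    pair = (task_type, safety)
    return sorted(name for name, pairs in SUPPORTED.items() if pair in pairs)
```

Supporting ("docker", "deterministic") later would then be a matter of adding one more pair to a network's set.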

Hmm, this is going to need to sit and stew for a while.

expede commented 1 year ago

Thanks for the thoughts!!

I apologize for the horrible sketch, but from some noodling yesterday:

[Image: IMG_0015 (sketch)]

...same as...

|               | Destructive | Verifiable | Idempotent |
|---------------|-------------|------------|------------|
| Pure Function | No          | Yes        | Yes        |
| Resolve CID   | No          | Yes        | No         |
| HTTP GET      | No          | No         | No         |
| HTTP PUT      | Yes         | No         | Yes        |
| Send Email    | Yes         | No         | No         |

(It's technically a trilemma, but I don't think it's particularly useful to think about in those terms. There are only 5 possible states in practice.)

For the distributed scheduler, we only care about idempotence and destructivity. In IPVM, either verifiability or attestation (oracle) is required, so we can ignore that for the purposes of orchestration.

This brings us really cleanly into something akin to CQRS (isolate destructive actions) plus atomics.
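
To make the orchestration angle concrete, here is a small sketch that encodes the table above and derives a scheduling decision from destructivity and idempotence only. The table values are copied from the table; the three policy names and the mapping onto them are assumptions for illustration, not something the spec defines.

```python
from typing import NamedTuple


class Props(NamedTuple):
    destructive: bool
    verifiable: bool
    idempotent: bool


# The five rows of the table, in the same column order.
TABLE = {
    "Pure Function": Props(False, True, True),
    "Resolve CID": Props(False, True, False),
    "HTTP GET": Props(False, False, False),
    "HTTP PUT": Props(True, False, True),
    "Send Email": Props(True, False, False),
}


def scheduling_policy(props: Props) -> str:
    """Pick a (hypothetical) policy; verifiability is deliberately ignored."""
    if not props.destructive:
        # Nondestructive work can be run in sequence or in parallel.
        return "run-freely"
    if props.idempotent:
        # Destructive but idempotent: repeating the action is assumed safe.
        return "retry-allowed"
    # Destructive and not idempotent: isolate it, e.g. behind a job lock.
    return "at-most-once"
```

Under these assumptions, only the last two rows need special handling by the scheduler, which is the CQRS-style isolation of destructive actions.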

https://github.com/ipvm-wg/spec/blob/31b1142c8b518ca944fb5017fa476cf1194f01b7/task/README.md?plain=1#L39-L77

We can probably infer the idempotence from the ability, such as http/put vs http/post.
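
A minimal sketch of that inference, assuming abilities are written as lowercase `namespace/action` strings like the two named here. It covers only state-changing HTTP methods, using the idempotence that the HTTP semantics specification assigns to PUT and DELETE; the lookup table and function name are hypothetical.

```python
from typing import Optional

# State-changing HTTP abilities and whether repeating them is idempotent.
IDEMPOTENT_BY_ABILITY = {
    "http/put": True,
    "http/delete": True,
    "http/post": False,
    "http/patch": False,
}


def infer_idempotence(ability: str) -> Optional[bool]:
    """Return True/False when the ability is known, or None when it is not."""
    return IDEMPOTENT_BY_ABILITY.get(ability.lower())
```

Returning `None` for unknown abilities leaves the scheduler free to fall back to the most conservative handling instead of guessing.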

I'll have lots of writing about this in the WIP spec shortly!