near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

Yield Execution (NEP 516 / NEP 519) #10455

Open · saketh-are opened this issue 9 months ago

saketh-are commented 9 months ago

NEPs: near/NEPs#516 near/NEPs#519

The following branches contain a basic prototype of yield execution supporting the chain signatures use case:

To test out the chain signatures contract:

  1. Build neard and run localnet.
  2. Build mpc_contract from near-sdk-rs/examples.
  3. Create a new account mpc.node0 on localnet: `env NEAR_ENV=localnet near create-account "mpc.node0" --keyPath ~/.near/localnet/node0/validator_key.json --masterAccount node0`.
  4. Additionally create the accounts requester.node0 and signer.node0.
  5. Deploy the contract code: `env NEAR_ENV=localnet near deploy "mpc.node0" <path/to/mpc_contract.wasm>`.
  6. Submit a signature request: `env NEAR_ENV=localnet near call mpc.node0 sign '{"payload" : "foo"}' --accountId requester.node0`. Observe that the request will hang.
  7. From a separate terminal, use `env NEAR_ENV=localnet near call mpc.node0 log_pending_requests --accountId signer.node0` to see the data id for the pending request. In a real use case, the signer node would monitor the contract via indexers to get this information.
  8. Submit a "signature" via `env NEAR_ENV=localnet near call mpc.node0 sign_respond '{"data_id":"<data id here>","signature": "sig_of_foo"}' --accountId signer.node0`.
  9. In the original terminal, observe that the signature request returns with "sig_of_foo_post".

Note that steps 6-8 are a bit time-sensitive at the moment. If the call to `sign` in step 6 doesn't receive a response from step 8 within roughly a minute, you'll eventually see the message `Retrying transaction due to expired block hash`.
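
For orientation, here is a minimal sketch of what the contract side of the walkthrough above could look like. This is not the actual mpc_contract from near-sdk-rs/examples: the yield/resume bindings (`promise_yield_create`, `promise_yield_resume`, the register-based data id, and the callback receiving the submitted data as a promise result) are assumed to follow the NEP-519 draft, and the callback name and gas numbers are made up for illustration.

```rust
// Illustrative only: a stripped-down shape of the chain signatures contract
// used in the walkthrough above. The yield/resume bindings are assumed to
// follow the NEP-519 draft; names, signatures and gas numbers are not the
// actual mpc_contract code.
use near_sdk::borsh::{BorshDeserialize, BorshSerialize};
use near_sdk::json_types::Base58CryptoHash;
use near_sdk::{env, near_bindgen, CryptoHash, Gas, GasWeight, PromiseResult};

/// Register the host function writes the yielded promise's data id into.
const YIELD_REGISTER: u64 = 0;
/// Arbitrary gas attachment for the resumption callback (illustrative).
const SIGN_ON_RESPONSE_GAS: Gas = Gas::from_tgas(10);

#[near_bindgen]
#[derive(BorshDeserialize, BorshSerialize, Default)]
#[borsh(crate = "near_sdk::borsh")]
pub struct MpcContract {}

#[near_bindgen]
impl MpcContract {
    /// Step 6: the requester asks for a signature; execution yields until a
    /// signer responds (step 8) or the protocol-level timeout elapses.
    pub fn sign(&mut self, payload: String) {
        let args = near_sdk::serde_json::json!({ "payload": payload })
            .to_string()
            .into_bytes();
        // Create a yielded function call to `sign_on_response` on this
        // contract; the data id that can resume it lands in YIELD_REGISTER.
        let promise = env::promise_yield_create(
            "sign_on_response",
            &args,
            SIGN_ON_RESPONSE_GAS,
            GasWeight(0),
            YIELD_REGISTER,
        );
        // The real contract records this data id in state so that
        // `log_pending_requests` (step 7) can expose it to signers.
        let _data_id = env::read_register(YIELD_REGISTER).expect("expected a data id");
        env::promise_return(promise);
    }

    /// Step 8: a signer resumes the yielded promise with the signature.
    pub fn sign_respond(&mut self, data_id: Base58CryptoHash, signature: String) {
        let data_id: CryptoHash = data_id.into();
        env::promise_yield_resume(&data_id, signature.as_bytes());
    }

    /// Continuation of `sign`, invoked once the data id is resolved.
    #[private]
    pub fn sign_on_response(&mut self, payload: String) -> String {
        match env::promise_result(0) {
            // Matches step 9: submitting "sig_of_foo" returns "sig_of_foo_post".
            PromiseResult::Successful(signature) => {
                format!("{}_post", String::from_utf8(signature).unwrap())
            }
            _ => env::panic_str(&format!("signature request for {payload} timed out")),
        }
    }
}
```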

Remaining work includes:

### Tasks
- [x] Implement timeouts
- [x] Ensure that the caller for `promise_data_yield` is the only one that can call `promise_submit_data`
- [x] Ensure that `promise_submit_data` cannot be used with data ids not created by `promise_data_yield`
- [x] Ensure the case where `yield_resume` is called within the same transaction as `yield_create` works correctly
- [x] Rework the implementation on the nearcore side to avoid creating a new Action type
- [x] `gas` argument for `promise_yield_create` should be accompanied with the `gas_weight` argument
- [x] Simplify trie state to just yield queue plus postponed action receipts
- [ ] Ensure gas costs are charged properly, including a new cost parameter for postponing a receipt
- [x] Implement resharding logic for yielded promise queue
- [x] Determine an acceptable value for the timeout length
- [ ] Tests: feature gating works as expected
- [x] Tests: functionality works as expected (i.e. see various "ensure" steps above – all of these should be tested)
- [ ] Check whether profile entries related to yield/resume need to be excluded from the profiles while this functionality isn't stable
- [x] The error should be the same for all different failure modes of `yield_submit_data_receipt`
- [x] Look into "Retrying transaction due to expired block hash"
- [ ] Look into integration with the higher-level Promises API in near-sdk-rs
walnut-the-cat commented 9 months ago

Can we link PRs for the listed tasks to this tracking issue?

saketh-are commented 9 months ago

I have a couple of draft PRs I'm continuing to iterate on:

I don't anticipate having separate PRs for each subtask since it won't make sense to merge this until we have all the details right.

nagisa commented 9 months ago

Here are some thoughts I had today while thinking about estimation and costs for this feature.

  1. The hard part is really just paying for the storage. The compute costs for handling these new operations seem straightforward and can be largely replicated from the code for the already implemented estimations.
  2. `promise_await_data` seems quite straightforward in isolation too;
    • Except when used in combination with `promise_and`: you can join a bunch of `promise_await_data` dependencies together, and some of the backing structures will have to live for as long as the longest one! If implemented naively, this combination can result in a surprising amount of gas charged for the amount of work done.
  3. Data is written to RocksDB as a result of `promise_submit_data`, but that write will most likely happen in the outskirts of the transaction runtime where gas tracking is no longer conducted, so it is necessary to account for it at the same time the base action cost is charged.
    • `cost(promise_submit_data) = action_base + storage_base + storage_per_byte`? (See the sketch below this list.)
  4. This data is later read out multiple times before the continuation for `promise_then` is invoked.
    • As far as I understand, the value can end up being read a number of times proportional to how long the pending continuation action's inputs remain unresolved;
    • It will still be read multiple times (though possibly fewer) even if the dependencies are immediately resolved;
    • But more importantly, there seems to be no way to know ahead of time just how many reads of this data can occur. It wouldn't help even if we could guarantee that receipt inputs are read out once every block production cycle, because users can arbitrarily delay the resolution of the promise in the proposed design.
    • It sounds like it'd be really hard to estimate this accurately, and we should replace these read-outs with a presence check (which is effectively all they amount to) in the first place; that would make this problem much less severe and cost estimation easier to reason about.
      • I’ll try hacking on it, maybe I’ll learn something new along the way…
    • Regardless, `promise_submit_data` actually sounds like the most appropriate place to charge gas for storage of the data being submitted, as this is the first point in the logic flow at which both the number of blocks to store the data for and the amount of data being stored are known.
    • The costs here should be appreciable, so that there isn’t an economic incentive to use this as a mechanism for data storage.
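
To make the formula in point 3 concrete, the proposed charge is just a linear function of the payload size. A tiny sketch, with made-up parameter names and numbers rather than actual nearcore runtime config fields:

```rust
/// Hypothetical parameters for the cost proposed in point 3 above; the field
/// names and values are illustrative, not actual nearcore runtime config.
#[derive(Clone, Copy)]
struct SubmitDataFees {
    action_base: u64,      // base cost of processing the submit-data action
    storage_base: u64,     // flat cost for keeping the payload until resumption
    storage_per_byte: u64, // per-byte cost, high enough to deter storage abuse
}

fn promise_submit_data_cost(fees: SubmitDataFees, payload_len: u64) -> u64 {
    fees.action_base + fees.storage_base + fees.storage_per_byte * payload_len
}

fn main() {
    let fees = SubmitDataFees { action_base: 100, storage_base: 50, storage_per_byte: 3 };
    // A 1 KiB payload under these made-up numbers:
    assert_eq!(promise_submit_data_cost(fees, 1024), 100 + 50 + 3 * 1024);
}
```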
nagisa commented 8 months ago

I have addressed the concern about the data being read out multiple times throughout the life of an unresolved promise in the PR referenced just above. We are now only going to do a simple check for key existence, which simplifies the cost model significantly. In particular, the model no longer needs to account for the period of time between when `promise_submit_data` is called and when the future gets resolved, at least in terms of compute cost. Furthermore, I hear that we're now looking at making the timeout a constant system-wide parameter rather than a user-controllable one, which is probably a slight simplification as well.

In my mind, a correct cost model in the context of these changes looks like this:

nagisa commented 8 months ago

> `gas` argument for `promise_yield_create` should be accompanied with the `gas_weight` argument

I added this to the task list.
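
For context on why the weight matters: a function call created with a `gas_weight` participates in the distribution of whatever gas is left unused at the end of execution, on top of its fixed `gas` attachment. A simplified model of that split (made-up numbers; the real runtime logic also handles remainders):

```rust
/// Simplified model of how unused gas is split across pending function calls
/// in proportion to their `gas_weight`, on top of any fixed `gas` attachment.
fn distribute_unused_gas(unused_gas: u64, calls: &mut [(u64 /* gas */, u64 /* weight */)]) {
    let total_weight: u64 = calls.iter().map(|(_, w)| *w).sum();
    if total_weight == 0 {
        return;
    }
    for (gas, weight) in calls.iter_mut() {
        *gas += unused_gas / total_weight * *weight;
    }
}

fn main() {
    // A yielded callback with 5 Tgas attached and weight 1, next to another
    // call with weight 3: the callback receives a quarter of the leftover gas.
    let mut calls = [(5_000_000_000_000_u64, 1), (0, 3)];
    distribute_unused_gas(40_000_000_000_000, &mut calls);
    assert_eq!(calls[0].0, 15_000_000_000_000);
    assert_eq!(calls[1].0, 30_000_000_000_000);
}
```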

saketh-are commented 8 months ago

Status update @walnut-the-cat: fixed-length timeouts are implemented now. Work continues on gas costs and on bounding congestion (mainly thanks to @nagisa), as well as on the miscellaneous smaller implementation details documented on this tracking issue.

mikedotexe commented 6 months ago

Very excited about this idea! Thank you, contributors