near / nearcore

Reference client for NEAR Protocol
https://near.org
GNU General Public License v3.0

Yield Execution (NEP 516 / NEP 519) #10455

Open · saketh-are opened this issue 9 months ago

saketh-are commented 9 months ago

NEPs: near/NEPs#516 near/NEPs#519

The following branches contain a basic prototype of yield execution supporting the chain signatures use case:

To test out the chain signatures contract:

  1. Build neard and run localnet.
  2. Build mpc_contract from near-sdk-rs/examples.
  3. Create a new account mpc.node0 on localnet: `env NEAR_ENV=localnet near create-account "mpc.node0" --keyPath ~/.near/localnet/node0/validator_key.json --masterAccount node0`.
  4. Additionally create the accounts requester.node0 and signer.node0.
  5. Deploy the contract code: `env NEAR_ENV=localnet near deploy "mpc.node0" <path/to/mpc_contract.wasm>`.
  6. Submit a signature request: `env NEAR_ENV=localnet near call mpc.node0 sign '{"payload" : "foo"}' --accountId requester.node0`. Observe that the request will hang.
  7. From a separate terminal, use `env NEAR_ENV=localnet near call mpc.node0 log_pending_requests --accountId signer.node0` to see the data id for the pending request. In a real use case, the signer node would monitor the contract via indexers to get this information.
  8. Submit a "signature" via `env NEAR_ENV=localnet near call mpc.node0 sign_respond '{"data_id":"<data id here>","signature": "sig_of_foo"}' --accountId signer.node0`.
  9. In the original terminal, observe that the signature request returns with "sig_of_foo_post".

Note that steps 6-8 are a bit time-sensitive at the moment. If the call to `sign` in step 6 doesn't receive a response from step 8 within roughly a minute, you'll eventually see the message `Retrying transaction due to expired block hash`.
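
For orientation, here is a minimal sketch of what the contract side of the walkthrough above could look like. This is not the actual mpc_contract from near-sdk-rs/examples: the yield/resume bindings (`promise_yield_create`, `promise_yield_resume`, the register-based data id, and the callback receiving the submitted data as a promise result) are assumed to follow the NEP-519 draft, and the callback name and gas numbers are made up for illustration.

```rust
// Illustrative only: a stripped-down shape of the chain signatures contract
// used in the walkthrough above. The yield/resume bindings are assumed to
// follow the NEP-519 draft; names, signatures and gas numbers are not the
// actual mpc_contract code.
use near_sdk::borsh::{BorshDeserialize, BorshSerialize};
use near_sdk::json_types::Base58CryptoHash;
use near_sdk::{env, near_bindgen, CryptoHash, Gas, GasWeight, PromiseResult};

/// Register the host function writes the yielded promise's data id into.
const YIELD_REGISTER: u64 = 0;
/// Arbitrary gas attachment for the resumption callback (illustrative).
const SIGN_ON_RESPONSE_GAS: Gas = Gas::from_tgas(10);

#[near_bindgen]
#[derive(BorshDeserialize, BorshSerialize, Default)]
#[borsh(crate = "near_sdk::borsh")]
pub struct MpcContract {}

#[near_bindgen]
impl MpcContract {
    /// Step 6: the requester asks for a signature; execution yields until a
    /// signer responds (step 8) or the protocol-level timeout elapses.
    pub fn sign(&mut self, payload: String) {
        let args = near_sdk::serde_json::json!({ "payload": payload })
            .to_string()
            .into_bytes();
        // Create a yielded function call to `sign_on_response` on this
        // contract; the data id that can resume it lands in YIELD_REGISTER.
        let promise = env::promise_yield_create(
            "sign_on_response",
            &args,
            SIGN_ON_RESPONSE_GAS,
            GasWeight(0),
            YIELD_REGISTER,
        );
        // The real contract records this data id in state so that
        // `log_pending_requests` (step 7) can expose it to signers.
        let _data_id = env::read_register(YIELD_REGISTER).expect("expected a data id");
        env::promise_return(promise);
    }

    /// Step 8: a signer resumes the yielded promise with the signature.
    pub fn sign_respond(&mut self, data_id: Base58CryptoHash, signature: String) {
        let data_id: CryptoHash = data_id.into();
        env::promise_yield_resume(&data_id, signature.as_bytes());
    }

    /// Continuation of `sign`, invoked once the data id is resolved.
    #[private]
    pub fn sign_on_response(&mut self, payload: String) -> String {
        match env::promise_result(0) {
            // Matches step 9: submitting "sig_of_foo" returns "sig_of_foo_post".
            PromiseResult::Successful(signature) => {
                format!("{}_post", String::from_utf8(signature).unwrap())
            }
            _ => env::panic_str(&format!("signature request for {payload} timed out")),
        }
    }
}
```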

Remaining work includes:

### Tasks
- [x] Implement timeouts
- [x] Ensure that the caller for `promise_data_yield` is the only one that can call `promise_submit_data`
- [x] Ensure that `promise_submit_data` cannot be used with data ids not created by `promise_data_yield`
- [x] Ensure the case where `yield_resume` is called within the same transaction as `yield_create` works correctly
- [x] Rework the implementation on the nearcore side to avoid creating a new Action type
- [x] `gas` argument for `promise_yield_create` should be accompanied with the `gas_weight` argument
- [x] Simplify trie state to just yield queue plus postponed action receipts
- [ ] Ensure gas costs are charged properly, including a new cost parameter for postponing a receipt
- [x] Implement resharding logic for yielded promise queue
- [x] Determine an acceptable value for the timeout length
- [ ] Tests: feature gating works as expected
- [x] Tests: functionality works as expected (i.e. see various "ensure" steps above – all of these should be tested)
- [ ] Check whether profile entries related to yield/resume need to be excluded from the profiles while this functionality isn't stable
- [x] The error should be the same for all different failure modes of `yield_submit_data_receipt`
- [x] Look into "Retrying transaction due to expired block hash"
- [ ] Look into integration with the higher-level Promises API in near-sdk-rs
walnut-the-cat commented 9 months ago

Can we link PRs for the listed tasks to this tracking issue?

saketh-are commented 9 months ago

I have a couple of draft PRs I'm continuing to iterate on:

I don't anticipate having separate PRs for each subtask since it won't make sense to merge this until we have all the details right.

nagisa commented 9 months ago

Here are some thoughts I had today while thinking about estimation and costs for this feature.

  1. The hard part is really just paying for the storage. The compute costs for handling these new operations seem straightforward and can be largely replicated from the code for the already implemented estimations.
  2. `promise_await_data` seems quite straightforward in isolation too;
    • Except when used in combination with `promise_and`: you can join a bunch of `promise_await_data` dependencies together, and some of the backing structures will have to live for as long as the longest one! If implemented naively, this combination can result in a surprising amount of gas charged for the amount of work done.
  3. Data is written to RocksDB as a result of `promise_submit_data`, but that write will most likely happen in the outskirts of the transaction runtime where gas tracking is no longer conducted, so it is necessary to account for it at the same time the base action cost is charged.
    • `cost(promise_submit_data) = action_base + storage_base + storage_per_byte`? (See the sketch below this list.)
  4. This data is later read out multiple times before the continuation for `promise_then` is invoked.
    • As far as I understand, the value can end up being read a number of times proportional to how long the pending continuation action's inputs remain unresolved;
    • It will still be read multiple times (though possibly fewer) even if the dependencies are immediately resolved;
    • But more importantly, there seems to be no way to know ahead of time just how many reads of this data can occur. It wouldn't help even if we could guarantee that receipt inputs are read out once every block production cycle, because users can arbitrarily delay the resolution of the promise in the proposed design.
    • It sounds like it'd be really hard to estimate this accurately, and we should replace these read-outs with a presence check (which is effectively all they amount to) in the first place; that would make this problem much less severe and cost estimation easier to reason about.
      • I’ll try hacking on it, maybe I’ll learn something new along the way…
    • Regardless, `promise_submit_data` actually sounds like the most appropriate place to charge gas for storage of the data being submitted, as this is the first point in the logic flow at which both the number of blocks to store the data for and the amount of data being stored are known.
    • The costs here should be appreciable, so that there isn’t an economic incentive to use this as a mechanism for data storage.
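
To make the formula in point 3 concrete, the proposed charge is just a linear function of the payload size. A tiny sketch, with made-up parameter names and numbers rather than actual nearcore runtime config fields:

```rust
/// Hypothetical parameters for the cost proposed in point 3 above; the field
/// names and values are illustrative, not actual nearcore runtime config.
#[derive(Clone, Copy)]
struct SubmitDataFees {
    action_base: u64,      // base cost of processing the submit-data action
    storage_base: u64,     // flat cost for keeping the payload until resumption
    storage_per_byte: u64, // per-byte cost, high enough to deter storage abuse
}

fn promise_submit_data_cost(fees: SubmitDataFees, payload_len: u64) -> u64 {
    fees.action_base + fees.storage_base + fees.storage_per_byte * payload_len
}

fn main() {
    let fees = SubmitDataFees { action_base: 100, storage_base: 50, storage_per_byte: 3 };
    // A 1 KiB payload under these made-up numbers:
    assert_eq!(promise_submit_data_cost(fees, 1024), 100 + 50 + 3 * 1024);
}
```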
nagisa commented 8 months ago

I have addressed the concern about the data being read out multiple times throughout the life of an unresolved promise in the PR referenced just above. We are now only going to do a simple check for key existence, which simplifies the cost model significantly. In particular, the model no longer needs to account for the period of time between when `promise_submit_data` is called and when the future gets resolved, at least in terms of compute cost. Furthermore, I hear that we're now looking at making the timeout a constant system-wide parameter rather than a user-controllable one, which is probably a slight simplification as well.

In my mind, a correct cost model in the context of these changes looks like this:

nagisa commented 8 months ago

> `gas` argument for `promise_yield_create` should be accompanied with the `gas_weight` argument

I added this to the task list.
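
For context on why the weight matters: a function call created with a `gas_weight` participates in the distribution of whatever gas is left unused at the end of execution, on top of its fixed `gas` attachment. A simplified model of that split (made-up numbers; the real runtime logic also handles remainders):

```rust
/// Simplified model of how unused gas is split across pending function calls
/// in proportion to their `gas_weight`, on top of any fixed `gas` attachment.
fn distribute_unused_gas(unused_gas: u64, calls: &mut [(u64 /* gas */, u64 /* weight */)]) {
    let total_weight: u64 = calls.iter().map(|(_, w)| *w).sum();
    if total_weight == 0 {
        return;
    }
    for (gas, weight) in calls.iter_mut() {
        *gas += unused_gas / total_weight * *weight;
    }
}

fn main() {
    // A yielded callback with 5 Tgas attached and weight 1, next to another
    // call with weight 3: the callback receives a quarter of the leftover gas.
    let mut calls = [(5_000_000_000_000_u64, 1), (0, 3)];
    distribute_unused_gas(40_000_000_000_000, &mut calls);
    assert_eq!(calls[0].0, 15_000_000_000_000);
    assert_eq!(calls[1].0, 30_000_000_000_000);
}
```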

saketh-are commented 8 months ago

Status update @walnut-the-cat: fixed-length timeouts are implemented now. Work continues on gas costs and on bounding congestion (mainly thanks to @nagisa), as well as on the miscellaneous smaller implementation details documented on this tracking issue.

mikedotexe commented 6 months ago

Very excited about this idea! Thank you, contributors