Describe the bug
Patch state for sandbox fails when run concurrently across a large number of sandbox nodes.
The following error message is shown when the test fails. The test was patching the registrar account from testnet into the sandbox node.
Error: Failed to query access key: handler error: [Access key for public key ed25519:5WMgq6gKZbAr7xBZmXJHjnj4C3UZkNJ4F5odisUBFcRh has never been observed on the node]
This seems to happen on underpowered machines, such as those in a CI pipeline. Running the same tests locally (on a MacBook M1) works fine.
This issue seems related to sharding, since the code for patching state appears to have been moved when sharding was added. The following code is a candidate suspect:
https://github.com/near/nearcore/blob/52087fb24c8211f8c8770f31b6e3e427b7228799/chain/chain/src/chain.rs#L3738-L3740
To Reproduce
This is a bit hard to reproduce, as it needs a somewhat underpowered machine running many tests at once. It was first noticed in the PR tests on aurora-is-near/aurora-eth-connector. That repository has at least 36 tests, each spinning up its own separate sandbox node and calling patch state at least once. Most of the tests pass, but 1 or 2 sometimes fail. It feels like there's data contention somewhere related to patching state.
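Until the root cause is found, a test-side workaround might be to retry the access key query for a short while after patching state. The following is only a minimal sketch, assuming the failure is a timing issue where the patched state is not yet visible on a slow machine (this has not been confirmed). `retry_with_backoff` and `FakeNode` are hypothetical names used for illustration; neither is part of workspaces-rs or nearcore, and `FakeNode` merely stands in for a sandbox node.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retries `op` up to `max_attempts` times, doubling the delay after each
/// failed attempt. Returns the first `Ok` value, or the last error if every
/// attempt fails. Hypothetical helper, not part of workspaces-rs.
pub fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    assert!(max_attempts > 0, "max_attempts must be at least 1");
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}

/// Illustrative stand-in for a sandbox node whose patched access key only
/// becomes visible after a number of queries (an assumption about the bug).
pub struct FakeNode {
    pub queries_until_visible: u32,
}

impl FakeNode {
    /// Fails until the simulated patch has "landed", then succeeds.
    pub fn query_access_key(&mut self) -> Result<&'static str, String> {
        if self.queries_until_visible == 0 {
            Ok("access key found")
        } else {
            self.queries_until_visible -= 1;
            Err("access key has never been observed on the node".to_string())
        }
    }
}
```

In a real test, the closure passed to `retry_with_backoff` would wrap whichever call currently fails with "Failed to query access key". If retrying makes the flakiness disappear, that would support the timing theory. If it does not, the contention is more likely inside the node itself.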
Expected behavior
All tests pass as normal.
Version (please complete the following information):
Additional context
First reported in https://github.com/near/workspaces-rs/issues/253