Migrate chunk state witness pre validation logic from orphan witness handler to partial witness actor

near / nearcore

Reference client for NEAR Protocol

https://near.org

GNU General Public License v3.0

2.33k stars 629 forks source link

Migrate chunk state witness pre validation logic from orphan witness handler to partial witness actor #11193

Closed shreyan-gupta closed 5 months ago

shreyan-gupta commented 6 months ago

The first contact a node had with the network layer to receive the chunk state witness is crucial to do proper validation.

Before the introduction of partial state witness, the first point of contact was ChunkStateWitnessMessage from network adapter, however after the introduction of partial witnesses, we would have to shift all the validation functionality to the partial_witness_actor.

This mainly includes shifting validation defined in partially_validate_state_witness in Client and handle_orphan_state_witness in Client

The first part of this has already been implemented in validate_partial_encoded_state_witness but we need eyes on the rest of the validation.

### Tasks
- [ ] https://github.com/near/nearcore/issues/11303
- [ ] https://github.com/near/nearcore/issues/11301
- [ ] https://github.com/near/nearcore/issues/11302
- [ ] https://github.com/near/nearcore/issues/11304
- [ ] https://github.com/near/nearcore/pull/11309

shreyan-gupta commented 6 months ago

@pugachAG @jancionear have previously worked on the validation side.

walnut-the-cat commented 6 months ago

Currently, both @pugachAG and @shreyan-gupta are listed as assignee of this task. Please update it accordingly as we start working on this-

pugachAG commented 6 months ago

We need to do the following:

reject partial witness for chunks before the last final block:
- not sure LRU cache cleanup is required, probably it is OK to just let those entries get evicted as we process new witnesses
- requires access to the chain head from the partial witness actor
keep track of the already processed witnesses and reject witness parts: similar to #11111
ensure that chunk witness corresponds to the metadata from PartialEncodedStateWitnessInner: validate epoch_id, shard_id and height_created.
add EpochId as part of parts_cache key to have (ShardId, BlockHeight, EpochId)

walnut-the-cat commented 6 months ago

Adding arbitrary timeline for now. Please adjust with the correct estimation

shreyan-gupta commented 6 months ago

Notes from conversation Anton and I had offline

We need to add an additional check for the size of the partial witness. That would look something on the lines of max_state_witness_size * encoding_size_increase_factor / num_validators, approx 10 MB * 1.25 / ~50 validators
We need epoch_id in lru cache key of partial state witness tracker as we can potentially have valid heights, shard_id pairs in different epochs based on forks and where the epoch boundary is.
Reading chain head directly from RocksDB shouldn't be too bad. This is the better alternative of passing it around and fetching it from client.
The set of checks we need to have in place should be at a level similar to those we have for chunk distribution in order to avoid over-engineering things.

shreyan-gupta commented 6 months ago

keep track of the already processed witnesses and reject witness parts: similar to https://github.com/near/nearcore/pull/11111

Quick comment on this, I think one good way of ensuring this would be to have a separate LRU cache just keeping track of "seen" or "observed" witnesses.

Currently we have the parts_cache in partial_witness_tracker that is possibly exposed to overflowing and cache eviction. In the current implementation, once we mark an entry as is_decoded once we have enough parts and send the decoded state witness to client.

I'm slightly concerned what would happen if this cache entry gets evicted and a malicious chunk producer sends over all the parts again. In such a case we would have no memory of having already processed this witness.

A quick solution is to have a tiny LRU cache for all witnesses we have decoded and quickly check that once.

shreyan-gupta commented 6 months ago

Potential discussion points: Is it fine to have hard checks based on "distance from head". For example, can we safely say, "we should not be processing any state witnesses that are a distance 50 away from our current head"?

Can there be issues with respect to skipped blocks etc.?

pugachAG commented 6 months ago

After looking into think for quite some time now I think it makes sense to consolidate partial witness tracking with the orphan pool. All validation logic from the orphan pool is applicable when validating witness parts. This can be implemented by adding an actix message sent from the Client to the PartialWitnessActor notifying about the processed block. This way partial witness tracker doesn't immediately send the witness for processing when prev block is not present, but retains it until the corresponding block is received from the Client. Also I think we can safely restrict the range of considered state witness heights to (last_final_height..chain_head_height + 5] (the same as front horizon MAX_HEIGHTS_AHEAD constant from chunk_cache.rs).