vegaprotocol / vega

A Go implementation of the Vega Protocol, a protocol for creating and trading derivatives on a fully decentralised network.
https://vega.xyz
GNU Affero General Public License v3.0

Snapshots TM 0.35: understand backfilling #5471

Closed wwestgarth closed 2 years ago

wwestgarth commented 2 years ago

Spike Overview

When testing the new snapshot pipelines against devnet with Tendermint v0.35, we noticed additional logs during a snapshot restore that did not appear when restoring with v0.34. The logs seem to suggest that Tendermint now "backfills" historic blocks after a restore. For example, if you snapshot-restore to block height 1000, Tendermint will restore to that height, but it now also seems to fetch the Tendermint block data for heights 1-999. This seems to increase the time a restore takes, almost negating the reason for doing a snapshot restore in the first place.

This ticket is to look at the new Tendermint release notes and config options to see whether this behaviour is configurable, so that we fully understand how the new Tendermint version works. The spec is here: https://github.com/tendermint/tendermint/blob/master/docs/architecture/adr-068-reverse-sync.md The RFC is here: https://github.com/tendermint/spec/blob/master/rfc/005-reverse-sync.md

Our snapshot system tests currently rely on not having block data for block_height 1 as a check that a restore did happen, so this ticket may have implications in general for updating core to 0.35.

ze97286 commented 2 years ago

so an update from discussion with tendermint:

func DefaultEvidenceParams() EvidenceParams {
    return EvidenceParams{
        MaxAgeNumBlocks: 100000, // 27.8 hrs at 1block/s
        MaxAgeDuration:  48 * time.Hour,
        MaxBytes:        1048576, // 1MB
    }
}

@wwestgarth

wwestgarth commented 2 years ago

That makes sense, thanks.

The changelog suggests that Tendermint has added events for the start and end of statesync, so if backfilling breaks the system tests we will just have to switch them to check for those events instead.

The Tendermint docs suggest that the evidence expiry should equal the stake bonding period, which is something I don't think Vega has. Evidence is also passed over ABCI so that the application can penalise a validator who tries to cheat (double voting etc.), which again Vega does not do (at the moment). So we could set MaxAgeNumBlocks to be much smaller.

But I think it makes sense to wait until we've migrated to 0.35 and see just how long it takes to backfill 10,000 blocks, and if the performance is really crap, think about what to set the EvidenceParams to so they better fit Vega's use case.

gordsport commented 2 years ago

blocked until:

gordsport commented 2 years ago

Closing as we need to move back to 0.34 as advised by Tendermint