palomachain / paloma

The fast blockchain messenger protocol
Apache License 2.0

BUG: Prevote validators still experiencing: ERR prevote step: consensus deems this block invalid; prevoting nil err="wrong Block.Header.AppHash. #1134

taariq closed 7 months ago

taariq commented 7 months ago

What is happening?


BUG: Prevoting validators still experience AppHash mismatches after upgrading to tag v1.13.1

INF resetting proposal info height=15841321 module=consensus proposer=67B15E8CC3FC62521FA50632FD085C84A6387277 round=296
8:35PM INF received proposal module=consensus proposal="Proposal{15841321/296 (30EA308D78EEB3D4749941251BA487F1F66B7A307FCDFA9F3E109E262EF46393:1:1026655BA062, -1) 07102F0BC4EE @ 2024-04-16T20:35:03.126136283Z}" proposer=67B15E8CC3FC62521FA50632FD085C84A6387277
8:35PM INF received complete proposal block hash=30EA308D78EEB3D4749941251BA487F1F66B7A307FCDFA9F3E109E262EF46393 height=15841321 module=consensus
8:35PM ERR prevote step: consensus deems this block invalid; prevoting nil err="wrong Block.Header.AppHash. Expected FB5D74ED0454E8DFE9D09AB13614745989B8C65553BA279AAE2D3D945BFA0460, got 531DE8D10C57C6E49BC4F2CEB19CC364DBA2C28C7C59FE88C0243CC7CDD38223" height=15841321 module=consensus round=296

Source: https://gist.github.com/pcheliniy/7cafa796d9bfb9723ee74ee1a06a6afd
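To compare what different validators report, it may help to pull the two hashes, the height, and the round out of the error line. The following is a minimal sketch, not part of paloma or pigeon; the function name `parse_apphash_error` and the regular expression are assumptions based only on the log line quoted above, and other log formats may need a different pattern.

```python
import re

# Pattern modelled on the single ERR line quoted in this issue (an assumption;
# adjust it if your node logs in JSON or with different field order).
APPHASH_RE = re.compile(
    r"wrong Block\.Header\.AppHash\.\s+"
    r"Expected (?P<expected>[0-9A-Fa-f]+), got (?P<got>[0-9A-Fa-f]+)\""
    r"\s+height=(?P<height>\d+)\s+module=consensus\s+round=(?P<round>\d+)"
)


def parse_apphash_error(line):
    """Return the hashes, height and round from an AppHash error line, or None."""
    match = APPHASH_RE.search(line)
    if match is None:
        return None
    return {
        "expected": match["expected"],
        "got": match["got"],
        "height": int(match["height"]),
        "round": int(match["round"]),
    }


# The error line from the log above, copied verbatim.
LINE = (
    '8:35PM ERR prevote step: consensus deems this block invalid; prevoting nil '
    'err="wrong Block.Header.AppHash. Expected '
    'FB5D74ED0454E8DFE9D09AB13614745989B8C65553BA279AAE2D3D945BFA0460, got '
    '531DE8D10C57C6E49BC4F2CEB19CC364DBA2C28C7C59FE88C0243CC7CDD38223" '
    'height=15841321 module=consensus round=296'
)

parsed = parse_apphash_error(LINE)
print(parsed)
```

Running the same parser over logs from several validators and grouping by the `expected` value would show whether nodes split into a small number of camps at a given height.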

  1. Validators are mostly running the same binary, commit 1143f40382ff1540dc134ceecf11b50af756428a. For example, proposer 67B15E8CC3FC62521FA50632FD085C84A6387277 is TRGC, which is on the correct commit hash.
  2. https://docs.google.com/spreadsheets/d/122_iioKezsBpC6dQ447KY7Ywn2nd3r8VLsz86e3dS0g/edit#gid=0
  3. There are 6 persistent peers in our setup instructions.
  4. How do we determine what is causing the AppHash non-determinism?
  5. This issue appears directly related to https://github.com/palomachain/paloma/issues/1124 and to the upgrade from Cosmos SDK 0.47.1 to 0.50.5.
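On question 4, one common way to narrow down AppHash non-determinism in Cosmos SDK chains is to compare per-module store hashes from two nodes at the same height, since the AppHash is derived from the commit hashes of the individual stores. The sketch below assumes you have already obtained a mapping of store name to hash from each node by some means; the function `diff_store_hashes`, the store names, and the hash values are all hypothetical placeholders and do not come from this issue.

```python
def diff_store_hashes(node_a, node_b):
    """Return {store_name: (hash_on_a, hash_on_b)} for every store that differs.

    A store missing on one side is reported with None for that side.
    """
    diverged = {}
    for name in sorted(set(node_a) | set(node_b)):
        hash_a = node_a.get(name)
        hash_b = node_b.get(name)
        if hash_a != hash_b:
            diverged[name] = (hash_a, hash_b)
    return diverged


# Hypothetical data: store names and hashes are made up for illustration only.
node_a = {"bank": "aa11", "staking": "bb22", "gov": "cc33"}
node_b = {"bank": "aa11", "staking": "ff99", "gov": "cc33"}

diverged = diff_store_hashes(node_a, node_b)
print(diverged)
```

If only one store diverges, the module that owns it is the first place to look for non-deterministic state writes.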

Paloma and pigeon versions and logs

:~$ palomad version --long | tail
- rsc.io/tmplfunc@v0.0.3
- sigs.k8s.io/yaml@v1.4.0
build_tags: netgo,ledger
commit: 1143f40382ff1540dc134ceecf11b50af756428a
cosmos_sdk_version: v0.50.5
go: go version go1.22.2 linux/amd64
name: paloma
server_name: palomad
version: 1.13.1

How to reproduce?


All nodes on the racer network are in prevoting, so this occurs with each round.

What is the expected behaviour?


Non-determinism is removed and AppHash mismatches do not occur. We need to be able to debug the source of the AppHash mismatch.

taariq commented 7 months ago

Fixed with https://github.com/palomachain/paloma/pull/1140