palomachain / paloma

The fast blockchain messenger protocol
Apache License 2.0

Chain Halt: ERR prevote step: consensus deems this block invalid; prevoting nil err="wrong Block.Header.AppHash after received proposal #1144

Closed taariq closed 7 months ago

taariq commented 7 months ago

What is happening?


AppHash mismatch at the prevote step after receiving a proposal. For example:

Prior block: 15849600 (https://paloma.explorers.guru/block/15849600), transactions: none

Apr 20 09:07:52 Ubuntu-2004-focal-64-minimal-hwe cosmovisor[1153895]: 9:07AM INF received proposal module=consensus proposal="Proposal{15849601/71 (38E3114EF2728D576D66878D5E53650EE245580F80559495C1AD35053D6DCB42:1:B6AC9714F49B, -1) B8C85062D90D @ 2024-04-20T07:07:51.899382942Z}" proposer=22635F06A62F6D25A76EEA03F478291804C5A390
Apr 20 09:07:52 Ubuntu-2004-focal-64-minimal-hwe cosmovisor[1153895]: 9:07AM INF received complete proposal block hash=38E3114EF2728D576D66878D5E53650EE245580F80559495C1AD35053D6DCB42 height=15849601 module=consensus
Apr 20 09:07:52 Ubuntu-2004-focal-64-minimal-hwe cosmovisor[1153895]: 9:07AM ERR prevote step: consensus deems this block invalid; prevoting nil err="wrong Block.Header.AppHash. Expected B2119D90E37D8E784AD2D59E76440732D9986D3ECE1F1840A385BCD6595CC794, got 962E13EE35F0DBEDB69C243062C68D67C025BB6DA953290F0E03C2FA5C18DDC7" height=15849601 module=consensus round=71
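For anyone collecting these errors from several nodes, a minimal sketch of pulling the height and both hashes out of the ERR line above might look like this. The function name parse_apphash_error and the regular expression are illustrative assumptions, not part of paloma or CometBFT. In CometBFT's block validation, "Expected" is normally the AppHash the local node computed and "got" is the one carried in the proposed block header.

```python
import re

# Hypothetical helper: matches the "wrong Block.Header.AppHash" error text
# as it appears in the log excerpt above. \s+ tolerates extra spaces.
APPHASH_RE = re.compile(
    r'wrong Block\.Header\.AppHash\.\s+Expected (?P<expected>[0-9A-Fa-f]{64}),'
    r'\s+got (?P<got>[0-9A-Fa-f]{64})"\s+height=(?P<height>\d+)'
)


def parse_apphash_error(line):
    """Return (height, expected_hash, got_hash), or None if the line does not match."""
    match = APPHASH_RE.search(line)
    if match is None:
        return None
    return int(match.group("height")), match.group("expected"), match.group("got")


# The ERR line from the report, without the syslog prefix.
LOG_LINE = (
    '9:07AM ERR prevote step: consensus deems this block invalid; prevoting nil '
    'err="wrong Block.Header.AppHash. Expected '
    'B2119D90E37D8E784AD2D59E76440732D9986D3ECE1F1840A385BCD6595CC794, got '
    '962E13EE35F0DBEDB69C243062C68D67C025BB6DA953290F0E03C2FA5C18DDC7" '
    'height=15849601 module=consensus round=71'
)

if __name__ == "__main__":
    print(parse_apphash_error(LOG_LINE))
```

Running the same parser over logs from nodes on both sides of the split shows which nodes share an "Expected" hash at a given height.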

Paloma and pigeon versions and logs


Paloma: v1.13.2

How to reproduce?

The chain halted after validators disagreed on the AppHash.

What is the expected behaviour?


Something is happening in the prevote step.

taariq commented 7 months ago

@byte-bandit @vishal-kanna I looked at the dYdX go.mod setup: https://github.com/dydxprotocol/v4-chain/blob/main/protocol/go.mod. Would it make sense to test upgrading iavl to their version?

byte-bandit commented 7 months ago

It is one of the key differences I can see. They also override all the dependencies which I previously rejected because I aligned with Marko, but I think it’s worth a shot now.

I will prepare a PR.


taariq commented 7 months ago

@deepan95dev @vishal-kanna: reposting from Telegram here:

One of our validators created a script that reports the hash of each store by name. I asked another operator, whose validator disagreed with the majority group, to run the script. The store whose hash differs between validators is number 5 in the output: the distribution store.

I dug around recent PRs touching the distribution module and found Deepan's SDK 0.47.0 to 0.50.5 upgrade, where the code under the comment /* Handle fee distribution state. */ was changed: https://github.com/palomachain/paloma/commit/a4b1e22d4363511b0337d95ea07ab576481a5a3b#diff-9bcaf321191d4f6106e27485bdfbbaef716b4e0e4f648d8e5b81a87bc93b6c77L73

  1. @deepan95dev: Will you check your PR for possible non-determinism in your upgrade change of the distribution module?

  2. The gist of the apphash script and output is here: https://gist.github.com/freak12techno/845a3061ed65295667c145c05ffd3b23
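The comparison described above can be sketched as follows. This is not the script from the gist; it only assumes each validator has already produced a mapping of store name to hash, and the function name, validator names, and placeholder hashes are all hypothetical.

```python
def diff_store_hashes(reports):
    """Find stores whose hashes differ between validators.

    reports: {validator_name: {store_name: hash_hex}} (hypothetical input shape).
    Returns {store_name: {hash_hex: [validator_name, ...]}} for mismatched stores only.
    A store missing from one validator's report is grouped under the key None.
    """
    store_names = set()
    for hashes in reports.values():
        store_names.update(hashes)

    mismatched = {}
    for store in sorted(store_names):
        groups = {}
        for validator, hashes in reports.items():
            groups.setdefault(hashes.get(store), []).append(validator)
        if len(groups) > 1:  # more than one distinct hash means disagreement
            mismatched[store] = groups
    return mismatched


if __name__ == "__main__":
    # Placeholder hashes, not real chain data.
    example = {
        "val-a": {"bank": "11", "distribution": "aa"},
        "val-b": {"bank": "11", "distribution": "aa"},
        "val-c": {"bank": "11", "distribution": "bb"},
    }
    for store, groups in diff_store_hashes(example).items():
        print(store, groups)
```

Grouping validators by hash, instead of only flagging the mismatch, makes it easy to see which side of the split each node is on.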

taariq commented 7 months ago

Fixed by removing validators that were not in the active set from persistent peers.
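A rough sketch of that cleanup, assuming the usual comma-separated nodeid@host:port format of persistent_peers in config.toml. The function name is hypothetical, and the set of node IDs that belong to active-set validators is assumed to be collected by the operator beforehand; it is not derived here.

```python
def filter_persistent_peers(persistent_peers, active_node_ids):
    """Keep only peers whose node ID is in active_node_ids.

    persistent_peers: comma-separated "nodeid@host:port" entries.
    active_node_ids: iterable of node IDs to keep (assumed known to the operator).
    """
    active = {node_id.lower() for node_id in active_node_ids}
    kept = []
    for entry in persistent_peers.split(","):
        entry = entry.strip()
        if not entry:
            continue  # skip empty segments such as a trailing comma
        node_id = entry.split("@", 1)[0].lower()
        if node_id in active:
            kept.append(entry)
    return ",".join(kept)


if __name__ == "__main__":
    # Placeholder node IDs and addresses, not real peers.
    peers = "aaaa@10.0.0.1:26656, bbbb@10.0.0.2:26656,cccc@10.0.0.3:26656"
    print(filter_persistent_peers(peers, {"aaaa", "cccc"}))
```

The result goes back into the persistent_peers setting, and the node needs a restart to pick it up.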