palomachain / paloma

The fast blockchain messenger protocol
Apache License 2.0

CONSENSUS FAILURE height=18672395 #1193

Closed rektDAOclub closed 2 months ago

rektDAOclub commented 2 months ago

What is happening?

Got an AppHash error at block 18672395. Logs: https://gist.github.com/rektDAOclub/953dddbc8846e706171141448234127d

Paloma and pigeon versions and logs

palomad version: v1.14.0, commit b895c4dec2f5551e897b16d98572f988e4c1cca5, built from source

pigeon version: App version v1.11.3, build commit hash e5c58ef4de965595b58d14a780f41d362175767d

go version: go1.21.3 linux/amd64

taariq commented 2 months ago

To add to this: https://paloma.explorers.guru/block/18672395 shows zero transactions.

byte-bandit commented 2 months ago

Paloma and Pigeon need to be compiled using Go v1.22.2 or later.
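
One way to confirm which Go toolchain produced a binary is to read its embedded build information. The sketch below uses the standard `debug/buildinfo` package and compares the result against the go1.22.2 minimum stated above. The helper names (`parseGoVersion`, `atLeast`) and the idea of passing binary paths as arguments are assumptions made for illustration; none of this is part of Paloma or Pigeon.

```go
package main

import (
	"debug/buildinfo"
	"fmt"
	"os"
	"runtime"
	"strconv"
	"strings"
)

// minGoVersion is the minimum toolchain named in this thread.
const minGoVersion = "go1.22.2"

// parseGoVersion turns a release string such as "go1.22.3" into [1 22 3].
// Missing components default to 0. The bool is false for anything that is
// not a plain release version (for example development builds).
func parseGoVersion(v string) ([3]int, bool) {
	var parts [3]int
	if !strings.HasPrefix(v, "go") {
		return parts, false
	}
	fields := strings.Split(strings.TrimPrefix(v, "go"), ".")
	if len(fields) > 3 {
		return parts, false
	}
	for i, f := range fields {
		n, err := strconv.Atoi(f)
		if err != nil {
			return parts, false
		}
		parts[i] = n
	}
	return parts, true
}

// atLeast reports whether version is the same as or newer than minimum.
// Unparseable input is treated as not meeting the minimum.
func atLeast(version, minimum string) bool {
	have, okHave := parseGoVersion(version)
	want, okWant := parseGoVersion(minimum)
	if !okHave || !okWant {
		return false
	}
	for i := range have {
		if have[i] != want[i] {
			return have[i] > want[i]
		}
	}
	return true
}

func main() {
	if len(os.Args) < 2 {
		// Without arguments, only the toolchain that built this checker is known.
		v := runtime.Version()
		fmt.Printf("this checker was built with %s (meets %s: %t)\n",
			v, minGoVersion, atLeast(v, minGoVersion))
		return
	}
	for _, path := range os.Args[1:] {
		// ReadFile extracts the build info that the Go linker embeds in a binary.
		info, err := buildinfo.ReadFile(path)
		if err != nil {
			fmt.Fprintf(os.Stderr, "%s: %v\n", path, err)
			continue
		}
		fmt.Printf("%s: built with %s (meets %s: %t)\n",
			path, info.GoVersion, minGoVersion, atLeast(info.GoVersion, minGoVersion))
	}
}
```

Run it with the paths to the installed binaries as arguments. Running `go version` with a binary path prints the same toolchain information from the command line.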

byte-bandit commented 2 months ago

@rektDAOclub Can you let us know the hardware you're running this node on?

byte-bandit commented 2 months ago

From our investigation:

This seems to be coming from ALL THE WAY down in the nodedb: https://github.com/cosmos/iavl/blob/e063edd1735558e826a41a1d8472a97e68918e8c/nodedb.go#L150

It looks like it's trying to load a node, and it successfully retrieves the node, so it does exist, but no value is retrieved. The only mentions I can find online suggest that the hardware might be too weak and the node is approaching its ulimit. But given that our node experienced this at the exact same block, an IO error seems highly unlikely.

rektDAOclub commented 2 months ago

> @rektDAOclub Can you let us know the hardware you're running this node on?

We rebuilt paloma and pigeon with go version go1.22.3 linux/amd64 and restarted. Hardware: Xeon E3-1271 v3, 3.6-4.0 GHz, 32 GB RAM. The Paloma validator is the only node on this server.

byte-bandit commented 2 months ago

Thanks, that should be more than good enough to run Paloma. We investigated this but ended up hitting a roadblock on the nodedb implementation level. It's possible you were missing a value that was expected to be there, but why that would be, I cannot say.

Since you were able to sync from a snapshot and rejoin the flock, we're likely going to table this one for now unless it happens again.

taariq commented 2 months ago

Closing as tabled for now.