terra-money / classic-core

GO implementation of the Terra Protocol
https://www.terra.money
Other
978 stars 286 forks source link

[BUG] Node stops syncing after to VM cache error #604

Closed fabio-nukui closed 2 years ago

fabio-nukui commented 2 years ago

Describe the bug

Same bug as reported in #599, using v0.5.11. Node has to be restarted multiple times a day

Snippets of the logs:

5:16AM INF executed block height=5263003 module=state num_invalid_txs=0 num_valid_txs=7
5:16AM INF commit synced commit=436F6D6D697449447B5B323530203233352034322034342031363920313337203139322031343920313237203939203435203238203636203830203139332031332031353120393120313533203132
3720313339203231322031363920313331203139322031353720313633203231372032382031323720313034203135325D3A3530344539427D
5:16AM INF committed state app_hash=FAEB2A2CA989C0957F632D1C4250C10D975B997F8BD4A983C09DA3D91C7F6898 height=5263003 module=state num_txs=7
5:16AM INF indexed block height=5263003 module=txindex
thread '<unnamed>' panicked at 'assertion failed: prev.start > max', /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/wasmer-engine-2.0.0/src/trap/frame_info.rs:232:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:256:43
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:256:43

...

thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:25
6:43
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:25
6:43
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: PoisonError { .. }', /usr/local/cargo/git/checkouts/cosmwasm-bbcc9de72c368f6b/93ea53e/packages/vm/src/cache.rs:25
6:43
5:16AM INF Timed out dur=4663.671183 height=5263005 module=consensus round=0 step=1
5:16AM INF received proposal module=consensus proposal={"Type":32,"block_id":{"hash":"945C10781FB8CBECB36AF34D2BAD7A126765656C2677E69B7C76272C47AA0EC9","parts":{"hash":"1FE1788A28D650297EB9B
28BA05A706A1D7FCC6EBF9054B2323623482E9AF505","total":3}},"height":5263005,"pol_round":-1,"round":0,"signature":"eqKIrKAYLdNN1/Sjs5SrG93Tqp+qxY+Sh74shjjuEzZrhcOqJaaWFtwTuURbmXcOB7lbTt0JQjoRVl
rvsJfrDw==","timestamp":"2021-11-12T05:16:54.265983122Z"}
5:16AM INF received complete proposal block hash=945C10781FB8CBECB36AF34D2BAD7A126765656C2677E69B7C76272C47AA0EC9 height=5263005 module=consensus
5:16AM ERR prevote step: ProposalBlock is invalid err="wrong Block.Header.AppHash.  Expected 23247F4D38DB13BD41389EFD52931612B7100716F60274AA4C8BB73421D66E4B, got DE07F8EE13585675D6931DC14EB
486F78EA0B09EF2C9F11FAA07BF725710EEDB" height=5263005 module=consensus round=0
5:16AM ERR CONSENSUS FAILURE!!! err="precommit step; +2/3 prevoted for an invalid block: wrong Block.Header.AppHash.  Expected 23247F4D38DB13BD41389EFD52931612B7100716F60274AA4C8BB73421D66E4
B, got DE07F8EE13585675D6931DC14EB486F78EA0B09EF2C9F11FAA07BF725710EEDB" module=consensus stack="goroutine 69788 [running]:\nruntime/debug.Stack(0xc05f93f370, 0x2280f00, 0xc116802140)\n\trun
time/debug/stack.go:24 +0x9f\ngithub.com/tendermint/tendermint/consensus.(*State).receiveRoutine.func2(0xc000c83880, 0x2a782a8)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.g
o:726 +0x5b\npanic(0x2280f00, 0xc116802140)\n\truntime/panic.go:965 +0x1b9\ngithub.com/tendermint/tendermint/consensus.(*State).enterPrecommit(0xc000c83880, 0x504e9d, 0x0)\n\tgithub.com/tend
ermint/tendermint@v0.34.13/consensus/state.go:1390 +0x1c4c\ngithub.com/tendermint/tendermint/consensus.(*State).addVote(0xc000c83880, 0xc1150e25a0, 0xc12243b0b0, 0x28, 0x2a7ab90, 0xc05f93fc0
8, 0x11accd9)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.go:2104 +0x131e\ngithub.com/tendermint/tendermint/consensus.(*State).tryAddVote(0xc000c83880, 0xc1150e25a0, 0xc1224
3b0b0, 0x28, 0xc114f97500, 0xc1150d2480, 0xc05b9c11c929a34d)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.go:1930 +0x56\ngithub.com/tendermint/tendermint/consensus.(*State).h
andleMsg(0xc000c83880, 0x2ca7b20, 0xc1150d43d8, 0xc12243b0b0, 0x28)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.go:838 +0x96d\ngithub.com/tendermint/tendermint/consensus.(*S
tate).receiveRoutine(0xc000c83880, 0x0)\n\tgithub.com/tendermint/tendermint@v0.34.13/consensus/state.go:762 +0x3f2\ncreated by github.com/tendermint/tendermint/consensus.(*State).OnStart\n\t
github.com/tendermint/tendermint@v0.34.13/consensus/state.go:378 +0x8c5\n"

To Reproduce

Run terrad with heavy query loads (my application mostly decodes mempool transactions and queries contracts).

Context & versions

v0.5.11 Linux release build, running inside docker image based on ubuntu:focal-20211006

terrad version --long:

name: terra
server_name: terrad
version: 0.5.11
commit: 19b75bfd60f7623bf4ef8919ab0e52d73ad99643
build_tags: netgo muslc,
go: go version go1.16.5 linux/amd64
build_deps:
- filippo.io/edwards25519@v1.0.0-beta.2
- github.com/99designs/keyring@v1.1.6 => github.com/cosmos/keyring@v1.1.7-0.20210622111912-ef00f8ac3d76
- github.com/ChainSafe/go-schnorrkel@v0.0.0-20200405005733-88cbf1b4c40d
- github.com/CosmWasm/wasmvm@v0.16.2
- github.com/Workiva/go-datastructures@v1.0.52
- github.com/armon/go-metrics@v0.3.9
- github.com/beorn7/perks@v1.0.1
- github.com/bgentry/speakeasy@v0.1.0
- github.com/btcsuite/btcd@v0.22.0-beta
- github.com/cespare/xxhash/v2@v2.1.1
- github.com/coinbase/rosetta-sdk-go@v0.6.10
- github.com/confio/ics23/go@v0.6.6
- github.com/cosmos/cosmos-sdk@v0.44.2
- github.com/cosmos/go-bip39@v1.0.0
- github.com/cosmos/iavl@v0.17.1
- github.com/cosmos/ibc-go@v1.1.0
- github.com/davecgh/go-spew@v1.1.1
- github.com/desertbit/timer@v0.0.0-20180107155436-c41aec40b27f
- github.com/dvsekhvalnov/jose2go@v0.0.0-20200901110807-248326c1351b
- github.com/enigmampc/btcutil@v1.0.3-0.20200723161021-e2fb6adb2a25
- github.com/felixge/httpsnoop@v1.0.1
- github.com/fsnotify/fsnotify@v1.4.9
- github.com/go-kit/kit@v0.10.0
- github.com/go-logfmt/logfmt@v0.5.0
- github.com/godbus/dbus@v0.0.0-20190726142602-4481cbc300e2
- github.com/gogo/gateway@v1.1.0
- github.com/gogo/protobuf@v1.3.3 => github.com/regen-network/protobuf@v1.3.3-alpha.regen.1
- github.com/golang/protobuf@v1.5.2
- github.com/golang/snappy@v0.0.3-0.20201103224600-674baa8c7fc3
- github.com/google/btree@v1.0.0
- github.com/google/orderedcode@v0.0.1
- github.com/gorilla/handlers@v1.5.1
- github.com/gorilla/mux@v1.8.0
- github.com/gorilla/websocket@v1.4.2
- github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0
- github.com/grpc-ecosystem/grpc-gateway@v1.16.0
- github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c
- github.com/gtank/merlin@v0.1.1
- github.com/gtank/ristretto255@v0.1.2
- github.com/hashicorp/go-immutable-radix@v1.0.0
- github.com/hashicorp/golang-lru@v0.5.4
- github.com/hashicorp/hcl@v1.0.0
- github.com/hdevalence/ed25519consensus@v0.0.0-20210204194344-59a8610d2b87
- github.com/improbable-eng/grpc-web@v0.14.1
- github.com/klauspost/compress@v1.11.7
- github.com/libp2p/go-buffer-pool@v0.0.2
- github.com/magiconair/properties@v1.8.5
- github.com/mattn/go-isatty@v0.0.14
- github.com/matttproud/golang_protobuf_extensions@v1.0.1
- github.com/mimoo/StrobeGo@v0.0.0-20181016162300-f8f6d4d2b643
- github.com/minio/highwayhash@v1.0.1
- github.com/mitchellh/mapstructure@v1.4.1
- github.com/mtibben/percent@v0.2.1
- github.com/pelletier/go-toml@v1.9.3
- github.com/pkg/errors@v0.9.1
- github.com/pmezard/go-difflib@v1.0.0
- github.com/prometheus/client_golang@v1.11.0
- github.com/prometheus/client_model@v0.2.0
- github.com/prometheus/common@v0.29.0
- github.com/prometheus/procfs@v0.6.0
- github.com/rakyll/statik@v0.1.7
- github.com/rcrowley/go-metrics@v0.0.0-20200313005456-10cdbea86bc0
- github.com/regen-network/cosmos-proto@v0.3.1
- github.com/rs/cors@v1.7.0
- github.com/rs/zerolog@v1.23.0
- github.com/spf13/afero@v1.6.0
- github.com/spf13/cast@v1.3.1
- github.com/spf13/cobra@v1.2.1
- github.com/spf13/jwalterweatherman@v1.1.0
- github.com/spf13/pflag@v1.0.5
- github.com/spf13/viper@v1.8.1
- github.com/stretchr/testify@v1.7.0
- github.com/subosito/gotenv@v1.2.0
- github.com/syndtr/goleveldb@v1.0.1-0.20200815110645-5c35d600f0ca
- github.com/tendermint/btcd@v0.1.1
- github.com/tendermint/crypto@v0.0.0-20191022145703-50d29ede1e15
- github.com/tendermint/go-amino@v0.16.0
- github.com/tendermint/tendermint@v0.34.13
- github.com/tendermint/tm-db@v0.6.4
- golang.org/x/crypto@v0.0.0-20210513164829-c07d793c2f9a
- golang.org/x/net@v0.0.0-20210903162142-ad29c8ab022f
- golang.org/x/sys@v0.0.0-20210903071746-97244b99971b
- golang.org/x/term@v0.0.0-20201126162022-7de9c90e9dd1
- golang.org/x/text@v0.3.6
- google.golang.org/genproto@v0.0.0-20210602131652-f16073e35f0c
- google.golang.org/grpc@v1.40.0 => google.golang.org/grpc@v1.33.2
- google.golang.org/protobuf@v1.27.1
- gopkg.in/ini.v1@v1.62.0
- gopkg.in/yaml.v2@v2.4.0
- gopkg.in/yaml.v3@v3.0.0-20210107192922-496545a6307b
- nhooyr.io/websocket@v1.8.6
cosmos_sdk_version: v0.44.2
hanjukim commented 2 years ago

Update Core to v0.5.11

fabio-nukui commented 2 years ago

@hanjukim As in the report, this error is still happening here with v0.5.11. I am using the pre-compiled release at https://github.com/terra-money/core/releases/download/v0.5.11/terra_0.5.11_Linux_x86_64.tar.gz

hanjukim commented 2 years ago

Thank you for your finding. Did you also send broadcast txs to the node as well?

fabio-nukui commented 2 years ago

Yes, I broadcasted txs with the node as well

ivanfong01 commented 2 years ago

I did following actions with my node:

yun-yeo commented 2 years ago

Seems v0.16.2 is fix for other problem(some vector managing) in libwasmvm, it is still problem.

yun-yeo commented 2 years ago

@ivanfong01 @fabio-nukui

For temporal fix, you can bump Tendermint version to latest. It will make your node to handle wasm query in single thread.

Like https://github.com/terra-money/core/pull/594

fabio-nukui commented 2 years ago

@YunSuk-Yeo Thanks for the tip!

I will test this branch and see if there are stability / performance issues.

fabio-nukui commented 2 years ago

The new cosmos-sdk and tendermint versions seem to solve this issue.

I am running a few nodes with v0.5.11 and a few with the new versions. The v0.5.11 nodes froze ~5 times each today, the new nodes didn't freeze a single time.

I also didn't see significant performance differences in my applications.

yun-yeo commented 2 years ago

ok, seems this problem only happens in multi thread query env. Thanks for confirmation.

faddat commented 2 years ago

I think this took down our relayer yesteraday.

faddat commented 2 years ago

https://github.com/CosmWasm/wasmd/pull/692/commits

hanjukim commented 2 years ago

@YunSuk-Yeo found a way to reproduce the problem. https://github.com/wasmerio/wasmer/issues/2434#issuecomment-1032281200

yun-yeo commented 2 years ago

following up https://github.com/terra-money/core/issues/683