Closed achilleas-kal closed 2 years ago
Can you share your `[mode.packets]` section of your config.toml?
Assuming the config is fine, we will need to see the logs in order to diagnose the issue.
Sure, here's the config @adizere:

```toml
[mode.packets]
enabled = true
clear_interval = 30
clear_on_start = true
tx_confirmation = false
```
Maybe `tx_confirmation` should be set to true for this feature to work properly?
> Maybe `tx_confirmation` should be set to true for this feature to work properly?
Hmm, no. Periodic packet clearing should trigger every `clear_interval` blocks independently of `tx_confirmation`, but this is a good lead to investigate whether we have a bug.
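To make the intended behavior concrete, here is an illustrative sketch (not Hermes' actual code) of the trigger condition: clearing fires on block heights that are multiples of `clear_interval`, and `tx_confirmation` plays no role in it.

```shell
# Illustrative sketch only, not Hermes' implementation: periodic clearing
# fires on heights that are multiples of clear_interval; tx_confirmation
# does not appear in the condition at all.
clear_interval=30
for height in 7250099 7250100 7250101; do
  if [ $(( height % clear_interval )) -eq 0 ]; then
    echo "height $height: clear pending packets"
  else
    echo "height $height: no clear"
  fi
done
# → only height 7250100 (a multiple of 30) triggers a clear
```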
Some situations where periodic packet clearing does not work:

1. The packet workers for the channel you want to see cleared did not start at all (Hermes did not get any WS events associated with that channel since it started).
2. The channels are filtered out via policy-based packet filtering.

Do you think either of the two cases above might be the root cause? Are you seeing any activity for that channel in Hermes logs or metrics?

> the packet workers for the channel you want to see cleared did not start at all

Well, I've been making transfers while that packet was stuck, so I'd say that's not the case.

> the channels are filtered out via policy-based packet filtering

The channel is not filtered for sure. Generally, if it were, I wouldn't be able to relay packets at all, plus the explicit clear command wouldn't work, right? I've manually checked as well, and the channel is allowed.
Thanks for the quick feedback and for checking on those things! This means we need to dig into the code and try to recreate this somehow. Any insights on what might help reproduce this behavior would be great.
> the explicit clear command wouldn't work, right?

The explicit clear CLI ignores the packet filtering specified in config.toml, so it would work.
@adizere I can try to replicate a stuck packet on my end and dig into the logs; any insight on what I'm looking for would help. Thanks!
> any insight on what I'm looking for would help.
Here are some things I would do if I had access to your setup:

- Start Hermes.
- Do `kill -USR1 $HERMES_PID`; this makes Hermes dump its internal state into the logs:

```
2022-07-27T14:46:32.679956Z INFO ThreadId(38) dumping state (triggered by SIGUSR1)
2022-07-27T14:46:32.693740Z INFO ThreadId(1114)
2022-07-27T14:46:32.693880Z INFO ThreadId(1114) Chains: ibc-0, ibc-1
2022-07-27T14:46:32.693896Z INFO ThreadId(1114) Client workers:
2022-07-27T14:46:32.693908Z INFO ThreadId(1114) - client::ibc-0->ibc-1:07-tendermint-1 (id: 3)
2022-07-27T14:46:32.693920Z INFO ThreadId(1114) | misbehaviour: true, refresh: true
2022-07-27T14:46:32.693933Z INFO ThreadId(1114) - client::ibc-1->ibc-0:07-tendermint-1 (id: 1)
2022-07-27T14:46:32.693992Z INFO ThreadId(1114) | misbehaviour: true, refresh: true
2022-07-27T14:46:32.694011Z INFO ThreadId(1114) Packet workers:
2022-07-27T14:46:32.694025Z INFO ThreadId(1114) - packet::channel-3/transfer:ibc-0->ibc-1 (id: 5)
2022-07-27T14:46:32.694036Z INFO ThreadId(1114) - packet::channel-4/transfer:ibc-1->ibc-0 (id: 6)
2022-07-27T14:46:32.694048Z INFO ThreadId(1114) Wallet workers:
2022-07-27T14:46:32.694059Z INFO ThreadId(1114) - wallet::ibc-0 (id: 2)
2022-07-27T14:46:32.694070Z INFO ThreadId(1114) - wallet::ibc-1 (id: 4)
2022-07-27T14:46:32.694083Z INFO ThreadId(1114)
```

This allows you to check whether Hermes is actually active on the problematic channel.
I'd be curious to know on which channel the packets are stuck, and which networks are at the two ends of that channel.
The full log of `clear packets` might be useful. Generally, if you share all the logs from `hermes start` and point out the channel or packet sequence numbers to us, we can likely find some interesting things.
Hey @adizere, apologies for the delay; I just managed to look into this.
`clear_on_start` relates to your issue here: https://github.com/informalsystems/ibc-rs/issues/2462, so I didn't manage to get the `clear packets` log.

I found something interesting in the full logs though:

```
2022-08-01T07:17:30.564645Z WARN ThreadId(41) refresh{client=07-tendermint-57 src_chain=injective-1 dst_chain=core-1}: task encountered ignorable error: error raised while trying to refresh client 07-tendermint-57: failed querying client state on dst chain: RPC error to endpoint https://tm.persistence.injective.network/: serde parse error: expected value at line 1 column 1
```

I got the serde parse error again, but this time there is no reference to a `data` field as we saw in https://github.com/informalsystems/ibc-rs/issues/2444, which also makes sense because the RPC in question (from Persistence) has both version and data fields exposed on the ABCI query. Something that might be worth looking at here.

Hope this helps. Btw, many of our relayers on Injective have mentioned issues with stuck packets whereby they had to take manual action with `hermes clear packets` across a number of chains; it's not an isolated issue.
> I got the serde parse error again but this time there is no reference to a `data` field as we show in #2444, which also makes sense because the RPC in question (from Persistence) has both version and data fields exposed on the ABCI query. Something that might be worth looking at here.
This is a different query than the one in #2444: this one is an ABCI query for the path `clients/07-tendermint-57/clientState`, issued by the client refresh worker. If you can figure out from the logs at which height the query was issued, then you can perhaps try to run the query manually?
This seems to work as expected:
```shell
❯ curl https://tm.persistence.injective.network -d '{
  "jsonrpc": "2.0",
  "id": "d18e80f5-e2fc-4381-aa2c-996b97116722",
  "method": "abci_query",
  "params": {
    "path": "store/ibc/key",
    "data": "636C69656E74732F30372D74656E6465726D696E742D35372F636C69656E745374617465",
    "height": "0",
    "prove": false
  }
}'
```
Note 1: the value of `params.data` above is the hex encoding of `clients/07-tendermint-57/clientState`.
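If you want to reproduce that hex string yourself, one way to do it (assuming `xxd` and `tr` are available on your machine) is:

```shell
# Hex-encode the IBC store key for use as params.data in abci_query.
# -c 256 keeps xxd from wrapping the output; uppercasing is cosmetic.
printf '%s' 'clients/07-tendermint-57/clientState' \
  | xxd -p -c 256 \
  | tr '[:lower:]' '[:upper:]'
# → 636C69656E74732F30372D74656E6465726D696E742D35372F636C69656E745374617465
```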
```json
{
  "jsonrpc": "2.0",
  "id": "d18e80f5-e2fc-4381-aa2c-996b97116722",
  "result": {
    "response": {
      "code": 0,
      "log": "",
      "info": "",
      "index": "0",
      "key": "Y2xpZW50cy8wNy10ZW5kZXJtaW50LTU3L2NsaWVudFN0YXRl",
      "value": "CisvaWJjLmxpZ2h0Y2xpZW50cy50ZW5kZXJtaW50LnYxLkNsaWVudFN0YXRlEoMBCgtpbmplY3RpdmUtMRIECAEQAxoECIDqSSIECIDfbioCCCgyADoHCAEQjLG6BkIZCgkIARgBIAEqAQASDAoCAAEQIRgEIAwwAUIZCgkIARgBIAEqAQASDAoCAAEQIBgBIAEwAUoHdXBncmFkZUoQdXBncmFkZWRJQkNTdGF0ZVABWAE=",
      "proofOps": null,
      "height": "7284159",
      "codespace": ""
    }
  }
}
```
Same if we specify the height explicitly:
```shell
❯ curl https://tm.persistence.injective.network -d '{
  "jsonrpc": "2.0",
  "id": "d18e80f5-e2fc-4381-aa2c-996b97116722",
  "method": "abci_query",
  "params": {
    "path": "store/ibc/key",
    "data": "636C69656E74732F30372D74656E6465726D696E742D35372F636C69656E745374617465",
    "height": "7250081",
    "prove": false
  }
}'
```
```json
{
  "jsonrpc": "2.0",
  "id": "d18e80f5-e2fc-4381-aa2c-996b97116722",
  "result": {
    "response": {
      "code": 0,
      "log": "",
      "info": "",
      "index": "0",
      "key": "Y2xpZW50cy8wNy10ZW5kZXJtaW50LTU3L2NsaWVudFN0YXRl",
      "value": "CisvaWJjLmxpZ2h0Y2xpZW50cy50ZW5kZXJtaW50LnYxLkNsaWVudFN0YXRlEoMBCgtpbmplY3RpdmUtMRIECAEQAxoECIDqSSIECIDfbioCCCgyADoHCAEQjLG6BkIZCgkIARgBIAEqAQASDAoCAAEQIRgEIAwwAUIZCgkIARgBIAEqAQASDAoCAAEQIBgBIAEwAUoHdXBncmFkZUoQdXBncmFkZWRJQkNTdGF0ZVABWAE=",
      "proofOps": null,
      "height": "7250081",
      "codespace": ""
    }
  }
}
```
Note 2: here I am using the height just before the first NewBlock event found in the logs above: 7250082 - 1 = 7250081.
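Side note: the `key` field in the responses above is just the base64 encoding of the same store key, so you can sanity-check that the node answered for the right path (assuming a `base64` that accepts `-d` for decoding, as GNU coreutils does; on macOS use `-D`):

```shell
# Decode the `key` field returned by abci_query back into the store path.
printf '%s' 'Y2xpZW50cy8wNy10ZW5kZXJtaW50LTU3L2NsaWVudFN0YXRl' | base64 -d
# → clients/07-tendermint-57/clientState
```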
I also tried setting up the `injective-1` chain (I just specified the RPC address), and `hermes query client state` gives me the expected result:
❯ RUST_LOG=tendermint_rpc=debug hermes query client state --chain injective-1 --client 07-tendermint-57
2022-08-03T15:40:06.639469Z DEBUG ThreadId(01) Incoming response: {
"jsonrpc": "2.0",
"id": "7c28b0ef-f6ef-49f2-935e-a323ade953af",
"result": {
"node_info": {
"protocol_version": {
"p2p": "8",
"block": "11",
"app": "0"
},
"id": "e0e1dd9d329d60c43b642f77dbce7199068fb2fa",
"listen_addr": "tcp://65.0.93.49:26656",
"network": "core-1",
"version": "v0.34.14",
"channels": "40202122233038606100",
"moniker": "node",
"other": {
"tx_index": "on",
"rpc_address": "tcp://0.0.0.0:26657"
}
},
"sync_info": {
"latest_block_hash": "E056FE4E05BC7D1CD5A9FB4F186ABFEA64A326E7B9CC106FDD285474431CF3FD",
"latest_app_hash": "636C3884307740CF094C3042C0F157B82A5F8B9BDDFB3A004BF709F308A6F103",
"latest_block_height": "7284306",
"latest_block_time": "2022-08-03T15:39:58.404070601Z",
"earliest_block_hash": "D9673C1D0DF6ACADCA29501A7EBC0D3FE6FBC7116FF2774D583C6625F07371F5",
"earliest_app_hash": "E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855",
"earliest_block_height": "1",
"earliest_block_time": "2021-03-30T12:08:46.515027422Z",
"catching_up": false
},
"validator_info": {
"address": "059BD993FB36CC110108520614C3569AB913F2B2",
"pub_key": {
"type": "tendermint/PubKeyEd25519",
"value": "lcjeCEqOE4ysLdoT+BWxLLZAOCtwgW10LMaVqgzQCA0="
},
"voting_power": "0"
}
}
}
2022-08-03T15:40:07.259676Z DEBUG ThreadId(15) Incoming response: {
"jsonrpc": "2.0",
"id": "e546034f-d93b-4e14-a818-936405e51f18",
"result": {
"response": {
"code": 0,
"log": "",
"info": "",
"index": "0",
"key": "Y2xpZW50cy8wNy10ZW5kZXJtaW50LTU3L2NsaWVudFN0YXRl",
"value": "CisvaWJjLmxpZ2h0Y2xpZW50cy50ZW5kZXJtaW50LnYxLkNsaWVudFN0YXRlEoMBCgtpbmplY3RpdmUtMRIECAEQAxoECIDqSSIECIDfbioCCCgyADoHCAEQjLG6BkIZCgkIARgBIAEqAQASDAoCAAEQIRgEIAwwAUIZCgkIARgBIAEqAQASDAoCAAEQIBgBIAEwAUoHdXBncmFkZUoQdXBncmFkZWRJQkNTdGF0ZVABWAE=",
"proofOps": null,
"height": "7284306",
"codespace": ""
}
}
}
SUCCESS Tendermint(
ClientState {
chain_id: ChainId {
id: "injective-1",
version: 1,
},
trust_level: TrustThreshold {
numerator: 1,
denominator: 3,
},
trusting_period: 1209600s,
unbonding_period: 1814400s,
max_clock_drift: 40s,
latest_height: Height {
revision: 1,
height: 13539468,
},
proof_specs: ProofSpecs(
[
ProofSpec(
ProofSpec {
leaf_spec: Some(
LeafOp {
hash: Sha256,
prehash_key: NoHash,
prehash_value: Sha256,
length: VarProto,
prefix: [
0,
],
},
),
inner_spec: Some(
InnerSpec {
child_order: [
0,
1,
],
child_size: 33,
min_prefix_length: 4,
max_prefix_length: 12,
empty_child: [],
hash: Sha256,
},
),
max_depth: 0,
min_depth: 0,
},
),
ProofSpec(
ProofSpec {
leaf_spec: Some(
LeafOp {
hash: Sha256,
prehash_key: NoHash,
prehash_value: Sha256,
length: VarProto,
prefix: [
0,
],
},
),
inner_spec: Some(
InnerSpec {
child_order: [
0,
1,
],
child_size: 32,
min_prefix_length: 1,
max_prefix_length: 1,
empty_child: [],
hash: Sha256,
},
),
max_depth: 0,
min_depth: 0,
},
),
],
),
upgrade_path: [
"upgrade",
"upgradedIBCState",
],
allow_update: AllowUpdate {
after_expiry: true,
after_misbehaviour: true,
},
frozen_height: None,
},
)
Note: I set `RUST_LOG=tendermint_rpc=debug` in the command above to inspect the raw JSON response of the command.
Closing. Confirmed with Achilleas that after upgrading to v1 the problem is no longer observed.
Summary of Bug

`clear_interval` is set to 30 and packets are not being cleared automatically; I have to manually clear packets with the `hermes clear packets` command every once in a while.

Version

hermes 0.13.0+2375a069