informalsystems / hermes

IBC Relayer in Rust
https://hermes.informal.systems
Apache License 2.0
440 stars 326 forks source link

One send packet stuck on Osmosis channel `channel-0` to Cosmos-Hub #2993

Closed adizere closed 1 year ago

adizere commented 1 year ago

Summary of Bug

@faddat discovered this SendPacket stuck last week.

Version

v1.2

Steps to Reproduce

hermes clear packets --chain osmosis-1 --port transfer --channel channel-0
2023-01-09T11:24:50.964255Z  INFO ThreadId(01) using default configuration from '/Users/faddat/.hermes/config.toml'
2023-01-09T11:24:50.964627Z  INFO ThreadId(01) running Hermes v1.2.0+00fe26ed
2023-01-09T11:25:04.664258Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}: 2 unreceived packets found: [ 1275507, 1642799 ]
2023-01-09T11:25:04.664347Z  WARN ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:query_send_packet_events{chain=osmosis-1 height=<=1-7722248 sequences=[Sequence(1275507), Sequence(1642799)]}: will do QueryPacketEventDataRequest from src_chain
2023-01-09T11:25:04.664421Z  WARN ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:query_send_packet_events{chain=osmosis-1 height=<=1-7722248 sequences=[Sequence(1275507), Sequence(1642799)]}: query_packets_from_txs
2023-01-09T11:25:04.664458Z  WARN ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:query_send_packet_events{chain=osmosis-1 height=<=1-7722248 sequences=[Sequence(1275507), Sequence(1642799)]}: /tx_search for request QueryPacketEventDataRequest { event_id: SendPacket, source_channel_id: ChannelId("channel-0"), source_port_id: PortId("transfer"), destination_channel_id: ChannelId("channel-141"), destination_port_id: PortId("transfer"), sequences: [Sequence(1275507), Sequence(1642799)], height: SmallerEqual(Specific(Height { revision: 1, height: 7722248 })) }
2023-01-09T11:25:18.475936Z  WARN ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}: encountered query failure while pulling packet data: RPC error to endpoint https://rpc-osmosis-archive-ia.cosmosia.notional.ventures/: serde parse error: EOF while parsing a string at line 188 column 524468
SUCCESS []

The main part is serde parse error: EOF while parsing a string at line 188 column 524468.

2023-01-09T14:30:00.924712Z TRACE ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:query_send_packet_events{chain=osmosis-1 height=<=1-7724204 sequences=[Sequence(1275507)]}: end_block_events []
2023-01-09T14:30:00.926956Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}: pulled packet data for 1 events out of 1 sequences: 1275507..=1275507; events.total=1 events.left=0
2023-01-09T14:30:01.395287Z TRACE ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:generate_operational_data{tracking_id=packet-recv}: processing event event=SendPacket(SendPacket { packet: seq:1275507, path:channel-0/transfer->channel-141/transfer, toh:no timeout, tos:Timestamp(2022-09-08T22:35:35Z)) }) at height 1-7724204
2023-01-09T14:30:03.331731Z TRACE ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:generate_operational_data{tracking_id=packet-recv}: build timeout for channel packet=seq:1275507, path:channel-0/transfer->channel-141/transfer, toh:no timeout, tos:Timestamp(2022-09-08T22:35:35Z)) height=4-13599968
2023-01-09T14:30:03.879642Z TRACE ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:generate_operational_data{tracking_id=packet-recv}: built timeout msg packet=seq:1275507, path:channel-0/transfer->channel-141/transfer, toh:no timeout, tos:Timestamp(2022-09-08T22:35:35Z)) height=4-13599969
2023-01-09T14:30:03.880406Z TRACE ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:generate_operational_data{tracking_id=packet-recv}: collected event msg.type_url=/ibc.core.channel.v1.MsgTimeout event=SendPacket(SendPacket { packet: seq:1275507, path:channel-0/transfer->channel-141/transfer, toh:no timeout, tos:Timestamp(2022-09-08T22:35:35Z)) }) at height 1-7724204
2023-01-09T14:30:03.880676Z DEBUG ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:schedule{odata=packet-recv ->Source @4-13599968; len=1}: connection delay need not be taken into account: client update message will be prepended later
2023-01-09T14:30:03.880738Z DEBUG ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}: retrying retry.current=1 retry.max=5
2023-01-09T14:30:03.880996Z DEBUG ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}: prepending Source client update at height 4-13599969
2023-01-09T14:30:06.736708Z TRACE ThreadId(21) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:foreign_client.wait_and_build_update_client_with_trusted{client=cosmoshub-4->osmosis-1:07-tendermint-1 target_height=4-13599969}:foreign_client.build_update_client_with_trusted{client=cosmoshub-4->osmosis-1:07-tendermint-1 target_height=4-13599969}: light client verification trusted=4-13599963 target=4-13599969
2023-01-09T14:30:21.291374Z TRACE ThreadId(21) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:foreign_client.wait_and_build_update_client_with_trusted{client=cosmoshub-4->osmosis-1:07-tendermint-1 target_height=4-13599969}:foreign_client.build_update_client_with_trusted{client=cosmoshub-4->osmosis-1:07-tendermint-1 target_height=4-13599969}: adjusting headers with 0 supporting headers trusted=4-13599963 target=13599969
2023-01-09T14:30:21.291431Z TRACE ThreadId(21) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:foreign_client.wait_and_build_update_client_with_trusted{client=cosmoshub-4->osmosis-1:07-tendermint-1 target_height=4-13599969}:foreign_client.build_update_client_with_trusted{client=cosmoshub-4->osmosis-1:07-tendermint-1 target_height=4-13599969}: fetching header height=4-13599964
2023-01-09T14:30:28.609044Z DEBUG ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:foreign_client.wait_and_build_update_client_with_trusted{client=cosmoshub-4->osmosis-1:07-tendermint-1 target_height=4-13599969}:foreign_client.build_update_client_with_trusted{client=cosmoshub-4->osmosis-1:07-tendermint-1 target_height=4-13599969}: building a MsgUpdateAnyClient from trusted height 4-13599963 to target height 4-13599969
2023-01-09T14:30:28.611898Z  INFO ThreadId(01) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}: assembled batch of 2 message(s)
2023-01-09T14:30:29.582360Z DEBUG ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}: sending 2 messages as 2 batches to chain osmosis-1 in parallel
2023-01-09T14:30:29.582596Z DEBUG ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=osmosis-1 account.sequence=8}: max fee, for use in tx simulation: Fee { amount: "7501uosmo", gas_limit: 3000000 }
2023-01-09T14:30:31.904180Z DEBUG ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=osmosis-1 account.sequence=8}:estimate_gas: tx simulation successful, gas amount used: 520049
2023-01-09T14:30:31.904324Z DEBUG ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=osmosis-1 account.sequence=8}: send_tx: using 520049 gas, fee Fee { amount: "1431uosmo", gas_limit: 572053 } id=osmosis-1
2023-01-09T14:30:34.297676Z DEBUG ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=osmosis-1 account.sequence=8}: gas estimation succeeded
2023-01-09T14:30:34.297749Z DEBUG ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=osmosis-1 account.sequence=8}: tx was successfully broadcasted, increasing account sequence number response=Response { code: Ok, data: b"", log: "[]", hash: Hash::Sha256(6D81E1214BA130B3B232AAD8C6F6ED1B551B46E6F5A42FF5ABF4E6A6BF91AC75) } account.sequence.old=8 account.sequence.new=9
2023-01-09T14:30:34.299232Z DEBUG ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=osmosis-1 account.sequence=9}: max fee, for use in tx simulation: Fee { amount: "7501uosmo", gas_limit: 3000000 }
2023-01-09T14:30:39.801049Z ERROR ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=osmosis-1 account.sequence=9}:estimate_gas: failed to simulate tx. propagating error to caller: gRPC call failed with status: status: Internal, message: "protocol error: received message with invalid compression flag: 60 (valid flags are 0 and 1) while receiving response with status: 413 Payload Too Large", details: [], metadata: MetadataMap { headers: {"server": "nginx/1.22.0", "date": "Mon, 09 Jan 2023 14:30:39 GMT", "content-type": "text/html", "content-length": "183"} }
2023-01-09T14:30:39.801182Z ERROR ThreadId(11) relay_recv_packet_and_timeout_messages{src_chain=osmosis-1 src_port=transfer src_channel=channel-0 dst_chain=cosmoshub-4}:relay{odata=packet-recv ->Source @4-13599968; len=1}:send_messages_and_wait_commit{chain=osmosis-1 tracking_id=packet-recv}:send_tx_with_account_sequence_retry{chain=osmosis-1 account.sequence=9}: gas estimation failed or encountered another unrecoverable error error=gRPC call failed with status: status: Internal, message: "protocol error: received message with invalid compression flag: 60 (valid flags are 0 and 1) while receiving response with status: 413 Payload Too Large", details: [], metadata: MetadataMap { headers: {"server": "nginx/1.22.0", "date": "Mon, 09 Jan 2023 14:30:39 GMT", "content-type": "text/html", "content-length": "183"} }
ERROR link error: link failed with underlying error: gRPC call failed with status: status: Internal, message: "protocol error: received message with invalid compression flag: 60 (valid flags are 0 and 1) while receiving response with status: 413 Payload Too Large", details: [], metadata: MetadataMap { headers: {"server": "nginx/1.22.0", "date": "Mon, 09 Jan 2023 14:30:39 GMT", "content-type": "text/html", "content-length": "183"} }

The main part here is HTTP error 413, likely because of nginx frontend to Notional RPC nodes.

Acceptance Criteria

For Admin Use

adizere commented 1 year ago

Working on this is blocked on an available node that has the packet data for the stuck packet. Right now, we're using a Notional archive node and working in tandem with Jacob to debug.

doggystylez commented 1 year ago

cleared with hermes 1.1.0 and a whole lot of gas

romac commented 1 year ago

@doggystylez Sweet! Did it just go through with clear packets or were there additional steps to be taken?

adizere commented 1 year ago

Way to go @doggystylez!

We'll coordinate with the IBC-go team & SDK to see where's the best place for implementing preventive measures. A proof-of-concept solution for preventing these kind of corner-cases in the future exists here: https://github.com/cosmos/ibc-go/pull/2983.

doggystylez commented 1 year ago

@romac clear packets worked

what I did:

I noted that the osmosis app.toml has a setting for grpc max-recv-msg-size (which conveniently defaults to 10MB in the latest version), going to check this out further