informalsystems / hermes

IBC Relayer in Rust
https://hermes.informal.systems
Apache License 2.0

Setup IBC #697

Closed Fraccaman closed 3 years ago

Fraccaman commented 3 years ago

Crate

Version

v0.5.0

Summary

Hermes fails with TxNoConfirmation error constantly when trying to communicate with a full node that does not have indexing enabled.

The error can be tracked down using a patched version of Hermes, and looks like this:

Jun 30 13:39:20.800 TRACE waiting for commit of block(s)
Jun 30 13:39:21.001 ERROR query_txs fail: Internal error: transaction indexing is disabled (code: -32603)
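One way to spot this condition before starting the relayer is to look at the tx_index field that a node reports in its /status response (the same field shows up in the status outputs quoted later in this thread). The following is only a minimal Python sketch, not part of Hermes: the function name is hypothetical, and it assumes the JSON layout shown in those outputs.

```python
import json


def tx_indexing_enabled(status_json: str) -> bool:
    """Return True if a /status response reports tx_index == "on".

    Assumes the layout result -> node_info -> other -> tx_index, as in the
    status outputs quoted in this thread. Missing fields count as disabled.
    """
    status = json.loads(status_json)
    other = status.get("result", {}).get("node_info", {}).get("other", {})
    return other.get("tx_index") == "on"


# Hand-written sample response, trimmed to the relevant field.
sample = '{"result": {"node_info": {"other": {"tx_index": "off"}}}}'
print(tx_indexing_enabled(sample))  # False
```

Treating a missing field as "disabled" is a deliberately conservative choice for a pre-flight check.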

Many thanks to @Fraccaman for helping us uncover this corner-case.

A separate, related problem was due to misconfiguration (see https://github.com/heliaxdev/ibc-setup/pull/1).

Acceptance criteria:

Original discussion

Summary of Bug

Probably this is not a bug, but I can't understand what's wrong. I'm trying to open a channel between a Cosmos mainnet node and my own gaia testnet node. I am able to build the relayer and configure it correctly (it doesn't complain, so I'm assuming the configuration is correct), but as soon as I try to run hermes channel handshake id-1 id-2 transfer transfer it throws the following error: {"status":"error","result":"chain runtime/handle error: Light client supervisor error for chain id heliax: empty witness list"}.

P.S.: @andynog suggested opening an issue.

Version

Steps to Reproduce

I have created a repository with a series of bash scripts to reproduce this use case here.



romac commented 3 years ago

Something is likely going wrong with the commands that configure the light clients (ie. the light add commands).

I see that the script adds secondary peers for each node using the same network address, is that intended?

Could you remove the &>/dev/null redirect at the end of the commands in ibc.sh and report back with the output?
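As a rough illustration of the two things worth checking here (no witness configured at all, which is presumably what the empty witness list error points at, and witnesses that reuse the primary's address), here is a minimal Python sketch. It is not Hermes code: the function name is made up, the peer ids in the example are placeholders, and it assumes the peers table of a chain has already been parsed into a dict with a primary id and a light_clients list.

```python
def check_light_client_peers(peers: dict) -> list:
    """Return a list of human-readable problems found in a peers table.

    `peers` is assumed to hold a `primary` peer id and a `light_clients` list
    whose entries carry `peer_id` and `address` (hypothetical parsed form).
    """
    problems = []
    primary_id = peers.get("primary")
    clients = peers.get("light_clients", [])
    primary = next((c for c in clients if c["peer_id"] == primary_id), None)
    witnesses = [c for c in clients if c["peer_id"] != primary_id]

    if primary is None:
        problems.append("primary peer is not listed in light_clients")
    if not witnesses:
        problems.append("no witness (secondary) peers configured")
    for w in witnesses:
        # A witness at the primary's address cannot act as an independent check.
        if primary is not None and w["address"] == primary["address"]:
            problems.append(
                f"witness {w['peer_id']} shares the primary's address {w['address']}"
            )
    return problems


# Placeholder peer ids; both entries point at the same node.
peers = {
    "primary": "AAAA",
    "light_clients": [
        {"peer_id": "AAAA", "address": "tcp://localhost:26657"},
        {"peer_id": "BBBB", "address": "tcp://localhost:26657"},
    ],
}
print(check_light_client_peers(peers))
```

The example reports one problem: the witness shares the primary's address.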

Fraccaman commented 3 years ago

Yes, you are right, that network address is wrong. So this is the output (I also fixed a typo):

Building the Rust relayer...
Removing light client peers from configuration...
Adding primary peers to light client configuration...
    Finished dev [unoptimized + debuginfo] target(s) in 0.18s
     Running `target/debug/hermes -c /home/ec2-user/.hermes/config.toml light add 'localhost:26657' -c stargate -f -p -s /home/ec2-user/node-stargate/data -y`
     Success Added light client:
  chain id: stargate
  address:  tcp://localhost:26657
  peer id:  648A550A0545AF774223E556117BBDE3156A3520
  height:   5229734
  hash:     CF9104D58D3FE7A35E062C02C09E52EA062C5A8F212DF5DDA358EE9A52450F84
  primary:  true
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/hermes -c /home/ec2-user/.hermes/config.toml light add 'localhost:26357' -c heliax -f -p -s /home/ec2-user/node-heliax/data -y`
     Success Added light client:
  chain id: heliax
  address:  tcp://localhost:26357
  peer id:  97DAF05D3D5CCE2DF5AC3323784C2C01A1B7D5CA
  height:   6904
  hash:     81995B52593A2F3D1629A31A20CAF034F9AB64A6235D34457C4F36B065FFD439
  primary:  true
Adding secondary peers to light client configuration...
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/hermes -c /home/ec2-user/.hermes/config.toml light add 'localhost:26657' -c stargate -s /home/ec2-user/node-stargate/data -y --peer-id 2427F8D914A6862279B3326FA64F76E3BC06DB2E`
     Success Added light client:
  chain id: stargate
  address:  tcp://localhost:26657
  peer id:  2427F8D914A6862279B3326FA64F76E3BC06DB2E
  height:   5229737
  hash:     6D874A3D9167B8C955D9AF512C20D7B7069260B831B5F1B338FC77F430AC317E
  primary:  false
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/hermes -c /home/ec2-user/.hermes/config.toml light add 'localhost:26357' -c heliax -s /home/ec2-user/node-heliax/data -y --peer-id A885BB3D3DFF6101188B462466AE926E7A6CD51E`
     Success Added light client:
  chain id: heliax
  address:  tcp://localhost:26357
  peer id:  A885BB3D3DFF6101188B462466AE926E7A6CD51E
  height:   6905
  hash:     6AED5CE877589AE582C5DF9881F008A7CF371E1DCB0F7F03DD1B81FF5A4E71ED
  primary:  false
Importing keys...
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/hermes -c /home/ec2-user/.hermes/config.toml keys add stargate /home/ec2-user/node-stargate/key_seed.json`
{"status":"success","result":"Added key node_key (cosmos1ztu56h7kpuguf9y39lhxgayhngmysnxsgl8f9u) on stargate chain"}
    Finished dev [unoptimized + debuginfo] target(s) in 0.16s
     Running `target/debug/hermes -c /home/ec2-user/.hermes/config.toml keys add heliax /home/ec2-user/node-heliax/key_seed.json`
{"status":"success","result":"Added key node_key (cosmos196hkxg7c53h6u75umdrhwum6kp8xyxzw2kvu7r) on heliax chain"}
Done!

Now, the error changed and is the following:

{"status":"error","result":"chain runtime/handle error: Light client instance error for rpc address tcp://localhost:26657: invalid light block: invalid validator set: header_validators_hash=862A9C43A9A29FC6D508352B056A738DB35B3F96F0FA02F0DA2FC1ED8035A55C validators_hash=0198C4156F82C8E0B11C23A24F43FEDE7D92D9146E64FD5D37C5ED3360F53AA9"}
romac commented 3 years ago

Can you post your full ~/.relayer/config.toml file after running the script?

Fraccaman commented 3 years ago

If you mean the ~/.hermes/config.toml here it is:

[global]
timeout = '10s'
strategy = 'naive'
log_level = 'error'

[[chains]]
id = 'stargate'
rpc_addr = 'tcp://localhost:26657'
grpc_addr = 'tcp://localhost:9090'
account_prefix = 'cosmos'
key_name = 'node_key'
store_prefix = 'stargate'
gas = 3000000
clock_drift = '5s'
trusting_period = '14days'

[chains.trust_threshold]
numerator = '1'
denominator = '3'

[chains.peers]
primary = '648A550A0545AF774223E556117BBDE3156A3520'

[[chains.peers.light_clients]]
peer_id = '648A550A0545AF774223E556117BBDE3156A3520'
address = 'tcp://localhost:26657'
timeout = '10s'
trusted_header_hash = 'CF9104D58D3FE7A35E062C02C09E52EA062C5A8F212DF5DDA358EE9A52450F84'
trusted_height = '5229734'

[chains.peers.light_clients.store]
type = 'disk'
path = '/home/ec2-user/node-stargate/data/648A550A0545AF774223E556117BBDE3156A3520'

[[chains.peers.light_clients]]
peer_id = '2427F8D914A6862279B3326FA64F76E3BC06DB2E'
address = 'tcp://localhost:26657'
timeout = '10s'
trusted_header_hash = '6D874A3D9167B8C955D9AF512C20D7B7069260B831B5F1B338FC77F430AC317E'
trusted_height = '5229737'

[chains.peers.light_clients.store]
type = 'disk'
path = '/home/ec2-user/node-stargate/data/2427F8D914A6862279B3326FA64F76E3BC06DB2E'

[[chains]]
id = 'heliax'
rpc_addr = 'tcp://localhost:26357'
grpc_addr = 'tcp://localhost:9091'
account_prefix = 'cosmos'
key_name = 'node_key'
store_prefix = 'heliax'
gas = 3000000
clock_drift = '5s'
trusting_period = '14days'

[chains.trust_threshold]
numerator = '1'
denominator = '3'

[chains.peers]
primary = '97DAF05D3D5CCE2DF5AC3323784C2C01A1B7D5CA'

[[chains.peers.light_clients]]
peer_id = '97DAF05D3D5CCE2DF5AC3323784C2C01A1B7D5CA'
address = 'tcp://localhost:26357'
timeout = '10s'
trusted_header_hash = '81995B52593A2F3D1629A31A20CAF034F9AB64A6235D34457C4F36B065FFD439'
trusted_height = '6904'

[chains.peers.light_clients.store]
type = 'disk'
path = '/home/ec2-user/node-heliax/data/97DAF05D3D5CCE2DF5AC3323784C2C01A1B7D5CA'

[[chains.peers.light_clients]]
peer_id = 'A885BB3D3DFF6101188B462466AE926E7A6CD51E'
address = 'tcp://localhost:26357'
timeout = '10s'
trusted_header_hash = '6AED5CE877589AE582C5DF9881F008A7CF371E1DCB0F7F03DD1B81FF5A4E71ED'
trusted_height = '6905'

[chains.peers.light_clients.store]
type = 'disk'
path = '/home/ec2-user/node-heliax/data/A885BB3D3DFF6101188B462466AE926E7A6CD51E'
romac commented 3 years ago

If you mean the ~/.hermes/config.toml here it is:

Yes it's what I meant, sorry about that! Your config looks good to me.

The light client throws this error when verifying the initial trusted light block: the hash of the validator set stored in the header does not match the hash that the light client computes from the validator set it fetched for that height.
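To make the comparison concrete: the computed hash is a Merkle root over the validator set, which is then checked against the validators_hash field of the header. The Python sketch below shows an RFC 6962-style Merkle root of the kind Tendermint uses for such hashes. It is a sketch under stated assumptions, not the tendermint-rs implementation: the leaf bytes are placeholders, since real leaves are the protobuf-encoded validators, which this example does not try to reproduce.

```python
import hashlib


def _sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _split_point(n: int) -> int:
    """Largest power of two strictly less than n (for n >= 2)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves: list) -> bytes:
    """RFC 6962-style root: 0x00-prefixed leaves, 0x01-prefixed inner nodes."""
    if not leaves:
        return _sha256(b"")
    if len(leaves) == 1:
        return _sha256(b"\x00" + leaves[0])
    k = _split_point(len(leaves))
    return _sha256(b"\x01" + merkle_root(leaves[:k]) + merkle_root(leaves[k:]))


def validators_hash_matches(header_hash_hex: str, encoded_validators: list) -> bool:
    """Compare the header's hash with the root computed from the fetched set.

    `encoded_validators` stands in for the encoded validators (assumption).
    """
    return merkle_root(encoded_validators).hex().upper() == header_hash_hex.upper()
```

With this shape, a validator that is missing, reordered or encoded differently changes the root, which is why a mismatch shows up as two different hashes in the error message.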

I am not sure what could cause that. Maybe a mismatch in the Tendermint version that the nodes are running vs the version supported by tendermint-rs? Can you tell me what version of Tendermint the nodes are running?

Fraccaman commented 3 years ago

No worries @romac! Can you tell me how I can check that?

I can give you the output of gaiad version --long (hope this is enough):

name: gaia
server_name: gaiad
version: 4.0.0
commit: a279d091c6f66f8a91c87943139ebaecdd84f689
build_tags: netgo,ledger
go: go version go1.15.8 linux/amd64
build_deps:
- github.com/99designs/keyring@v1.1.6
- github.com/ChainSafe/go-schnorrkel@v0.0.0-20200405005733-88cbf1b4c40d
- github.com/Workiva/go-datastructures@v1.0.52
- github.com/aristanetworks/goarista@v0.0.0-20170210015632-ea17b1a17847
- github.com/armon/go-metrics@v0.3.6
- github.com/beorn7/perks@v1.0.1
- github.com/bgentry/speakeasy@v0.1.0
- github.com/btcsuite/btcd@v0.21.0-beta
- github.com/btcsuite/btcutil@v1.0.2
- github.com/cespare/xxhash/v2@v2.1.1
- github.com/confio/ics23/go@v0.6.3
- github.com/cosmos/cosmos-sdk@v0.41.0
- github.com/cosmos/go-bip39@v1.0.0
- github.com/cosmos/iavl@v0.15.3
- github.com/cosmos/ledger-cosmos-go@v0.11.1
- github.com/cosmos/ledger-go@v0.9.2
- github.com/davecgh/go-spew@v1.1.1
- github.com/dvsekhvalnov/jose2go@v0.0.0-20200901110807-248326c1351b
- github.com/enigmampc/btcutil@v1.0.3-0.20200723161021-e2fb6adb2a25
- github.com/ethereum/go-ethereum@v1.9.25
- github.com/felixge/httpsnoop@v1.0.1
- github.com/fsnotify/fsnotify@v1.4.9
- github.com/go-kit/kit@v0.10.0
- github.com/go-logfmt/logfmt@v0.5.0
- github.com/godbus/dbus@v0.0.0-20190726142602-4481cbc300e2
- github.com/gogo/gateway@v1.1.0
- github.com/gogo/protobuf@v1.3.3 => github.com/regen-network/protobuf@v1.3.3-alpha.regen.1
- github.com/golang/protobuf@v1.4.3
- github.com/golang/snappy@v0.0.3-0.20201103224600-674baa8c7fc3
- github.com/google/btree@v1.0.0
- github.com/gorilla/handlers@v1.5.1
- github.com/gorilla/mux@v1.8.0
- github.com/gorilla/websocket@v1.4.2
- github.com/grpc-ecosystem/go-grpc-middleware@v1.2.2
- github.com/grpc-ecosystem/grpc-gateway@v1.16.0
- github.com/gsterjov/go-libsecret@v0.0.0-20161001094733-a6f4afe4910c
- github.com/gtank/merlin@v0.1.1
- github.com/gtank/ristretto255@v0.1.2
- github.com/hashicorp/go-immutable-radix@v1.0.0
- github.com/hashicorp/golang-lru@v0.5.4
- github.com/hashicorp/hcl@v1.0.0
- github.com/libp2p/go-buffer-pool@v0.0.2
- github.com/magiconair/properties@v1.8.4
- github.com/mattn/go-isatty@v0.0.12
- github.com/matttproud/golang_protobuf_extensions@v1.0.1
- github.com/mimoo/StrobeGo@v0.0.0-20181016162300-f8f6d4d2b643
- github.com/minio/highwayhash@v1.0.1
- github.com/mitchellh/go-homedir@v1.1.0
- github.com/mitchellh/mapstructure@v1.1.2
- github.com/mtibben/percent@v0.2.1
- github.com/pelletier/go-toml@v1.8.0
- github.com/pkg/errors@v0.9.1
- github.com/pmezard/go-difflib@v1.0.0
- github.com/prometheus/client_golang@v1.8.0
- github.com/prometheus/client_model@v0.2.0
- github.com/prometheus/common@v0.15.0
- github.com/prometheus/procfs@v0.2.0
- github.com/rakyll/statik@v0.1.7
- github.com/rcrowley/go-metrics@v0.0.0-20200313005456-10cdbea86bc0
- github.com/regen-network/cosmos-proto@v0.3.1
- github.com/rs/cors@v1.7.0
- github.com/rs/zerolog@v1.20.0
- github.com/spf13/afero@v1.3.4
- github.com/spf13/cast@v1.3.1
- github.com/spf13/cobra@v1.1.1
- github.com/spf13/jwalterweatherman@v1.1.0
- github.com/spf13/pflag@v1.0.5
- github.com/spf13/viper@v1.7.1
- github.com/stretchr/testify@v1.7.0
- github.com/subosito/gotenv@v1.2.0
- github.com/syndtr/goleveldb@v1.0.1-0.20200815110645-5c35d600f0ca
- github.com/tendermint/btcd@v0.1.1
- github.com/tendermint/crypto@v0.0.0-20191022145703-50d29ede1e15
- github.com/tendermint/go-amino@v0.16.0
- github.com/tendermint/tendermint@v0.34.3
- github.com/tendermint/tm-db@v0.6.3
- github.com/zondax/hid@v0.9.0
- golang.org/x/crypto@v0.0.0-20201221181555-eec23a3978ad
- golang.org/x/net@v0.0.0-20201021035429-f5854403a974
- golang.org/x/sys@v0.0.0-20201015000850-e3ed0017c211
- golang.org/x/term@v0.0.0-20201117132131-f5c789dd3221
- golang.org/x/text@v0.3.3
- google.golang.org/genproto@v0.0.0-20210114201628-6edceaf6022f
- google.golang.org/grpc@v1.35.0
- google.golang.org/protobuf@v1.25.0
- gopkg.in/ini.v1@v1.51.0
- gopkg.in/yaml.v2@v2.4.0
- gopkg.in/yaml.v3@v3.0.0-20200313102051-9f266ea9e77c
romac commented 3 years ago

The Tendermint version looks good. We are going to try to reproduce the issue on our side and get back to you. /cc @andynog

andynog commented 3 years ago

Hi @Fraccaman, just following up on this. For the stargate chain, is this a local chain you're running or are you testing against a testnet?

Fraccaman commented 3 years ago

One node is running stargate mainnet, the other a local chain.

andynog commented 3 years ago

When you say your node is running on Stargate mainnet, are you referring to cosmoshub-4? Can you please send what you get if you run this API query: https://localhost:26657/status

Just want to ensure we're talking about the same thing :-)

Fraccaman commented 3 years ago

Sorry for the late response, here is the result:

{
  "jsonrpc": "2.0",
  "id": -1,
  "result": {
    "node_info": {
      "protocol_version": {
        "p2p": "8",
        "block": "11",
        "app": "0"
      },
      "id": "3ed8666e8e7fe0ae4dac31014841c1828d240cc9",
      "listen_addr": "tcp://0.0.0.0:26656",
      "network": "cosmoshub-4",
      "version": "",
      "channels": "40202122233038606100",
      "moniker": "heliax-1",
      "other": {
        "tx_index": "on",
        "rpc_address": "tcp://127.0.0.1:26657"
      }
    },
    "sync_info": {
      "latest_block_hash": "E1B2EE22FF1B025DA902FA6B732D09BBCF7DBF8B2E406A7D65A26BED8D499CAF",
      "latest_app_hash": "A99F8C091DD628CA595098368BC57EC69E42EFC1CAAC9D66FA26929756430DD1",
      "latest_block_height": "5201056",
      "latest_block_time": "2021-02-18T13:08:36.985091499Z",
      "earliest_block_hash": "1455A0C15AC49BB506992EC85A3CD4D32367E53A087689815E01A524231C3ADF",
      "earliest_app_hash": "E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855",
      "earliest_block_height": "5200791",
      "earliest_block_time": "2019-12-11T16:11:34Z",
      "catching_up": true
    },
    "validator_info": {
      "address": "84B23508421F2B0C20F1C09308D25F0F6E8AEB41",
      "pub_key": {
        "type": "tendermint/PubKeyEd25519",
        "value": "Abh8gRTD4ZEu5ytdlb+OSkrX7DiRt8vtiPvCLUuQjOY="
      },
      "voting_power": "0"
    }
  }
}
ancazamfir commented 3 years ago

Related to https://github.com/informalsystems/tendermint-rs/issues/831

andynog commented 3 years ago

Hi @Fraccaman, we believe this bug is fixed now. We did encounter a bug and fixed it in master. We could reproduce it last week and made modifications in https://github.com/informalsystems/tendermint-rs/issues/831

Please let us know if this is fixed now so we can close the ticket. Thanks!

Fraccaman commented 3 years ago

I'll try again ASAP and report back! Thanks! :)

andynog commented 3 years ago

Thanks @Fraccaman please keep us posted. :+1:

Fraccaman commented 3 years ago

So, I tried again (same setup). Now I get the following error:

{"status":"error","result":"chain runtime/handle error: Light client instance error for rpc address tcp://localhost:26657: invalid light block: not withing trusting period: expires_at=2021-03-04T15:47:08.9667102Z now=2021-04-07T15:16:02.429079517Z"}
andynog commented 3 years ago

So, I tried again (same setup). Now I get the following error:

{"status":"error","result":"chain runtime/handle error: Light client instance error for rpc address tcp://localhost:26657: invalid light block: not withing trusting period: expires_at=2021-03-04T15:47:08.9667102Z now=2021-04-07T15:16:02.429079517Z"}

Hi @Fraccaman, you might need to update your light clients (primary and witness) if you haven't done so. It's the same command as for adding one, so just run it again. This will update the trusted header and height:

hermes light add tcp://localhost:26657 -c stargate ...
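For reference, the check behind the trusting-period error boils down to comparing the current time with the trusted header's time plus the trusting period. Below is a minimal Python sketch of that arithmetic, not the light client's actual code. It assumes the 14-day trusting period from the config posted earlier, and the header time is reconstructed by subtracting that period from the expires_at value in the error (fractional seconds dropped).

```python
from datetime import datetime, timedelta, timezone


def within_trusting_period(header_time, trusting_period, now):
    """A trusted header is usable only while now < header_time + trusting_period."""
    return now < header_time + trusting_period


# expires_at (2021-03-04T15:47:08Z) minus 14 days gives the header time below.
header_time = datetime(2021, 2, 18, 15, 47, 8, tzinfo=timezone.utc)
now = datetime(2021, 4, 7, 15, 16, 2, tzinfo=timezone.utc)
print(within_trusting_period(header_time, timedelta(days=14), now))  # False
```

Re-adding the light client replaces the trusted header with a recent one, which moves expires_at forward.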
Fraccaman commented 3 years ago

I started from a new machine, so I don't think I needed to update the trusted headers. Anyway, I tried rerunning hermes again (you can see the scripts here), and it's kind of strange. Sometimes I get the same error about the trusting period, but sometimes I get:

{"status":"error","result":"chain runtime/handle error: Light client instance error for rpc address tcp://localhost:26657: I/O error: fetched validator set is invalid: proposer with address 'C2356622B495725961B5B201A382DD57CD3305EC' not found in validator set"}
andynog commented 3 years ago

I started from a new machine, so I don't think I needed to update the trusted headers. Anyway, I tried rerunning hermes again (you can see the scripts here), and it's kind of strange. Sometimes I get the same error about the trusting period, but sometimes I get:

{"status":"error","result":"chain runtime/handle error: Light client instance error for rpc address tcp://localhost:26657: I/O error: fetched validator set is invalid: proposer with address 'C2356622B495725961B5B201A382DD57CD3305EC' not found in validator set"}

That's strange. Is this a clean setup?

Fraccaman commented 3 years ago

Yes, clean setup, same set of scripts.

andynog commented 3 years ago

OK, I might have to try to reproduce that error again. But there are a few changes related to light client configuration in the next release (should be coming out soon), so I'd rather test when the new release is out.

I started from a new machine, so I don't think I needed to update the trusted headers. Anyway, I tried rerunning hermes again (you can see the scripts here), and it's kind of strange. Sometimes I get the same error about the trusting period, but sometimes I get:

{"status":"error","result":"chain runtime/handle error: Light client instance error for rpc address tcp://localhost:26657: I/O error: fetched validator set is invalid: proposer with address 'C2356622B495725961B5B201A382DD57CD3305EC' not found in validator set"}

@romac any ideas on what might cause this error?

romac commented 3 years ago

This error happens when the light client fetches the header and the validator set at height H from the chain, where the latter does not contain a validator whose address matches the proposer_address of the fetched header. It is not clear to me in which circumstances this can happen. As far as I understand, the validator that proposed a block at height H should always be present in the validator set at height H, or at least that's what the code currently enforces.


Fetching the header and validator set:

https://github.com/informalsystems/tendermint-rs/blob/e4eb6b927dd88f89d7bc9016f7e6517bae9b96b9/light-client/src/components/io.rs#L107-L111

Building the validator set:

https://github.com/informalsystems/tendermint-rs/blob/e4eb6b927dd88f89d7bc9016f7e6517bae9b96b9/light-client/src/components/io.rs#L175-L179

Ensuring there is a validator in the set with a matching address:

https://github.com/informalsystems/tendermint-rs/blob/e4eb6b927dd88f89d7bc9016f7e6517bae9b96b9/tendermint/src/validator.rs#L92-L96
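The last step above can be sketched in a few lines of Python. This is an illustration of the membership check as described, not a port of the linked Rust code; the function name is hypothetical and the validator addresses in the example are placeholders.

```python
def proposer_in_validator_set(proposer_address: str, validator_addresses: list) -> bool:
    """Return True if some validator address matches the header's proposer_address.

    Addresses are compared as case-insensitive hex strings.
    """
    wanted = proposer_address.upper()
    return any(addr.upper() == wanted for addr in validator_addresses)


# Proposer address taken from the error above; the set is a made-up,
# incomplete list that does not contain it.
proposer = "C2356622B495725961B5B201A382DD57CD3305EC"
incomplete_set = ["AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"]
print(proposer_in_validator_set(proposer, incomplete_set))  # False
```

Any situation in which the fetched set is only a subset of the real one would make this check fail for some heights and pass for others, depending on who proposed the block.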

ancazamfir commented 3 years ago

What version of hermes? I think this may come from the pagination issue in tendermint rpc (where we were getting an incomplete validator set) that was fixed and picked up in hermes v0.2.0.

romac commented 3 years ago

What version of hermes? I think this may come from the pagination issue in tendermint rpc (where we were getting an incomplete validator set) that was fixed and picked up in hermes v0.2.0.

Oh right, that's probably it! This was fixed in tendermint-rs v0.19.0 and will therefore indeed be fixed in Hermes v0.2.0.

romac commented 3 years ago

@andynog @Fraccaman Can you try with Hermes master and see if the issue does indeed go away?

Fraccaman commented 3 years ago

I'm still using 0.1.1, so maybe that's the problem! Yep, I will give it a try!

Fraccaman commented 3 years ago

Hmm, trying to compile hermes on master fails (I also tried compiling v0.1.1 again and it works). There is some trouble with openssl:

error: failed to run custom build command for `openssl-sys v0.9.61`

Caused by:
  process didn't exit successfully: `/home/ec2-user/ibc-setup/ibc-rs/target/release/build/openssl-sys-3512a973f534ac54/build-script-main` (exit code: 101)
  --- stdout
  cargo:rustc-cfg=const_fn
  cargo:rerun-if-env-changed=X86_64_UNKNOWN_LINUX_GNU_OPENSSL_LIB_DIR
  X86_64_UNKNOWN_LINUX_GNU_OPENSSL_LIB_DIR unset
  cargo:rerun-if-env-changed=OPENSSL_LIB_DIR
  OPENSSL_LIB_DIR unset
  cargo:rerun-if-env-changed=X86_64_UNKNOWN_LINUX_GNU_OPENSSL_INCLUDE_DIR
  X86_64_UNKNOWN_LINUX_GNU_OPENSSL_INCLUDE_DIR unset
  cargo:rerun-if-env-changed=OPENSSL_INCLUDE_DIR
  OPENSSL_INCLUDE_DIR unset
  cargo:rerun-if-env-changed=X86_64_UNKNOWN_LINUX_GNU_OPENSSL_DIR
  X86_64_UNKNOWN_LINUX_GNU_OPENSSL_DIR unset
  cargo:rerun-if-env-changed=OPENSSL_DIR
  OPENSSL_DIR unset
  cargo:rerun-if-env-changed=OPENSSL_NO_PKG_CONFIG
  cargo:rerun-if-env-changed=PKG_CONFIG
  cargo:rerun-if-env-changed=OPENSSL_STATIC
  cargo:rerun-if-env-changed=OPENSSL_DYNAMIC
  cargo:rerun-if-env-changed=PKG_CONFIG_ALL_STATIC
  cargo:rerun-if-env-changed=PKG_CONFIG_ALL_DYNAMIC
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_PATH
  cargo:rerun-if-env-changed=PKG_CONFIG_PATH
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_LIBDIR
  cargo:rerun-if-env-changed=PKG_CONFIG_LIBDIR
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64-unknown-linux-gnu
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR_x86_64_unknown_linux_gnu
  cargo:rerun-if-env-changed=HOST_PKG_CONFIG_SYSROOT_DIR
  cargo:rerun-if-env-changed=PKG_CONFIG_SYSROOT_DIR
  run pkg_config fail: "`\"pkg-config\" \"--libs\" \"--cflags\" \"openssl\"` did not exit successfully: exit code: 1\n--- stderr\nPackage openssl was not found in the pkg-config search path.\nPerhaps you should add the directory containing `openssl.pc\'\nto the PKG_CONFIG_PATH environment variable\nNo package \'openssl\' found\n"

  --- stderr
  thread 'main' panicked at '

  Could not find directory of OpenSSL installation, and this `-sys` crate cannot
  proceed without this knowledge. If OpenSSL is installed and this crate had
  trouble finding it,  you can set the `OPENSSL_DIR` environment variable for the
  compilation process.

  Make sure you also have the development packages of openssl installed.
  For example, `libssl-dev` on Ubuntu or `openssl-devel` on Fedora.

  If you're in a situation where you think the directory *should* be found
  automatically, please open a bug at https://github.com/sfackler/rust-openssl
  and include information about your system as well as this message.

  $HOST = x86_64-unknown-linux-gnu
  $TARGET = x86_64-unknown-linux-gnu
  openssl-sys = 0.9.61

  ', /home/ec2-user/.cargo/registry/src/github.com-1ecc6299db9ec823/openssl-sys-0.9.61/build/find_normal.rs:174:5
  note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
warning: build failed, waiting for other jobs to finish...

Do you have any thoughts on this? The machine I'm using runs Amazon Linux.

Update: I think it's a problem with my openssl configuration, but I'm not really sure. I solved this by adding the openssl dependency to the ibc-relayer-cli crate (https://docs.rs/crate/openssl-sys/0.9.36). Probably not the best option, but it compiles.

Fraccaman commented 3 years ago

So, I'm using the hermes binary from master, but I have problems running the same script as before. It now complains about missing subcommands. For example:

cargo run --bin hermes -- -c ~/.hermes/config.toml light rm -c stargate --all -

returns

error: unrecognized command `light`

Did you change anything in the command line?

thanethomson commented 3 years ago

Do you have any thoughts on this? The machine I'm using runs Amazon Linux.

Update: I think it's a problem with my openssl configuration, but I'm not really sure. I solved this by adding the openssl dependency to the ibc-relayer-cli crate (https://docs.rs/crate/openssl-sys/0.9.36). Probably not the best option, but it compiles.

To build the tendermint-rpc crate with TLS support (one of the ibc-rs dependencies), you need to ensure you have the OpenSSL development library installed for your platform. See https://docs.rs/openssl/0.10.33/openssl/index.html#automatic

romac commented 3 years ago

So, I'm using the hermes binary from master, but I have problems running the same script as before. It now complains about missing subcommands. For example:

My bad, we just merged a PR which removes the need to specify peers for the light client, so we can't test whether the fix works for you via the (now removed) light add command. You can therefore remove the whole [peers] section from your configuration file as well as the invocation of the light rm and light add commands in your script. You may have to manually update your configuration file as some required options have been added in the meantime. You can take a look at the example config to see which options must now be specified. The new ones are: websocket_addr, rpc_timeout (optional), fee_denom and fee_amount.
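A small pre-flight check along these lines can be sketched in Python. This is only an illustration, not part of Hermes: the function names are hypothetical, the option names are the ones listed above, and it assumes each chain entry of the config has already been parsed into a dict.

```python
# Option names as listed above; rpc_timeout is optional, so it is not required.
REQUIRED_NEW_OPTIONS = ("websocket_addr", "fee_denom", "fee_amount")


def missing_chain_options(chain: dict) -> list:
    """Return the newly required options that a parsed chain entry lacks."""
    return [key for key in REQUIRED_NEW_OPTIONS if key not in chain]


def has_obsolete_peers_section(chain: dict) -> bool:
    """Return True if the chain entry still carries the old peers table."""
    return "peers" in chain


# Trimmed version of the stargate entry posted earlier in this thread.
old_chain = {
    "id": "stargate",
    "rpc_addr": "tcp://localhost:26657",
    "grpc_addr": "tcp://localhost:9090",
    "peers": {"primary": "648A550A0545AF774223E556117BBDE3156A3520"},
}
print(missing_chain_options(old_chain))       # ['websocket_addr', 'fee_denom', 'fee_amount']
print(has_obsolete_peers_section(old_chain))  # True
```

The example config remains the authoritative reference for option names and value formats.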

The next best way to test if things now work correctly would be to create an on-chain client and perform a client update (which will trigger light client verification), by following the instructions at https://hermes.informal.systems/tx_client.html.

Fraccaman commented 3 years ago

Hi @romac! I tried updating the scripts (https://github.com/heliaxdev/ibc-setup) but the link you posted above is a 404 (probably you updated the docs). I followed the docs at https://hermes.informal.systems/tutorials/local-chains/raw/index.html.

Do you mind checking the config and scripts in the ibc folder? I will try to run it, but it takes a lot of time to start gaiad on cosmoshub-4 network. Thank you!

Fraccaman commented 3 years ago

Update. Tried running it again and now it complains as soon as I launch hermes tx raw create-client $IBC0 $IBC1 to create the first client. The error is the following:

Error: tx error: error raised while creating client: failed while querying src chain (stargate) for latest height: Light client error for RPC address http://localhost:26657/: node at http://localhost:26657/ running chain stargate not caught up.

I think I just need to wait for the node to get in sync (?)
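Whether the node has caught up can be read from the catching_up flag in its /status response. A minimal Python sketch, assuming the JSON layout of the status outputs quoted in this thread (the function name is hypothetical):

```python
import json


def node_is_caught_up(status_json: str) -> bool:
    """Return True once result.sync_info.catching_up is false in /status."""
    status = json.loads(status_json)
    return status["result"]["sync_info"]["catching_up"] is False


# Hand-written sample, trimmed to the relevant fields.
sample_status = '{"result": {"sync_info": {"latest_block_height": "5978609", "catching_up": true}}}'
print(node_is_caught_up(sample_status))  # False
```

A setup script could poll this before invoking create-client instead of failing on the "not caught up" error.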

ancazamfir commented 3 years ago

yes, it looks like that. Has the sync finished? What version of hermes are you running?

Fraccaman commented 3 years ago

No, it is still catching up. Do you have any idea how much time/space this process should take?

ancazamfir commented 3 years ago

What is the output of http://localhost:26657/status?

Fraccaman commented 3 years ago

Here it is:

{
  "jsonrpc": "2.0",
  "id": -1,
  "result": {
    "node_info": {
      "protocol_version": {
        "p2p": "8",
        "block": "11",
        "app": "0"
      },
      "id": "729d39e3146fe9871b33c2d88eb0030c38995805",
      "listen_addr": "tcp://0.0.0.0:26656",
      "network": "cosmoshub-4",
      "version": "v0.34.9",
      "channels": "40202122233038606100",
      "moniker": "heliax-1",
      "other": {
        "tx_index": "on",
        "rpc_address": "tcp://127.0.0.1:26657"
      }
    },
    "sync_info": {
      "latest_block_hash": "F00C8384734FE5FE70A33581C9CCF1770E087DAE8197DEC1DB2D08AF40E826CD",
      "latest_app_hash": "20F4CAB46E5A81D03D2D39EB3073AD8A834BA7B0CBA02288E085D872BD3ADD21",
      "latest_block_height": "5978609",
      "latest_block_time": "2021-04-24T14:10:05.321924709Z",
      "earliest_block_hash": "1455A0C15AC49BB506992EC85A3CD4D32367E53A087689815E01A524231C3ADF",
      "earliest_app_hash": "E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649B934CA495991B7852B855",
      "earliest_block_height": "5200791",
      "earliest_block_time": "2019-12-11T16:11:34Z",
      "catching_up": true
    },
    "validator_info": {
      "address": "49587EBF98D03A58115682BEA6B0D8CB585EA4F7",
      "pub_key": {
        "type": "tendermint/PubKeyEd25519",
        "value": "qwmnJLq4afdHSSkAM24hmuQ0j+HjiQJEiXQ8lF81Uys="
      },
      "voting_power": "0"
    }
  }
}

Yes, it is still catching up. It is using 300+ GB of space. Do you know of any configuration settings to prune old blocks?

andynog commented 3 years ago

Hi @Fraccaman,

I have a full node synced and it's taking 337 GB as of now (height 6037600); this is with pruning = nothing.

Concerning custom pruning: if you run gaiad start --help, it lists some pruning options, which you can set in the start command or in the config.

      --pruning string                                  Pruning strategy (default|nothing|everything|custom) (default "default")
      --pruning-interval uint                           Height interval at which pruned heights are removed from disk (ignored if pruning is not 'custom')
      --pruning-keep-every uint                         Offset heights to keep on disk after 'keep-every' (ignored if pruning is not 'custom')
      --pruning-keep-recent uint                        Number of recent heights to keep on disk (ignored if pruning is not 'custom')
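As a small illustration of how these flags combine, the Python sketch below assembles a start command line for the custom strategy. The helper name is hypothetical and the numeric values are arbitrary placeholders, not recommendations; whether a pruned node is sufficient for relaying is a separate question that this sketch does not answer.

```python
def custom_pruning_args(keep_recent: int, keep_every: int, interval: int) -> list:
    """Build the flag list for the 'custom' pruning strategy shown above."""
    return [
        "--pruning", "custom",
        "--pruning-keep-recent", str(keep_recent),
        "--pruning-keep-every", str(keep_every),
        "--pruning-interval", str(interval),
    ]


# Placeholder values, chosen only to show the resulting command line.
command = ["gaiad", "start"] + custom_pruning_args(keep_recent=100, keep_every=0, interval=10)
print(" ".join(command))
```

The three numeric flags are ignored unless the strategy is set to custom, as the help text notes.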

If you're running this node on AWS, one thing I noticed when syncing mine was that instance type and size and volume speed make a big difference.

ancazamfir commented 3 years ago

Yes, it is still catching up

At 8-10 blocks/sec sync speed, this will probably take another ~20 hrs.

Fraccaman commented 3 years ago

if you're running this node on AWS, one thing I noticed when syncing mine was that instance type and size and volume speed makes a big difference.

Yes, I'm running on an EC2 t3.xlarge machine (which may not be the best instance). I'll try with a non-burstable machine next time. @andynog Do you know if IBC also works with a pruned node? @ancazamfir With this machine I'm doing 2-3 blocks/sec. Should be ready by tomorrow.

Fraccaman commented 3 years ago

Tried running hermes tx raw create-client $IBC0 $IBC1 and hermes tx raw create-client $IBC1 $IBC0, but they both return the same error:

[ec2-user]$ hermes tx raw create-client $IBC0 $IBC1
Error: tx error: error raised while creating client: failed sending message to dst chain (heliax) with err: GRPC error: GRPC error: status: NotFound, message: "rpc error: code = NotFound desc = account cosmos12dpq5dy339t4xgx064rx8cspzswxjzcn7rgy0k not found: key not found", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"}
[ec2-user]$ hermes tx raw create-client $IBC1 $IBC0
Error: tx error: error raised while creating client: failed sending message to dst chain (stargate) with err: GRPC error: GRPC error: status: NotFound, message: "rpc error: code = NotFound desc = account cosmos1dnsgp73u4tfjq656r24y9xphhtj6wyv3pjq50x not found: key not found", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc"} }

Any idea?

andynog commented 3 years ago

@Fraccaman this account cosmos1dnsgp73u4tfjq656r24y9xphhtj6wyv3pjq50x needs to have some balance on it. If this is just a new account, you will have to load some tokens on it to be able to pay for the transactions. This account only 'exists' on-chain once a send transaction has sent tokens to it.

Also, I'm assuming you did the hermes keys add command
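When scripting around the relayer, this situation can be recognised from the error text itself. The Python sketch below pulls the address out of an "account ... not found" message like the ones above. It is a hypothetical helper, and it relies on the message wording seen in this thread, which may differ between versions.

```python
import re

# Matches the wording of the gRPC NotFound message quoted in this thread.
_ACCOUNT_NOT_FOUND = re.compile(r"account (\S+) not found")


def missing_account(error_text: str):
    """Return the address named in an 'account ... not found' error, or None."""
    match = _ACCOUNT_NOT_FOUND.search(error_text)
    return match.group(1) if match else None


error = (
    "rpc error: code = NotFound desc = account "
    "cosmos1dnsgp73u4tfjq656r24y9xphhtj6wyv3pjq50x not found: key not found"
)
print(missing_account(error))  # cosmos1dnsgp73u4tfjq656r24y9xphhtj6wyv3pjq50x
```

The returned address is the one that needs to receive tokens before the relayer can submit transactions from it.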

andynog commented 3 years ago

Hi @Fraccaman, could you figure out the keys and account issues?

Fraccaman commented 3 years ago

Sorry, I have been a little busy, I'll try next week!

andynog commented 3 years ago

Hi @Fraccaman, any updates on this? We believe this might not be an issue anymore since several changes have been implemented on the relayer since then. We might close this issue for now if we don't hear back in the next few days, but feel free to open it again if the problem persists.

Fraccaman commented 3 years ago

@andynog sorry for the delay, some things have changed and I had to resync the chain, which is taking a long time (1 week+). I'm almost finished syncing, maybe a couple more days, and I'll be back with some updates.

Fraccaman commented 3 years ago

I'm still syncing the chain. I'm at height 6106333 but it's really slow (something like 1 block/s).
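
For a rough idea of how long the remaining sync might take, here is a back-of-the-envelope sketch. The current height and the 1 block/s rate come from the comment above; `TIP_HEIGHT` is a hypothetical chain tip, and `sync_eta_seconds` and the optional `chain_rate` parameter (blocks the chain produces per second while the node is catching up) are assumptions for illustration.

```python
def sync_eta_seconds(current_height: int, target_height: int,
                     sync_rate: float, chain_rate: float = 0.0) -> float:
    """Seconds until the node catches up; both rates are in blocks per second."""
    net_rate = sync_rate - chain_rate
    if net_rate <= 0:
        raise ValueError("node is not gaining on the chain tip")
    # A node already at or past the target has nothing left to sync.
    return max(target_height - current_height, 0) / net_rate


TIP_HEIGHT = 6_600_000  # hypothetical tip height, not taken from this thread

eta = sync_eta_seconds(6_106_333, TIP_HEIGHT, sync_rate=1.0)
print(f"about {eta / 86_400:.1f} days at 1 block/s")
```

Passing a non-zero `chain_rate` accounts for the tip moving while the node syncs, which lengthens the estimate.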

Fraccaman commented 3 years ago

Okay, made some progress. I have been able to download a snapshot from https://cosmos.quicksync.io/ which speeds things up a lot. So, now that I have both nodes up and running and address cosmos1x60z62swcm4j5ct4l5xpmcln3fdkvp8kylqv07 has some ATOM, I'm trying to create a channel between them.

Hermes version:

[ec2-user@ip-172-31-19-38 ibc-setup]$ $BINARY version
hermes 0.4.0

Gaiad version:

[ec2-user@ip-172-31-19-38 ibc-setup]$ gaiad version
v4.2.1

Fraccaman commented 3 years ago

Okay, I had some bugs in the hermes config. I'll keep you posted.

Fraccaman commented 3 years ago

More updates. I have been able to create a client between the 2 chains 🎉🎉🎉 Here is the output:

[ec2-user@ip-172-31-19-38 ibc-setup]$ $BINARY query client state $IBC1 07-tendermint-252
Success: ClientState {
    chain_id: ChainId {
        id: "h3liax",
        version: 0,
    },
    trust_level: TrustThresholdFraction {
        numerator: 1,
        denominator: 3,
    },
    trusting_period: 1209600s,
    unbonding_period: 1814400s,
    max_clock_drift: 5s,
    frozen_height: Height {
        revision: 0,
        height: 0,
    },
    latest_height: Height {
        revision: 0,
        height: 3437,
    },
    upgrade_path: [
        "upgrade",
        "upgradedIBCState",
    ],
    allow_update: AllowUpdate {
        after_expiry: false,
        after_misbehaviour: false,
    },
}
[ec2-user@ip-172-31-19-38 ibc-setup]$ $BINARY query client state $IBC0 07-tendermint-2
Success: ClientState {
    chain_id: ChainId {
        id: "cosmoshub-4",
        version: 4,
    },
    trust_level: TrustThresholdFraction {
        numerator: 1,
        denominator: 3,
    },
    trusting_period: 1209600s,
    unbonding_period: 1814400s,
    max_clock_drift: 5s,
    frozen_height: Height {
        revision: 0,
        height: 0,
    },
    latest_height: Height {
        revision: 4,
        height: 6614024,
    },
    upgrade_path: [
        "upgrade",
        "upgradedIBCState",
    ],
    allow_update: AllowUpdate {
        after_expiry: false,
        after_misbehaviour: false,
    },
}

Following the tutorial, I'm trying to create a connection and this one is failing. The first command (conn-init) works:

$BINARY tx raw conn-init $IBC0 $IBC1 07-tendermint-2 07-tendermint-252
Success: OpenInitConnection(
    OpenInit(
        Attributes {
            height: Height {
                revision: 0,
                height: 4474,
            },
            connection_id: Some(
                ConnectionId(
                    "connection-0",
                ),
            ),
            client_id: ClientId(
                "07-tendermint-2",
            ),
            counterparty_connection_id: None,
            counterparty_client_id: ClientId(
                "07-tendermint-252",
            ),
        },
    ),
)

Second command (conn-try) should be working:

$BINARY tx raw conn-try $IBC1 $IBC0 07-tendermint-252 07-tendermint-2 -s connection-0
Error: tx error: failed during a transaction submission step to chain id cosmoshub-4 with underlying error: RPC error to endpoint http://localhost:26657/: RPC error to endpoint http://localhost:26657/: Internal error: timed out waiting for tx to be included in a block (code: -32603)

I think it ran successfully, but I need to increase the rpc timeout. Third command (conn-ack) fails:

$BINARY tx raw conn-ack $IBC0 $IBC1 07-tendermint-2 07-tendermint-252 -d connection-0 -s connection-1
Error: tx error: failed with underlying cause: tx response error: deliver_tx reports error: log=Log("failed to execute message; message index: 1: connection handshake open ack failed: failed connection state verification for client (07-tendermint-2): chained membership proof failed to verify membership of value: 0A1130372D74656E6465726D696E742D32353212230A0131120D4F524445525F4F524445524544120F4F524445525F554E4F524445524544180222260A0F30372D74656E6465726D696E742D32120C636F6E6E656374696F6E2D301A050A03696263 in subroot C42B7ED48FB47D14FC71C73C510D1687C981E1827F52DDCDC80FED215C6674BA at index 0. Please ensure the path and value are both correct.: invalid proof")

Do you have any suggestions?

P.S.: every command dealing with the cosmoshub-4 chain "fails" with the same timeout error.
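
The hex value in the conn-ack error looks like a protobuf-encoded connection end, so decoding it shows what the verifying chain expected the counterparty to have stored. Below is a minimal sketch of a wire-format decoder. The field numbers (client_id=1, versions=2, state=3, counterparty=4) and the state names reflect my understanding of the ibc-go ConnectionEnd message and should be treated as assumptions; the helper names are made up for this example.

```python
STATES = {0: "UNINITIALIZED", 1: "INIT", 2: "TRYOPEN", 3: "OPEN"}


def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte < 0x80:
            return value, pos
        shift += 7


def parse_fields(buf: bytes):
    """Split a protobuf message into (field_number, value) pairs.

    Only wire types 0 (varint) and 2 (length-delimited) are handled.
    """
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((number, value))
    return fields


def decode_connection_end(hex_value: str) -> dict:
    """Decode the assumed ConnectionEnd layout into a plain dict."""
    end = {"client_id": "", "versions": [], "state": STATES[0],
           "counterparty": {"client_id": "", "connection_id": "", "prefix": ""}}
    for number, value in parse_fields(bytes.fromhex(hex_value)):
        if number == 1:
            end["client_id"] = value.decode()
        elif number == 2:
            version = {"identifier": "", "features": []}
            for n, v in parse_fields(value):
                if n == 1:
                    version["identifier"] = v.decode()
                elif n == 2:
                    version["features"].append(v.decode())
            end["versions"].append(version)
        elif number == 3:
            end["state"] = STATES.get(value, str(value))
        elif number == 4:
            for n, v in parse_fields(value):
                if n == 1:
                    end["counterparty"]["client_id"] = v.decode()
                elif n == 2:
                    end["counterparty"]["connection_id"] = v.decode()
                elif n == 3:
                    # Assumed MerklePrefix message: key_prefix is field 1.
                    end["counterparty"]["prefix"] = dict(parse_fields(v)).get(1, b"").decode()
    return end


# Value copied from the conn-ack error above, split by top-level field.
VALUE = (
    "0A1130372D74656E6465726D696E742D323532"
    "12230A0131120D4F524445525F4F524445524544120F4F524445525F554E4F524445524544"
    "1802"
    "22260A0F30372D74656E6465726D696E742D32120C636F6E6E656374696F6E2D301A050A03696263"
)

decoded = decode_connection_end(VALUE)
for key, item in decoded.items():
    print(f"{key}: {item}")
```

If those assumptions hold, the proof was checked against a connection end in state TRYOPEN on client 07-tendermint-252 whose counterparty is client 07-tendermint-2 with connection-0. Comparing that against what the other chain actually stores for the connection may show which field differs.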

Fraccaman commented 3 years ago

I made a little progress but I'm still stuck at the same "step". I avoided the timeout error by setting timeout_broadcast_tx_commit to 300s in gaia. The error comes from the conn-try command (and not the conn-ack) and is the following:

Error: tx error: failed with underlying cause: tx response error: deliver_tx reports error: log=Log("failed to execute message; message index: 1: connection handshake open try failed: failed connection state verification for client (07-tendermint-260): chained membership proof failed to verify membership of value: 0A0F30372D74656E6465726D696E742D3012230A0131120D4F524445525F4F524445524544120F4F524445525F554E4F5244455245441801221A0A1130372D74656E6465726D696E742D3236301A050A03696263 in subroot 3A9C01E534577A1D3BD6AD66742C7A1CFE16610347EB7F677FD636EFD9D0C50F at index 0. Please ensure the path and value are both correct.: invalid proof")

You can see the transactions here.