informalsystems / hermes

IBC Relayer in Rust
https://hermes.informal.systems
Apache License 2.0

Support for disabling health check #1336

Closed adizere closed 3 years ago

adizere commented 3 years ago

Crate

relayer-cli

Summary

Feedback from the Cephalopod ops team: they would like a way to optionally disable the health check that Hermes currently runs unconditionally.

Problem Definition

Whenever Hermes starts up, before running any command, it runs a health check that involves a few rounds of RPC calls to each chain. This functionality is especially useful for users who do not have much experience with Hermes, because the health check can surface problems that would otherwise stay hidden (cf. https://github.com/informalsystems/ibc-rs/issues/697). It is, however, not very useful for power users such as relayer operators.

The health check involves pulling the genesis file, and some chains (e.g., hub-4) have very large genesis files (~100MB). For those chains the health check becomes a liability that slows down the whole ops process.

Proposal

A new option is necessary to allow disabling the health checkup mechanism. This is necessary in particular for CLIs such as:

Acceptance Criteria

For Admin Use

romac commented 3 years ago

It is unclear which method would be best for disabling the health checkup, either a new config.toml option or a global CLI flag. This can be decided as we go along after consulting with Cephalopod.

I think such commands should just never do the health check (but we can add a command that does just that). This could be achieved by supplying an option to the chain runtime when starting it, but there is no need to expose this in the config file or the CLI options (aside perhaps for the start command).
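A minimal sketch of what "supplying an option to the chain runtime when starting it" could look like. All names here (`HealthCheck`, `ChainRuntime`, `spawn`, `do_health_check`) are hypothetical and chosen for illustration only; they are not taken from the Hermes codebase, and the check itself is a placeholder.

```rust
/// Hypothetical switch handed to the chain runtime at startup.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HealthCheck {
    Run,
    Skip,
}

/// Hypothetical stand-in for a chain runtime.
/// `rpc_calls` exists only to make the effect of the option visible.
#[derive(Debug)]
pub struct ChainRuntime {
    pub chain_id: String,
    pub rpc_calls: u32,
}

impl ChainRuntime {
    /// Starts the runtime, running the health check only when asked to.
    pub fn spawn(chain_id: &str, health_check: HealthCheck) -> Result<Self, String> {
        let mut rt = ChainRuntime {
            chain_id: chain_id.to_string(),
            rpc_calls: 0,
        };
        if health_check == HealthCheck::Run {
            rt.do_health_check()?;
        }
        Ok(rt)
    }

    /// Placeholder for the real checks (a few rounds of RPC calls per chain).
    pub fn do_health_check(&mut self) -> Result<(), String> {
        self.rpc_calls += 1; // pretend the node was queried once
        Ok(())
    }
}
```

With this shape, ordinary commands would call `ChainRuntime::spawn(id, HealthCheck::Skip)`, while a dedicated health-check command would pass `HealthCheck::Run` or call `do_health_check` directly. Nothing needs to be exposed in the config file or as a global CLI flag.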

adizere commented 3 years ago

Perfect timing for this suggestion, Romain. Mircea and I were reading your idea and concluded it would be the best way to go forward. It seems not even the start command would require the health checkup.

Concrete notes after discussing with Cephalopod:

andynog commented 3 years ago

@adizere and @romac, I will look into this one. I'll also try to replace the logic of the health check for max_block_bytes. Using the /genesis endpoint might not be optimal, because the genesis information might be huge and the request might time out while retrieving it. A better RPC for that information is probably the /consensus_params endpoint, e.g. https://rpc-cosmos.cosmostation.io/consensus_params?height=7511042.

This endpoint might perform better. These params can also be changed through governance, so it's better to take the value at a height close to the latest available block to ensure it's accurate. The genesis value might not reflect the current value of that parameter.
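A rough sketch of the /consensus_params approach, using only the standard library. It builds the query URL and pulls the block `max_bytes` value out of a response body. The function names are hypothetical, the response layout (`result.consensus_params.block.max_bytes` as a quoted number) is an assumption about the Tendermint RPC format, and the string scanning is deliberately naive: a real implementation would issue the request through an RPC client and decode the body with a proper JSON parser.

```rust
/// Builds the query URL; `None` leaves the height out so the node picks one
/// (assumed here to default to its latest height).
pub fn consensus_params_url(rpc_addr: &str, height: Option<u64>) -> String {
    let base = rpc_addr.trim_end_matches('/');
    match height {
        Some(h) => format!("{}/consensus_params?height={}", base, h),
        None => format!("{}/consensus_params", base),
    }
}

/// Naively extracts the block `max_bytes` value from a response body.
/// Returns `None` if the expected keys are missing or no number follows.
pub fn block_max_bytes(body: &str) -> Option<u64> {
    // Narrow to the text after the "block" key, so that a "max_bytes" field
    // in another section of the response is not picked up first.
    let after_block = &body[body.find("\"block\"")?..];
    let key = "\"max_bytes\"";
    let after_key = &after_block[after_block.find(key)? + key.len()..];
    // Skip the colon, whitespace and opening quote, then collect the digits.
    let digits: String = after_key
        .chars()
        .skip_while(|c| !c.is_ascii_digit())
        .take_while(|c| c.is_ascii_digit())
        .collect();
    digits.parse().ok()
}
```

The response carries only the consensus parameters rather than the whole genesis document, which is the point of the suggestion above: the payload stays small regardless of genesis size, and querying at a recent height reflects any change made through governance.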