hyperlane-xyz / hyperlane-monorepo

The home for Hyperlane core contracts, sdk packages, and other infrastructure
https://hyperlane.xyz
Other
337 stars 369 forks source link

Investigate ways to improve config loading errors #1593

Closed mattiekat closed 1 year ago

mattiekat commented 1 year ago

Right now we get very useless errors when a config value is considered "invalid". The goal of this ticket is to find out how much of a lift it would be to change the config library to track more useful debug information such as the path to an error in the config itself and (if it received an invalid value) where that value came from.

This is becoming a larger issue now that we have other people running the agents as it makes it very hard for them to figure out what they are doing wrong and makes the experience of hosting Hyperlane frustrating.

One thing we are doing right now that is not helping is we parse integers, addresses, and other types after config has already been loaded as strings, so we don't have the full information about where it came from. We might need to store some metadata to be able to report what went wrong or find a way to extend config to support these other serializable types (which I presume it does not now as we have avoided using it to do so).

nambrot commented 1 year ago

Looking at this ticket again, I would definitely constrain the scope of this ticket towards the "completion of config loading", not for the parsing of things from strings separately (if I understand correctly).

For example, my PR here https://github.com/hyperlane-xyz/hyperlane-monorepo/pull/1591 at least spits things out which is probably is still impossible to debug for the layman, but does help me quite a lot in debugging. Any easy wins that can point out the path in the tree and maybe show the path itself would be great.

Or maybe a different binary that spits out the config, back serialized as a json object in a file (i.e. the config files + env vars merged) so that can be easily inspected

nambrot commented 1 year ago

Or maybe there is a way to add strings to the structs in a "macro" (i'm sure i'm using the wrong rust term here), that could be printed out when config deserialization fails so that we can easily get more "human" error messages that point out what is missing or wrong?

mattiekat commented 1 year ago

Investigation

Wanted to start off by playing around a bit to get a feel for what errors are hard to understand and what errors are good...

Incorrect domain id (but still an integer)

Error: Loading chain alfajores

Caused by:
   0: Loading chain alfajores
   1: Chain name implies a different domain than the domain id provided; the config is probably wrong.

Location:
    hyperlane-base/src/settings/chains.rs:278:26

Incorrect domain id (number instead of integer in a string)

...full config printout omitted
Error: invalid type: integer `44787`, expected a string

Location:
    /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9

Incorrect mailbox address string (too short)

Error: Loading chain alfajores

Caused by:
   0: Loading chain alfajores
   1: Invalid ethereum address
   2: Invalid input length

Number instead of string

...full config printout omitted
Error: invalid type: integer `923874`, expected a string

Location:
    /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9

Invalid chain name

...full config printout omitted
Error: missing field `name`

Location:
    /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9

Misspelled "protocol" in a chain config

...full config printout omitted
Error: missing field `protocol`

Location:
    /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9

Misspelled "chains" in the root

...full config printout omitted
Error: missing field `name`

Location:
    /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9

Discoveries

From early looking, there is a clear divide of two fundamental errors (as was alluded to previously).

Right now the errors after the config has been parsed are better and include more hints but lack a path to the origin config so it is hard to know for sure where the value came from especially if it is overridden somewhere.

The errors during config parsing are the worst and just have the really generic "Error missing field" or "Error invalid type" displayed.

Thoughts

We should probably work on getting some better context info for some of the parsing, such as for the invalid mailbox address, saying it was the mailbox specifically that was wrong.

Primary focus should be on seeing if there is anyway we can reasonably fix the Config specific errors that are generated by deserialization itself.

Ideally in all cases we would be able to display the source of the given config value. I.e. what file or env var set it and the path to it within a config file.

nambrot commented 1 year ago

Primary focus should be on seeing if there is anyway we can reasonably fix the Config specific errors that are generated by deserialization itself.

What exactly do you mean by this?

mattiekat commented 1 year ago

What exactly do you mean by this?

Errors like

...full config printout omitted
Error: missing field `name`

Location:
    /rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9

Would ideally print something like

Error: missing field `name` at `chains.alfajores.name`
mattiekat commented 1 year ago

Spent some time digging into the ConfigError type hoping that maybe we could destructure it and get the info out that we need but it actually is not showing up as type error or not found errors even when I trigger what I would expect to cause them which means we don't have access to that extra info.

Found https://github.com/mehcode/config-rs/issues/83 which suggested serde_path_to_error which was not working for us. Perhaps support for it broke? We kept getting an empty path object. I am guessing this is a result of something about how the config library is merging files and the people it worked for in the ticket may have been using a more simple case. Went ahead and posted a comment with the hope someone might know what is going on.

My next thought was to go looking through their codebase to see what the possibilities are for making a forked version that includes more path data as it parses and I think the answer is it would be very hard because they are relying very heavily on serde in a way that I don't think would be easy to add that to.

I want to sleep on this thought, but custom config deserialization might not be too hard using serde + the rust built-in support for envs (https://doc.rust-lang.org/std/env/fn.vars.html). See the thing is we don't need a fully custom implementation and we only care about support JSON and ENV overrides, so I think it would not be too hard to do this in a way that looks to the outside observer the same as how we do it now. We would be able to give much better errors on where the configured value was defined (so you know if it was overiden in an env or another config) and give full paths to errors. The idea would be making all the config values "optional" from serde's perspective and then ordering the full objects, merging, and ensuring that all the needed data is there. Might end up wanting the derive_builder crate.

Main issue I can think of with the custom option would be trying to deserialize the envs using serde but it would not be too hard to implement a custom deserializer and would not require us to write a proc macro. That custom deserializer would then be basically the same a serde_json as far as use.

nambrot commented 1 year ago

How much would you estimate that custom config deserialization to take? On the surface it sounds quite heavy. You obviously looked more into it but I would be inclined to prefer to find opportunities to fix what we have right now, is it possible to easily follow serde_path_to_error and see if it can be fixed?

mattiekat commented 1 year ago

Writing custom config serlization should not take too long because we can mostly just use serde. As a matter of fact I was wrong about needing to write a custom deserializer for the env vars because we can use the one from Config and just modify it a bit (if that is even needed). The work we would have to do is not in writing deserializing, but in writing some better logic to merge each source of config and then give warnings at that point.

I guess doing that would still have a problem though in cases where there is a type error because it would still happen to early on but it would solve the missing field errors. The type errors would at least be able to get pinned to a source though, so like

Error: invalid type: integer `923874`, expected a string

Caused by:
    0: Loading config from `./config/sample_config.json`

and that should be a lot easier to debug.

Hmm, tbh I had not really considered that serde_path_to_error might be the problem. There is a ticket https://github.com/dtolnay/path-to-error/issues/14 that looks like it could be the problem we are facing but would take some time to dig into and verify. I have worked with the author of that package before and he is pretty quick to respond to PRs at least.

asaj commented 1 year ago

+1 to trying to fix via getting serde_path_to_error to work, wherever the problem happens to be