Closed mattiekat closed 1 year ago
Looking at this ticket again, I would definitely constrain the scope of this ticket towards the "completion of config loading", not for the parsing of things from strings separately (if I understand correctly).
For example, my PR here https://github.com/hyperlane-xyz/hyperlane-monorepo/pull/1591 at least spits things out which is probably is still impossible to debug for the layman, but does help me quite a lot in debugging. Any easy wins that can point out the path in the tree and maybe show the path itself would be great.
Or maybe a different binary that spits out the config, back serialized as a json object in a file (i.e. the config files + env vars merged) so that can be easily inspected
Or maybe there is a way to add strings to the structs in a "macro" (i'm sure i'm using the wrong rust term here), that could be printed out when config deserialization fails so that we can easily get more "human" error messages that point out what is missing or wrong?
Wanted to start off by playing around a bit to get a feel for what errors are hard to understand and what errors are good...
Error: Loading chain alfajores
Caused by:
0: Loading chain alfajores
1: Chain name implies a different domain than the domain id provided; the config is probably wrong.
Location:
hyperlane-base/src/settings/chains.rs:278:26
...full config printout omitted
Error: invalid type: integer `44787`, expected a string
Location:
/rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9
Error: Loading chain alfajores
Caused by:
0: Loading chain alfajores
1: Invalid ethereum address
2: Invalid input length
...full config printout omitted
Error: invalid type: integer `923874`, expected a string
Location:
/rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9
...full config printout omitted
Error: missing field `name`
Location:
/rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9
...full config printout omitted
Error: missing field `protocol`
Location:
/rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9
...full config printout omitted
Error: missing field `name`
Location:
/rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9
From early looking, there is a clear divide of two fundamental errors (as was alluded to previously).
Right now the errors after the config has been parsed are better and include more hints but lack a path to the origin config so it is hard to know for sure where the value came from especially if it is overridden somewhere.
The errors during config parsing are the worst and just have the really generic "Error missing field" or "Error invalid type" displayed.
We should probably work on getting some better context info for some of the parsing, such as for the invalid mailbox address, saying it was the mailbox specifically that was wrong.
Primary focus should be on seeing if there is anyway we can reasonably fix the Config
specific errors that are generated by deserialization itself.
Ideally in all cases we would be able to display the source of the given config value. I.e. what file or env var set it and the path to it within a config file.
Primary focus should be on seeing if there is anyway we can reasonably fix the Config specific errors that are generated by deserialization itself.
What exactly do you mean by this?
What exactly do you mean by this?
Errors like
...full config printout omitted
Error: missing field `name`
Location:
/rustc/90743e7298aca107ddaa0c202a4d3604e29bfeb6/library/core/src/convert/mod.rs:726:9
Would ideally print something like
Error: missing field `name` at `chains.alfajores.name`
Spent some time digging into the ConfigError type hoping that maybe we could destructure it and get the info out that we need but it actually is not showing up as type error or not found errors even when I trigger what I would expect to cause them which means we don't have access to that extra info.
Found https://github.com/mehcode/config-rs/issues/83 which suggested serde_path_to_error
which was not working for us. Perhaps support for it broke? We kept getting an empty path object. I am guessing this is a result of something about how the config
library is merging files and the people it worked for in the ticket may have been using a more simple case. Went ahead and posted a comment with the hope someone might know what is going on.
My next thought was to go looking through their codebase to see what the possibilities are for making a forked version that includes more path data as it parses and I think the answer is it would be very hard because they are relying very heavily on serde in a way that I don't think would be easy to add that to.
I want to sleep on this thought, but custom config deserialization might not be too hard using serde + the rust built-in support for envs (https://doc.rust-lang.org/std/env/fn.vars.html). See the thing is we don't need a fully custom implementation and we only care about support JSON and ENV overrides, so I think it would not be too hard to do this in a way that looks to the outside observer the same as how we do it now. We would be able to give much better errors on where the configured value was defined (so you know if it was overiden in an env or another config) and give full paths to errors. The idea would be making all the config values "optional" from serde's perspective and then ordering the full objects, merging, and ensuring that all the needed data is there. Might end up wanting the derive_builder
crate.
Main issue I can think of with the custom option would be trying to deserialize the envs using serde but it would not be too hard to implement a custom deserializer and would not require us to write a proc macro. That custom deserializer would then be basically the same a serde_json
as far as use.
How much would you estimate that custom config deserialization to take? On the surface it sounds quite heavy. You obviously looked more into it but I would be inclined to prefer to find opportunities to fix what we have right now, is it possible to easily follow serde_path_to_error
and see if it can be fixed?
Writing custom config serlization should not take too long because we can mostly just use serde. As a matter of fact I was wrong about needing to write a custom deserializer for the env vars because we can use the one from Config
and just modify it a bit (if that is even needed). The work we would have to do is not in writing deserializing, but in writing some better logic to merge each source of config and then give warnings at that point.
I guess doing that would still have a problem though in cases where there is a type error because it would still happen to early on but it would solve the missing field errors. The type errors would at least be able to get pinned to a source though, so like
Error: invalid type: integer `923874`, expected a string
Caused by:
0: Loading config from `./config/sample_config.json`
and that should be a lot easier to debug.
Hmm, tbh I had not really considered that serde_path_to_error
might be the problem. There is a ticket https://github.com/dtolnay/path-to-error/issues/14 that looks like it could be the problem we are facing but would take some time to dig into and verify. I have worked with the author of that package before and he is pretty quick to respond to PRs at least.
+1 to trying to fix via getting serde_path_to_error
to work, wherever the problem happens to be
Right now we get very useless errors when a config value is considered "invalid". The goal of this ticket is to find out how much of a lift it would be to change the config library to track more useful debug information such as the path to an error in the config itself and (if it received an invalid value) where that value came from.
This is becoming a larger issue now that we have other people running the agents as it makes it very hard for them to figure out what they are doing wrong and makes the experience of hosting Hyperlane frustrating.
One thing we are doing right now that is not helping is we parse integers, addresses, and other types after config has already been loaded as strings, so we don't have the full information about where it came from. We might need to store some metadata to be able to report what went wrong or find a way to extend config to support these other serializable types (which I presume it does not now as we have avoided using it to do so).