thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
221 stars 54 forks source link

Simplify the `tedge config` crate and command #1812

Closed didier-wenzek closed 1 year ago

didier-wenzek commented 1 year ago

The tedge_config crate has aged badly as most of the requirements driving its design have been removed.

Expectations

I would expect:

Proposal

Using figment one can populate a regular TEdgeConfig struct from different sources: starting from default values, merged with the content of the tedge_config file, merged with values extracted from the environment and possibly merged with some computed keys (as the device.id).

Impacts

didier-wenzek commented 1 year ago

@reubenmiller and @albinsuresh : what do you think about the user impacts / benefits of this proposal?

@albinsuresh , @jarhodes-sag , @Bravo555 : what do you think of feasibility of this proposal?

Bravo555 commented 1 year ago
  • Default values provided by an impl Default for TedgeConfig.

Very much agree, for me it seems like the current implementation just increases boilerplate without providing much in return.

  • Some helpers implementation to extract frequently used sub-configs (as an impl From<TedgeConfig> for MqttConfig).

Wouldn't just &tedge_config.mqtt suffice (if I'm correct in assuming that it's of type MqttConfig)? We're just extracting part of the config, and any code using the config should not modify it, unless it's tedge config set command, but it just modifies the config file, so config object is never mutated. So config object can be immutable and we can take any number of shared references to it and its parts.

Some additional thoughts:

reubenmiller commented 1 year ago

@reubenmiller and @albinsuresh : what do you think about the user impacts / benefits of this proposal?

@albinsuresh , @jarhodes-sag , @Bravo555 : what do you think of feasibility of this proposal?

Yes I see definitely see value in this, as extending the configuration should be a very quick/easy/no-fuss thing. We will just have to discuss exactly what the user-impact will be in terms of renaming/normalizing properties etc, as that will guide whether it worth while adding an upgrade step.

jarhodes314 commented 1 year ago

I like the sound of the proposal and expectations, although I have concerns about how we make tedge config work. In an ideal world, the interface would be generated from the TEdgeConfig struct in a similar way to clap. By this, I mean the documentation should be generated from doc comments on the fields, and the config paths for tedge config should be derived from the struct definitions. This, I believe, is fundamental to preventing a repeat of the existing confusion surrounding separators in field names.

doku may be useful for achieving this. It seems to be able to parse the required information out of nested data structures to read doc comments and example values (e.g. https://github.com/anixe/doku/blob/main/doku/examples/serde.rs). I think there would be non trivial effort involved in processing the output of #[derive(Document)], which essentially gives us a specialised, simplified version of the output syn would give, but it does save us from creating our own macro.

I'm not sure what thought is being given to device.id. AFAIK, that's the only config path that is derived from the system rather than set in a file/environment variable. Given the subject of this discussion, I think we will be better off adding support for this as a special case when defining what tedge config get can do, and just providing a function for other crates to read the device ID rather than over-engineering this. An alternative solution could be to have something like TEdgeInferredConfig which can contain the device ID.

jarhodes314 commented 1 year ago

However, I think it might be possible to avoid double underscores and handle ambiguities by using serde's various attributes in combination with figment's Env provider.

The difficulty here arises from the fact that we have to be able to derive the structure from the environment variable name without referring to what the target config type is. Whatever we try and do, the result of deserialising from the environment variables to TEdgeConfig will be the same as deserialising to toml::Value and then deserialising that to TEdgeConfig, because serde depends on the format to dictate the structure of the value. I've run into similar problems decoding hex-encoded IP addresses in CSV which are normally treated as either strings or integers, but they'd occasionally appear as floats (a lossy conversion) because addresses such as 000000E5 parse as f64 in Rust. The only solution I could come up with there was to anticipate the issue and bodge things in the format (e.g. by adding a trailing space to the field and then removing that once it was deserialised).

I do agree that simplifying this is useful, and to that end I suggest we impose a limit on the structure of the config to remain just the current two levels deep. That would allow a relatively simple approach of splitting the key on the first _ (after the TEDGE_ prefix) and then assuming any further _ are part of the field name. If we desperately needed to add a third level to some configuration later down the line, we could support this as we can provide our own logic to interpret some environment variable names specially, but this would involve hard-coding the relevant support in (though doku can probably help here, and I imagine can certainly help us do automated testing to check we avoid name conflicts).

didier-wenzek commented 1 year ago

In an ideal world, the interface would be generated from the TEdgeConfig struct in a similar way to clap.

I can only agree

doku may be useful for achieving this.

Thank you for this proposal. Indeed, it seems to be the tool we need.

I think there would be non trivial effort involved in processing the output of #[derive(Document)], which essentially gives us a specialised, simplified version of the output syn would give, but it does save us from creating our own macro.

We don't have to keep exactly the same look and feel as today for the output. doku::to_toml::<TedgeConfig>() might be enough to replace tedge config doc.

I'm not sure what thought is being given to device.id. AFAIK, that's the only config path that is derived from the system rather than set in a file/environment variable. Given the subject of this discussion, I think we will be better off adding support for this as a special case when defining what tedge config get can do.

I'm okay with that. We can also get rid of this feature that doesn't work well in a containerized context (as the device.id can be produced only in the mosquitto container where the certificate is).

jarhodes314 commented 1 year ago

To give a clear answer on what I think we should support in the future, I think preserving the existing toml format makes sense. It will likely be easier to add clear error messages or deprecation warnings for the existing paths to tedge config than to update tedge.toml (and detect in the future whether this has already been done). This also makes simplifying the environment variable names easier and less error prone. I'll open a separate improvement request for that topic as implementing this refactoring will be a significant undertaking in its own right and that aspect is decidedly tangential to the original goal of this issue.

Bravo555 commented 1 year ago

I might have one more requirement for the new config, which is handling compound configuration options where all the fields have to be set for configuration to be consistent. One example is MQTT client authentication I'm currently adding, which requires two options to be set:

  1. Path to the client certificate
  2. Path to the client private key

Note that there are no reasonable default for either of these options, while for other options eg. hostname-port pairs there is usually a default value for the port which enables us to only pass the hostname.

Now, if we want to make configuration an object that can contain other objects, then I'd argue that it would make sense for the configuration object to contain a field like Option<MqttClientAuthConfig>, which would be guaranteed to contain both certificate and the key, so that the command code won't have to bother about handling cases when only one of them is present.

The issue appears however, when we use tedge config command to edit the configuration. tedge config only works on single keys, so if a user only sets one option, it will leave configuration in an incomplete state. I see three potential solutions:

  1. For some options, require the user to set other options for the configuration to become valid. So e.g. when setting mqtt.client.auth.cert_file there could be a hint to also set mqtt.client.auth.key_file. When the second key is not set, any tedge command will fail because configuration could not be parsed successfully, effectively requiring user to set the other required option. I think it's the easiest and most effective option.
  2. Change tedge config set command to be more TOML-aware, so that it can set compound options. It could look something like this: tedge config set mqtt.client.auth cert_file=/path/to/cert key_file=/path/to/key
  3. Don't support compound options, effectively leaving all the keys optional as they are now. This moves the code handling presence/absence of different options into the code using the configuration, again, as it is now. I'd recommend against it, as by supporting compound options, we could remove some complexity from command code, as well as remove all the empty tables that are present in tedge.toml file, which we need to have because instead of top-level fields being optional, it's all the low-level fields:

    [device]
    
    [c8y]
    root_cert_path = '/etc/ssl/certs'
    
    [az]
    
    [aws]
    
    [mqtt]
    client_host = 'ubuntu-20'
    client_port = 8883
    client_ca_file = '/etc/mosquitto/ca_certificates/'
    
    [http]
    
    [software]
    
    [tmp]
    
    [logs]
    
    [run]
    
    [firmware]
    
    [service]
    pub(crate) struct TEdgeConfigDto {
        /// Captures the device specific configurations
        #[serde(default)]
        pub(crate) device: DeviceConfigDto,
    
        /// Captures the configurations required to connect to Cumulocity
        #[serde(default)]
        pub(crate) c8y: CumulocityConfigDto,
    
        #[serde(default, alias = "azure")] // for version 0.1.0 compatibility
        pub(crate) az: AzureConfigDto,
    
        #[serde(default)]
        pub(crate) aws: AwsConfigDto,
    
        #[serde(default)]
        pub(crate) mqtt: MqttConfigDto,
    
        #[serde(default)]
        pub(crate) http: HttpConfigDto,
    
        #[serde(default)]
        pub(crate) software: SoftwareConfigDto,
    
        #[serde(default)]
        pub(crate) tmp: PathConfigDto,
    
        #[serde(default)]
        pub(crate) logs: PathConfigDto,
    
        #[serde(default)]
        pub(crate) run: PathConfigDto,
    
        #[serde(default)]
        pub(crate) firmware: FirmwareConfigDto,
    
        #[serde(default)]
        pub(crate) service: ServiceTypeConfigDto,
    }
Bravo555 commented 1 year ago
  1. require the user to set other options for the configuration to become valid. So e.g. when setting mqtt.client.auth.cert_file there could be a hint to also set mqtt.client.auth.key_file. When the second key is not set, any tedge command will fail because configuration could not be parsed successfully, effectively requiring user to set the other required option. I think it's the easiest and most effective option.

I realized that it will be trickier that it sounds, because of how tedge config set currently works, which is AFAIK:

  1. Read and parse the entire TOML file into our configuration object
  2. Modify the relevant field in the relevant configuration object
  3. Serialize back to TOML and save in the filesystem

Which means that currently it's impossible for tedge config set to leave configuration in an incomplete state, so if we were to have the following configuration object:

pub(crate) struct MqttClientAuthConfig {
    pub cert_file: FilePath,
    pub key_file: FilePath,
}

We couldn't set only one of the fields at a time.

The root of the problem seems to be, for updating configuration we want to make every field effectively optional, but for reading we want to enforce some tables to define all required keys.

Possible solutions:

  1. When setting fields, don't deserialize, and instead work on loosely typed TOML representation. There would have to be some mechanism preventing us from setting fields which don't exist in our config objects.
  2. Some serde attributes/macro magic to transform our config objects that makes all the fields optional, and then serializing to and deserializing from that. In practice means we use different models for serialization and deserialization. Can become really ugly because 90% of the config doesn't need it, but can be done very elegantly, like e.g. Partial<Type> in typescript.
  3. Just give up and have all the fields be optional.
jarhodes314 commented 1 year ago

I realized that it will be trickier that it sounds, because of how tedge config set currently works, which is AFAIK:

1. Read and parse the entire TOML file into our configuration object

2. Modify the relevant field in the relevant configuration object

3. Serialize back to TOML and save in the filesystem

Which means that currently it's impossible for tedge config set to leave configuration in an incomplete state, so if we were to have the following configuration object:

pub(crate) struct MqttClientAuthConfig {
    pub cert_file: FilePath,
    pub key_file: FilePath,
}

We couldn't set only one of the fields at a time.

In #1916, I haven't explicitly dealt with this problem, but I think I have a reasonable solution. tedge config is essentially using a separate interface to everything else to read individual keys, and each key has an (auto generated) getter method in NewTEdgeConfig. We can, if we want to, not generate a getter for a given DTO field and instead implement one manually without affecting tedge config.

Once we've done this, we can manually create a getter fn mqtt_client_auth(&NewTEdgeConfig) -> Result<MqttClientAuthConfig, ConfigSettingError> which could generate the new struct on the fly, and warn the user appropriately if they haven't set both keys (or it could return Result<Option<MqttClientAuthConfig>, ...>, which would be an error if only one field is set, Ok(None) if neither is set, and Ok(Some(...)) if both are set).

There might be a little bit of attention required to get the user flow for this working nicely. I think keeping tedge config set simple like it is currently is nice, but we probably want to tell the user to go and configure the other setting (if they haven't already).

@Bravo555 does that sound like a reasonable compromise to you?

Bravo555 commented 1 year ago

@jarhodes314 Currently reviewing the PR. It may take some time, so for now I'll be addressing just the comment. If I have any more questions or comments after having reviewed the PR, I will add them in the next comment.

In #1916, I haven't explicitly dealt with this problem, but I think I have a reasonable solution. tedge config is essentially using a separate interface to everything else to read individual keys

Okay, so with tedge config using a different interface to read individual keys, it sounds a bit like the option 1 I proposed in earlier comment: tedge config not fully deserializing, but working on a loosely typed TOML.

and each key has an (auto generated) getter method in NewTEdgeConfig. We can, if we want to, not generate a getter for a given DTO field and instead implement one manually without affecting tedge config.

Once we've done this, we can manually create a getter fn mqtt_client_auth(&NewTEdgeConfig) -> Result<MqttClientAuthConfig, ConfigSettingError> which could generate the new struct on the fly, and warn the user appropriately if they haven't set both keys (or it could return Result<Option<MqttClientAuthConfig>, ...>, which would be an error if only one field is set, Ok(None) if neither is set, and Ok(Some(...)) if both are set).

Okay, so we can override these automatically generated getters in order to run some verification logic that creates some new configuration struct.

But I think it creates some problems:

  1. The getter function can fail: this forces the calling code to handle the error case of invalid configuration state. But the calling code can't meaningfully recover from this error, and has to return the error as well, which just results in a crash. It will be the same in all the places which work with these compound config objects. The way to relieve client code of this complexity is to crash as soon as the config is first read and deserialised. This is good for us, because once we've deserialised, we know that configuration is valid and consistent, and are able to guarantee this to the config clients. But I'd argue this is good for the user as well - reporting invalid configuration as soon as possible makes them fix the configuration as soon as possible, instead of only when using the feature to which an option is related. One can argue that making this a warning would provide the same value for the user, but I'm still for making this a hard error.
  2. Can't move out of NewTEdgeConfig: When we have a function signature like &NewTEdgeConfig -> MqttClientAuthConfig, it implies that we can call it multiple times, creating multiple MqttClientAuthConfigs from a single &NewTEdgeConfig, which in turn, realistically, implies cloning. I agree that perhaps I'm engaging in a bit of premature optimisation here, but I think performance goes hand-in-hand with simplicity; I think ideally we'd want to have a single immutable TEdgeConfig object, fields of which we can easily borrow. If this is impractical, or I'm misunderstanding how we use the config, I'm open to having my mind changed. It doesn't affect the client code as much as 1, so it's not as important.
  3. Related to 2, the idea of having field getters themselves: What I think would be more idiomatic in Rust and result in less code and complexity, would be to have a single getter for configuration, get() -> &NewTEdgeConfig, which would be the only point of access to the configuration object. It would guarantee immutability and we could access parts of it by just borrowing the fields. But here I also may be missing or misunderstanding something, if so, please do correct me.

I'll add that there were some changes in new versions of toml crate, which I think are relevant to us. toml was changed to use toml-edit under the hood, which AFAIK instead of deserializing, updating, and serializing, updates the TOML file directly. It's what is used in cargo add plugin. We could potentially use it to simplify tedge config and enable partial updates.

@didier-wenzek what's your opinion?

jarhodes314 commented 1 year ago
  1. The getters also exist to provide default values. I guess we could do a conversion from TEdgeConfigDto -> something like it but with None replaced by T::default() where relevant, and this might simplify things. We could do the conversion to MqttClientAuthConfig there and provide the relevant warnings/errors then. I'm still not sure exactly how I'd want that to work for fields without defaults. Option seems sensible but will almost invariably lead to inconsistent error messages when the value is required but not set, which is why the getters use a Result<_, ConfigSettingError> instead.

  2. My general opinion re cloning was that TEdgeConfigDto is tiny and borrowing the relevant fields was going to be a pain, particularly to offer support for borrowing String fields as &str rather than &String. This is one of the problems that goes away with your proposed approach of lending an entire struct.

I'm not convinced I want to lean heavily on toml_edit as it makes things a lot more error prone than updating a DTO, although I will admit I last thought about this before I added support for typed updates via the ConfigurationUpdate enum. Verifying a proposed change can be deserialized to ConfigurationUpdate before updating the config to include it should be sufficient, and this allows us to get the correct toml representation.

The real issues with toml_edit come from aliases. If someone has the following config:

[azure]
url = "something.azure.net"

and we trivially updated az.url, we would get:

[azure]
url = "something.azure.net"

[az]
url = "new.example.com"

which, regardless of how serde deals with it, is a bad thing to have. I think the process of using toml_edit will be a series of these trade-offs of "simplicity" for correctness.

One related thing that's nice (IMO) with the way we currently do things is the transparent updating of aliases we get by overwriting the entire configuration. We can rename any field and the next time someone writes to the config via the tedge CLI, any outdated field names are replaced (though this isn't a big issue, we don't tend to rename fields in the toml).

Bravo555 commented 1 year ago

2. My general opinion re cloning was that TEdgeConfigDto is tiny

rust-analyzer tells me it's 744 bytes, at least on my machine. So not that big if we clone it infrequently, but not exactly tiny either.

and borrowing the relevant fields was going to be a pain, particularly to offer support for borrowing String fields as &str rather than &String. This is one of the problems that goes away with your proposed approach of lending an entire struct.

I might be misunderstanding, but what would be the problem here? String derefs to str, so when somebody borrows &String they can just call str methods on &String or use .as_str() to get &str, right?

jarhodes314 commented 1 year ago

2. My general opinion re cloning was that TEdgeConfigDto is tiny

rust-analyzer tells me it's 744 bytes, at least on my machine. So not that big if we clone it infrequently, but not exactly tiny either.

and borrowing the relevant fields was going to be a pain, particularly to offer support for borrowing String fields as &str rather than &String. This is one of the problems that goes away with your proposed approach of lending an entire struct.

I might be misunderstanding, but what would be the problem here? String derefs to str, so when somebody borrows &String they can just call str methods on &String or use .as_str() to get &str, right?

You're right about it not being trivially small. I really meant cloning fields of tedge config, where we usually don't read many. There were also some lifetime issues with macros, and the fact a lot of the config implements copy so people would have to likely start manually dereferencing everywhere.

All of these are problems that are solvable with the existing solution, but it is not a solution I wanted to invest significant effort into until I had some agreement on whether the getters worked well at all, and I'm fairly convinced now that they aren't making anything better than the alternative solution I proposed, where we convert the DTO into a PORS with the relevant defaults filled in.

didier-wenzek commented 1 year ago

Closing as this work will be finalized with https://github.com/thin-edge/thin-edge.io/issues/2019