mehcode / config-rs

⚙️ Layered configuration system for Rust applications (with strong support for 12-factor applications).
Apache License 2.0
2.46k stars 210 forks source link

Exploring a redesign and discussing how a modern Rust configuration library should behave #111

Open mehcode opened 5 years ago

mehcode commented 5 years ago

This library has gotten quite stale over the last year and I've been loathe to come back to it mostly because I don't like it. It's complex to use, complex to develop, etc. I apologize for letting it atrophy.

I'd like to thank you everyone who has contributed or submitted a PR ( especially those who've done so in the last several months and have just sat there ). It's not that I don't want to merge in your work, it's that if I come back to this I'll want to do too much and I haven't had the time available to do that.


With that out of the way I want to explore how a configuration library could behave in Rust, optimally, using some example use cases as a jumping off point:

// load configuration only from envrionment
let settings: Settings = config::env::vars()
    // filter to only keys that case-insentively start with s
    .prefix("app_")
    // explictly transform key names to allow traversing nested configuration
    // replaces .separator("__") from config v0.9
    .map(|key| key.replace("__", "."))
    .load()?;

// find a single configuration file from example.* in
let settings: Settings = config::file::find("example")
    // look in the following paths (by default cwd)
    .paths(&[xdg_config_home, home])
    // expect format in TOML only
    // default allows all formats compiled in
    .format(Format::Toml)
    // and load into Settings
    .load()?;

// find multiple configuration files and merge together
let settings: Settings = config::file::find_all(&[
    "default",
    mode // development, production, or test, etc.
]).load()?;

// load configuration from arguments
let settings: Settings = config::env::args(clap_matches).load();

// manually create a configuration stack
let settings: Settings = config::Stack::new()
    // start with an embedded JSON file 
    .add(config::file::from_bytes(Format::Json, include_bytes!("default.json")))
    // override from environment
    .add(config::env::vars())
    // load into settings
    .load()?;

// create a dynamic configuration (ala config v0.9)
// any _source_ could be built into a Config struct
let settings: config::Config = config::file::find("example").build();

// a dyn configuration can listen for changes on supported sources
let _handle = settings.watch(|| do_something_fun_when_config_changes);

// use path access
let host: &str = settings.get("db.host");

Thoughts? How would you want config to work?

Are there any pain points in the current design you'd like to discuss?

mockersf commented 5 years ago

Just a comment describing how I tend to setup configuration:

My pain points:

mwillsey commented 5 years ago

I love how config lets you write hierarchical configuration settings, but that does raise an interesting question as to how overrides/merging works. Example:

struct Settings {
    foo: String,
    bar: Option<BarSettings>,
    bar: Option<BarSettings>,
}

struct BarSettings {...}
struct BazSettings {...}

And then my config looks like this:

foo = "hello world"

[bar]
item1 = 1234
item2 = "asdf"

If I create Settings from the config file, I'll end up with Settings.baz being None, which makes sense. My question is how would I override that (or merge it in) from the environment variables?

If you try to merge in something like APP_BAR= or something like that, you end up merging in a (potentially empty) string in for the value of a table! Then, serde will fail to deserialize into Settings. This is surprising to me because it's merging in a string for the value of (what was) a table.

I see a couple potential ways to address this:

  1. Special case merging the empty string with a table, making the result and empty table
  2. Somehow generalize 1 into some kind of "typed" merging
  3. Manually walk the dynamic config, essentially replacing empty strings with empty tables by hand (where appropriate).

I suppose a similar situation could occur when getting config values from the command line. I think the larger questions is what does it mean to get config values from a non-hierarchical source, and then later merge them with a hierarchical source?

mehcode commented 5 years ago

@mwillsey

If I create Settings from the config file, I'll end up with Settings.bar being None, which makes sense. My question is how would I override that (or merge it in) from the environment variables?

The idea is that you would override using the key app.baz.one which would match a field one in BazSettings. As that's not possible to write in an env var there is a .separator("_") function you can call on config::Environment which does a string replace of _ for ..


@mockersf

1) Do you merge all of those configuration files together or is it last one wins?

My pain points: [...]

It seems we need to think hard on how CLI arguments tie into this. I have a fairly good idea on how to tie clap in normally but I hadn't thought of how to tie config into clap as a "default values provider".

mwillsey commented 5 years ago

@mehcode Sorry, edited parent comment. If I create a config from the above config file, I should get Settings.baz (not bar, as previously stated). The question is, how do I "cancel out" Settings.bar, which was defined as a table in the config file.

I understand the separator trick, but I AFAIK it's not possible to use that to un-define a table (in this case, Settings.bar.

mehcode commented 5 years ago

@mwillsey Hmm.. Un-define in general is a hard concept. APP_BAR= is understood as "undefined" not "null" (by default) as its a very common unix idiom to unset an environment variable like that instead of actually unsetting it. So it would do nothing to the previous properties.

We could probably allow a JSON string to be merged in for a table or array. Viper (go config lib) does that.. but that only can get you an empty map. Not a null map. I suppose that could still work if you go one property up.. APP='{"bar": null}'.

Un-setting from a string is probably a lot easier as you could just merge in a string and tell config to treat it as JSON or TOML or something.

geniusisme commented 5 years ago

Considering merging, I would like to see "action" keys, which will make use of special characters. These keys should allow altering config when merged. For example, "remove$app.bar" to remove value completely. Or "add$app.bar"=[1, 2, 3] to add values into array.

vorner commented 4 years ago

I use config as kind of a backend for spirit. I read the path(s) to config files, any potential config overrides and such from command line and then feed config with it. So, I only use the ability to merge configs (of possibly different types) together and then parse it to something implementing Deserialize.

Therefore I guess the interface doesn't matter to me that much, though I'd really like having more informative error messages and being a bit more robust and well defined about how the merging and such works.

KoltesDigital commented 4 years ago

I also use this crate for its ability to merge from multiple sources, including defaults, and to deserialize to a struct. I guess if I were to do, I'd do the same, with some kind of format provider which would feed with Option.

My only problem is the case (#65). I heavily rely on sub-objects for namespacing, because IMHO some (theoretically all) modules should not know about others, and as such I'm not fan of 12 factor's reliance on a flat list of key-value pairs. Besides with docker configs and secrets, it's easy to provide different config files for different envs.

Anyway, I'd say either a project relies on env vars only, or on config files only, but not both (I gladly receives counter-examples). Serde allows for renaming struct fields, so for env vars #[serde(rename_all = "SCREAMING_SNAKE_CASE")] should be used, and for files #[serde(rename_all = "camelCase")] or whatever. My point is, this crate shouldn't take the responsibility of renaming fields too.

vorner commented 4 years ago

I could imagine a project where one would have most of the configuration inside a config file, but set passwords through environment variables. I think we have some such uses in the company, though not in Rust.

As for the case, I actually find the fact the config is case insensitive as an advantage. Not a huge one, but it feels more convenient from the user perspective.

KoltesDigital commented 4 years ago

Thanks for the use case. I think you imply that both the config from the file and the passwords end up in the same list. As I prefer to divide config by modules, I don't see how both needs could fit together.

If I had to accept env vars, I guess I'd have a dedicated struct for available env keys, and then I'd put these values into the config structs. Ultimately this dedicated struct would be similar to what structopt does, i.e. env var and cli arguments serve the same goal.

vorner commented 4 years ago

I think you imply that both the config from the file and the passwords end up in the same list. As I prefer to divide config by modules, I don't see how both needs could fit together.

Yes, I like to have one „big“ config structure and then I hand out parts of it to submodules of the application. I tend to consider configuration reload to be transactional, so I don't want to load part of new configuration and then fail on another sub-config.

Nevertheless, even if I didn't have that, let's say I have something like „update source“ ‒ a place where updates for whatever data the application holds.

#[derive(Deserialize)]
struct Updates {
    base_url: Url,
    username: String,
    password: String,
}

This is definitely supposed to come to the same place in the application, so it should be the same config. But the url will be in config file, probably distributed together in a docker image or deb package or whatever. The password will be overwritten by an environment variable.

rahulwa commented 4 years ago

Sorry for jump in between, i never tried config-rs, just wanted to share quite famous way to handle config. https://github.com/spf13/viper Let’s you use config without worry of from where value is coming from, it can be from file/env/key-value-store etc. i love that application doesn’t need to care about how value being set.

chrisbouchard commented 4 years ago

One use case for me would be able to use config::file::find as per the second example in the OP, but with a different filename per-path. E.g.,

I'd personally be happy passing a slice of Paths or PathBufs and having it pick the first one that exists. (I'm using directories anyway.) I'd be even happier if I could pass a slice of Paths or PathBufs without extensions, and have it pick up the basename and try different formats.

chrisbouchard commented 4 years ago

Regarding the first example, I'm not a fan of manually map-ing replace like that. It feels so, well, manual compared to how smart the rest of this API is. I personally like having a specified separator — and I think the prefix should be separated from the key by separator, rather than always by "_", so that we don't have to set the prefix as "app_" like the example does.

In fact, I'd like to see an option for the separator to get even smarter: I'd like to have an environment variable like PREFIX_FOO_BAR_BAZ_BLAH map to foo_bar.baz_blah because my structs are:

pub struct Settings {
    foo_bar: SubSettings
}

pub struct SubSettings {
    baz_blah: i32
}

I know this is a long-shot because we have to map to a key string, which has no knowledge of its eventual location in a struct.

dawid-nowak commented 3 years ago

The way I see it, I am looking for two things from the config file :

  1. I would like to be able to express my configuration in richer and hierarchical constructs and then translate those into appropriate objects. Ideally, I should be able to write my config file in yaml, json, toml and properties and that should translate to the same structure type. I would like it if it was possible to express all that complexity through env variables but I am not sure if that is easily doable or even doable for the complex configuration files.

  2. I would like an ability to override certain configuration values in my config file from a different source. An example of a real life use case would be an ability to inject a different database endpoint/password into a config file from the env variable at deploy time to stage or prod environments.

I think there are two ways of addressing point 2:

  1. Patching/Merging - I think this is what the existing code is doing and I do believe that will work fine in a situation where two sources are config files. The main problem as I see it is how to patch a complex config file from the env variable. If I wanted to add/replace client "client_topic:orders" from the env variable, I probably would need to create a very long env variable and even then I am not sure if that would work.
pub_sub:
  kafka_config:
    - brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
      consumer_topics:
        - consumer_topic: processor1_kafka_blue
          consumer_group: swir
          client_topic: orders
      producer_topics: []      
  1. Replacing - In this case, we pepper config files with placeholders so the env variables can be injected in the appropriate places. This is not pretty either. The config file can look pretty ugly. Possibly we could break the formatting of the file so using linting tools would be restricted. But this approach allows for a simpler injection of env variables.
    pub_sub:
    kafka_config:
    - brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
      consumer_topics:
        - consumer_topic: processor1_kafka_blue
          consumer_group: swir
          client_topic: ${orders_topic}
      producer_topics: []      

I'd like to know what other think:

szarykott commented 3 years ago

@dawid-nowak

Let me refer to your concern with point number 1. You do not need to patch values from environment variables directly. What you can do it to keep various versions of config files in VCS named for example appsettings.yaml, appsettings.dev.yaml or appsettings.prod.yaml. Those files could look like this for example :

appsettings.yaml

pub_sub:
  kafka_config:
    - brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
      consumer_topics:
        - consumer_topic: processor1_kafka_blue
          consumer_group: swir
          client_topic: orders
      producer_topics: []  

appsettings.dev.yaml

pub_sub:
  kafka_config:
    - brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
      consumer_topics:
        - consumer_topic: processor1_kafka_blue
          consumer_group: swir
          client_topic: orders_dev
      producer_topics: []  

What you'd keep in env variables is just the name of environment, here : dev and prod and you'd load appropriate file based on its name in runtime. appsettings.yaml would serve as default settings used regardless of environment.

Ideally you'd want to write in settings files "down the waterfall" only those lines you'd want to overwrite (here : only client_topic with path leading to it, dropping e.g consumer_group as it is the same in parent file). On the other hand though, if you kept all other keys that do not change, reading config files would be much easier as all keys would be visible. The cost would be however more typing when creating them.

What I propose is how we solve such issues at my company (and in .NET Core world from which I come from) so you could take a look at IConfiguration inteface (and its implementation) in .NET Core which is heavily battle-tested and proven to work in many circumstances.

tmccombs commented 3 years ago

The cost would be however more typing when creating them.

And needing to update all files whenever you need to change a config. Which can be easy to forget.

Korvox commented 3 years ago

Just to throw my bone in the ring I've been fighting against the state of config in Rust this last week and came up with this as my personal ideal workflow:

  1. Structure your config as a (possibly heirarchical composite) of deserializable & (optionally) serializable structs. If you want defaults you'd want to also derive / impl default.
  2. derive(config) the whole thing to generate a bunch of impls to use this struct as a config store.
  3. The config derive would generate a builder pattern to construct a config from various sources, ex: let conf = Settings::named("myapp").watched_file().env().args().build()?;

The methods defined by the derive would include the following:

And more, obviously.

What I really like about this is the derive and builder pattern lets you incrementally add sources and eventually have this opaque configuration struct that can do a lot behind the scenes - I can see constructors that can take event loop handles to use async or threadpools for the IO and these builder methods can be generally complex because I can imagine a lot can be done with constexpr at compile time to figure out what the final struct is supposed to actually look like. Ultimately you would only be doing real work in the build() function using a bunch of constexpr prefill.

But for a user all you are doing is making a plain old data struct, using a derive macro, and building your config with the options you want. No other boilerplate. To write to it you just set properties on the struct and call .save() which it and the drop impl can writeback changes with if the struct derives serialize.

vorner commented 3 years ago

Hello

Your builder approach looks kind of nice on the high level. But there are some things that don't add up for me:

If you already want your struct to be Deserialize (possibly derived or manually implemented), what is the reason why you want the derive(Config)? Usually, derive macros generate some code based on the structure/shape, with possibly descending into the fields. I assume this is not needed here, as the structure is used from Deserialize. If it needed to read the structure/shape, then it would probably require that everything down (the field types, their field's types, etc) are also derive(Config), which would be kind of a downside (a lot of things provide Deserialize implementation, but wouldn't generally provide Config impl).

If you only care about these new methods, wouldn't an extension trait (something like itertools or similar does to add more methods to all the iterators) on top of Deserialize be enough? Or, if don't want everything to be Config automatically, having a trait with all the methods being provided would also work:

// In the lib:
pub struct Config: DeserializeOwned {
  fn new() -> ConfigBuilder<Self> { ... }
}

// The user:
#[derive(Deserialize)] 
struct MyConfig {
  params: Vec<String>,
}

impl Config for MyConfig {} // Yes, empty here

I can see constructors that can take event loop handles to use async or threadpools for the IO and these builder methods can be generally complex because I can imagine a lot can be done with constexpr at compile time to figure out what the final struct is supposed to actually look like.

I'm not sure if integration with threadpools should be in scope of this library. I mean, if there's a library that can read bunch of formats, combine different sources together and spits out parsed data structure, that's great. Should watching files and realoading be done in the same library, or in some that builds on top? I don't really want to pull in the whole tokio to read a 20 lines long config file 😈.

To write to it you just set properties on the struct and call .save() which it and the drop impl can writeback changes with if the struct derives serialize.

That sounds a bit scary. First, I believe most applications don't modify its own configuration. At least servers don't. And it seems to be not obvious where it would store the changes if, for example, the whole config was read from the environment variables. And if save takes a path to the file, does it provide anything on top eg. serde-json, if you structure already implements Serialize?

I don't see what exactly you want to do in drop, but saving configuration is a fallible operation and there's no way to signal these from within destructors, so such pattern is generally discouraged (you can either panic, which is probably the wrong thing to do, or ignore the error, which is definitely not a robust design).

So, in general, the composition by the builder pattern seems nice (nicer than the fn merge(&mut self, …) thing), but would need some details ironed out.

I also wonder what scope the library should have. For me it would make sense to split the parsing/merging/decoding of configuration from other features, like file change watching (both because one might not to want it, and because there's a million ways how you may want to trigger your config reload, or where it happens and how the changes are propagated back into the program; as the author of spirit, I know how much complexity there is :-| ).

tmccombs commented 3 years ago

It would need to derive to support adding the fields to support watching, etc.

Why? The way I imagine watching working, is when a fule changes, it just reloads the full config.

Save on close is just something a lot of other config frameworks do because in practice users will generally forget to save (or in something like GUI configuration not have to save on every user change).

Save on close doesn't necessarily require save on drop. And having the save method separate givea the developer more control, and the ability to handle errors.

tmccombs commented 3 years ago

I see, so the purpose of the derive is to store additional state in the config state? Couldn't the same thing be accomplished with a wrapper Config<T> struct that implements Deref<Target=T> and possibly DerefMut?

You don't even need this in the crate either since an end user is defining the struct and can thus impl drop themselves if they want it.

Exactly. If that is desired behavior, the user of the library can just implement Drop on their config struct themselves (or if a wrapper type is used, as I mentioned above, implement it on Config<MyConfig>).

vorner commented 3 years ago

First, AFAIK, derive macros don't allow you to inject more fields. They allow you to add code, but outside of the struct (eg. make impl blocks).

Second, why would I ever want, as the user of the library, to intermix the config structure (the data loaded from the config file) with the manager? Shouldn't the manager be some kind of object that does the tracking and provides me/contains/derefs to the actual config I've decided to use? Putting both into the same thing conflates two things into one. I can certainly want to pass a config structure to a function, but not pass the manager into the same function.

rakshith-ravi commented 3 years ago

A little off-topic, but how about a macro that embeds the configuration provided at compile-time into your code? Essentially, during compile time (perhaps in CI), the configuration can be provided through the same merge-semantics (and combining multiple configurations) and provided as a constant struct, making value accesses hard-coded? Idk just a random idea. Maybe that combined with env vars during build time would be useful in a few use-cases (especially in embedded scenarios), considering runtime performance would be improved significantly, since the compiler can now optimize for constant values.

AlexMikhalev commented 2 years ago
// load configuration from arguments
let settings: Settings = config::env::args(clap_matches).load();

+1 for this one. I found re-creating large parts of config.rs manually to achieve clear precedence rules flags > env > toml.

djeedai commented 2 years ago

Adding to this issue, but I might also open a separate one. I've just found out about this crate, and although it looks like it covers part of what I need, I'm actually put off by the sheer number of mandatory dependencies. The support for the various serialization formats are behind Cargo features as one would expect, but it seems (or did I misunderstood something?) that the various sources are all mandatory. In particular, web/online ones like HTTP via reqwest are almost a deal breaker; many applications either do not want any network connection, or do want absolute control over them due to privacy or security concerns (especially when dealing with user settings), or have special requirements in terms of how networking can be done. Same for disk access in some contexts. The result is that this crate depends on a long list of crates to provide various sources that cannot be disabled, for something that is not its main role (I'd assume it's a config management and serialization/deserialization library, not a networking one or a filesystem abstraction one). I've yet to decide if I'm going to use this, some part of it, or just rebuild the config management part (the core responsibility of this crate) just to be able to rip off all the clutter around the various sources I don't want/need.

matthiasbeyer commented 2 years ago

I moved this issue to the project so we can consider it as well (see here for more details).

If you, as a commenter in this issue, want to contribute to the effort of re-thinking config-rs, please open issues for your ideas (see the "Help wanted" section here).

SamuelMarks commented 3 months ago

Two years later! - How're the plans progressing?

matthiasbeyer commented 3 months ago

https://github.com/mehcode/config-rs/issues/549

apps4uco commented 2 weeks ago

Hi, just wanted to add that allowing Humanized numbers would be great, maybe leveraging the parse_size crates parse_size function to convert 1KB into 1024 etc