Open mehcode opened 5 years ago
Just a comment describing how I tend to setup configuration:
/
, try to read a config file and overridestructopt
My pain points:
--help
, I would like it to show the value found from a configuration file if there is one rather than a default that would be override by said configuration fileI love how config
lets you write hierarchical configuration settings, but that does raise an interesting question as to how overrides/merging works. Example:
struct Settings {
foo: String,
bar: Option<BarSettings>,
bar: Option<BarSettings>,
}
struct BarSettings {...}
struct BazSettings {...}
And then my config looks like this:
foo = "hello world"
[bar]
item1 = 1234
item2 = "asdf"
If I create Settings
from the config file, I'll end up with Settings.baz
being None
, which makes sense. My question is how would I override that (or merge it in) from the environment variables?
If you try to merge in something like APP_BAR=
or something like that, you end up merging in a (potentially empty) string in for the value of a table! Then, serde
will fail to deserialize into Settings. This is surprising to me because it's merging in a string for the value of (what was) a table.
I see a couple potential ways to address this:
I suppose a similar situation could occur when getting config values from the command line. I think the larger questions is what does it mean to get config values from a non-hierarchical source, and then later merge them with a hierarchical source?
@mwillsey
If I create Settings from the config file, I'll end up with Settings.bar being None, which makes sense. My question is how would I override that (or merge it in) from the environment variables?
The idea is that you would override using the key app.baz.one
which would match a field one
in BazSettings
. As that's not possible to write in an env var there is a .separator("_")
function you can call on config::Environment
which does a string replace of _
for .
.
@mockersf
1) Do you merge all of those configuration files together or is it last one wins?
My pain points: [...]
It seems we need to think hard on how CLI arguments tie into this. I have a fairly good idea on how to tie clap in normally but I hadn't thought of how to tie config into clap as a "default values provider".
@mehcode Sorry, edited parent comment. If I create a config from the above config file, I should get Settings.baz
(not bar
, as previously stated). The question is, how do I "cancel out" Settings.bar
, which was defined as a table in the config file.
I understand the separator trick, but I AFAIK it's not possible to use that to un-define a table (in this case, Settings.bar
.
@mwillsey Hmm.. Un-define in general is a hard concept. APP_BAR=
is understood as "undefined" not "null" (by default) as its a very common unix idiom to unset an environment variable like that instead of actually unsetting it. So it would do nothing to the previous properties.
We could probably allow a JSON string to be merged in for a table or array. Viper (go config lib) does that.. but that only can get you an empty map. Not a null
map. I suppose that could still work if you go one property up.. APP='{"bar": null}'
.
Un-setting from a string is probably a lot easier as you could just merge in a string and tell config to treat it as JSON or TOML or something.
Considering merging, I would like to see "action" keys, which will make use of special characters. These keys should allow altering config when merged. For example, "remove$app.bar" to remove value completely. Or "add$app.bar"=[1, 2, 3] to add values into array.
I use config
as kind of a backend for spirit
. I read the path(s) to config files, any potential config overrides and such from command line and then feed config
with it. So, I only use the ability to merge configs (of possibly different types) together and then parse it to something implementing Deserialize
.
Therefore I guess the interface doesn't matter to me that much, though I'd really like having more informative error messages and being a bit more robust and well defined about how the merging and such works.
I also use this crate for its ability to merge from multiple sources, including defaults, and to deserialize to a struct. I guess if I were to do, I'd do the same, with some kind of format provider which would feed with Option
My only problem is the case (#65). I heavily rely on sub-objects for namespacing, because IMHO some (theoretically all) modules should not know about others, and as such I'm not fan of 12 factor's reliance on a flat list of key-value pairs. Besides with docker configs and secrets, it's easy to provide different config files for different envs.
Anyway, I'd say either a project relies on env vars only, or on config files only, but not both (I gladly receives counter-examples). Serde allows for renaming struct fields, so for env vars #[serde(rename_all = "SCREAMING_SNAKE_CASE")]
should be used, and for files #[serde(rename_all = "camelCase")]
or whatever. My point is, this crate shouldn't take the responsibility of renaming fields too.
I could imagine a project where one would have most of the configuration inside a config file, but set passwords through environment variables. I think we have some such uses in the company, though not in Rust.
As for the case, I actually find the fact the config is case insensitive as an advantage. Not a huge one, but it feels more convenient from the user perspective.
Thanks for the use case. I think you imply that both the config from the file and the passwords end up in the same list. As I prefer to divide config by modules, I don't see how both needs could fit together.
If I had to accept env vars, I guess I'd have a dedicated struct for available env keys, and then I'd put these values into the config structs. Ultimately this dedicated struct would be similar to what structopt does, i.e. env var and cli arguments serve the same goal.
I think you imply that both the config from the file and the passwords end up in the same list. As I prefer to divide config by modules, I don't see how both needs could fit together.
Yes, I like to have one „big“ config structure and then I hand out parts of it to submodules of the application. I tend to consider configuration reload to be transactional, so I don't want to load part of new configuration and then fail on another sub-config.
Nevertheless, even if I didn't have that, let's say I have something like „update source“ ‒ a place where updates for whatever data the application holds.
#[derive(Deserialize)]
struct Updates {
base_url: Url,
username: String,
password: String,
}
This is definitely supposed to come to the same place in the application, so it should be the same config. But the url will be in config file, probably distributed together in a docker image or deb package or whatever. The password will be overwritten by an environment variable.
Sorry for jump in between, i never tried config-rs, just wanted to share quite famous way to handle config. https://github.com/spf13/viper Let’s you use config without worry of from where value is coming from, it can be from file/env/key-value-store etc. i love that application doesn’t need to care about how value being set.
One use case for me would be able to use config::file::find
as per the second example in the OP, but with a different filename per-path. E.g.,
config.toml
in $XDG_CONFIG_DIR
.app-name.toml
in $HOME
I'd personally be happy passing a slice of Path
s or PathBuf
s and having it pick the first one that exists. (I'm using directories
anyway.) I'd be even happier if I could pass a slice of Path
s or PathBuf
s without extensions, and have it pick up the basename and try different formats.
Regarding the first example, I'm not a fan of manually map
-ing replace
like that. It feels so, well, manual compared to how smart the rest of this API is. I personally like having a specified separator
— and I think the prefix should be separated from the key by separator
, rather than always by "_"
, so that we don't have to set the prefix as "app_"
like the example does.
In fact, I'd like to see an option for the separator to get even smarter: I'd like to have an environment variable like PREFIX_FOO_BAR_BAZ_BLAH
map to foo_bar.baz_blah
because my structs are:
pub struct Settings {
foo_bar: SubSettings
}
pub struct SubSettings {
baz_blah: i32
}
I know this is a long-shot because we have to map to a key string, which has no knowledge of its eventual location in a struct.
The way I see it, I am looking for two things from the config file :
I would like to be able to express my configuration in richer and hierarchical constructs and then translate those into appropriate objects. Ideally, I should be able to write my config file in yaml, json, toml and properties and that should translate to the same structure type. I would like it if it was possible to express all that complexity through env variables but I am not sure if that is easily doable or even doable for the complex configuration files.
I would like an ability to override certain configuration values in my config file from a different source. An example of a real life use case would be an ability to inject a different database endpoint/password into a config file from the env variable at deploy time to stage or prod environments.
I think there are two ways of addressing point 2:
pub_sub:
kafka_config:
- brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
consumer_topics:
- consumer_topic: processor1_kafka_blue
consumer_group: swir
client_topic: orders
producer_topics: []
pub_sub:
kafka_config:
- brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
consumer_topics:
- consumer_topic: processor1_kafka_blue
consumer_group: swir
client_topic: ${orders_topic}
producer_topics: []
I'd like to know what other think:
@dawid-nowak
Let me refer to your concern with point number 1. You do not need to patch values from environment variables directly. What you can do it to keep various versions of config files in VCS named for example appsettings.yaml
, appsettings.dev.yaml
or appsettings.prod.yaml
. Those files could look like this for example :
appsettings.yaml
pub_sub:
kafka_config:
- brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
consumer_topics:
- consumer_topic: processor1_kafka_blue
consumer_group: swir
client_topic: orders
producer_topics: []
appsettings.dev.yaml
pub_sub:
kafka_config:
- brokers: [kafka-1.kafka-headless.kafka.svc.cluster.local:9092]
consumer_topics:
- consumer_topic: processor1_kafka_blue
consumer_group: swir
client_topic: orders_dev
producer_topics: []
What you'd keep in env variables is just the name of environment, here : dev
and prod
and you'd load appropriate file based on its name in runtime. appsettings.yaml
would serve as default settings used regardless of environment.
Ideally you'd want to write in settings files "down the waterfall" only those lines you'd want to overwrite (here : only client_topic
with path leading to it, dropping e.g consumer_group
as it is the same in parent file). On the other hand though, if you kept all other keys that do not change, reading config files would be much easier as all keys would be visible. The cost would be however more typing when creating them.
What I propose is how we solve such issues at my company (and in .NET Core world from which I come from) so you could take a look at IConfiguration inteface (and its implementation) in .NET Core which is heavily battle-tested and proven to work in many circumstances.
The cost would be however more typing when creating them.
And needing to update all files whenever you need to change a config. Which can be easy to forget.
Just to throw my bone in the ring I've been fighting against the state of config in Rust this last week and came up with this as my personal ideal workflow:
derive(config)
the whole thing to generate a bunch of impls to use this struct as a config store.let conf = Settings::named("myapp").watched_file().env().args().build()?;
The methods defined by the derive would include the following:
::new()
for unnamespaced configs::named()
for config where every source is namespaced by the appname given (so if your app is foobar, it would source env from FOOBAR_.file()
to read dirs:;config_dir() + named() .watched_file()
to watch the same default config file instead.named_file("name")
to watch a specific file with the suffix inferred.env()
to read environment variables from the aformentioned named namespace.named_env("name")
for custom prefix env search.args
to parse the cli args And more, obviously.
What I really like about this is the derive and builder pattern lets you incrementally add sources and eventually have this opaque configuration struct that can do a lot behind the scenes - I can see constructors that can take event loop handles to use async or threadpools for the IO and these builder methods can be generally complex because I can imagine a lot can be done with constexpr at compile time to figure out what the final struct is supposed to actually look like. Ultimately you would only be doing real work in the build() function using a bunch of constexpr prefill.
But for a user all you are doing is making a plain old data struct, using a derive macro, and building your config with the options you want. No other boilerplate. To write to it you just set properties on the struct and call .save()
which it and the drop impl can writeback changes with if the struct derives serialize.
Hello
Your builder approach looks kind of nice on the high level. But there are some things that don't add up for me:
If you already want your struct to be Deserialize
(possibly derived or manually implemented), what is the reason why you want the derive(Config)
? Usually, derive macros generate some code based on the structure/shape, with possibly descending into the fields. I assume this is not needed here, as the structure is used from Deserialize
. If it needed to read the structure/shape, then it would probably require that everything down (the field types, their field's types, etc) are also derive(Config)
, which would be kind of a downside (a lot of things provide Deserialize
implementation, but wouldn't generally provide Config
impl).
If you only care about these new methods, wouldn't an extension trait (something like itertools or similar does to add more methods to all the iterators) on top of Deserialize
be enough? Or, if don't want everything to be Config
automatically, having a trait with all the methods being provided would also work:
// In the lib:
pub struct Config: DeserializeOwned {
fn new() -> ConfigBuilder<Self> { ... }
}
// The user:
#[derive(Deserialize)]
struct MyConfig {
params: Vec<String>,
}
impl Config for MyConfig {} // Yes, empty here
I can see constructors that can take event loop handles to use async or threadpools for the IO and these builder methods can be generally complex because I can imagine a lot can be done with constexpr at compile time to figure out what the final struct is supposed to actually look like.
I'm not sure if integration with threadpools should be in scope of this library. I mean, if there's a library that can read bunch of formats, combine different sources together and spits out parsed data structure, that's great. Should watching files and realoading be done in the same library, or in some that builds on top? I don't really want to pull in the whole tokio to read a 20 lines long config file 😈.
To write to it you just set properties on the struct and call .save() which it and the drop impl can writeback changes with if the struct derives serialize.
That sounds a bit scary. First, I believe most applications don't modify its own configuration. At least servers don't. And it seems to be not obvious where it would store the changes if, for example, the whole config was read from the environment variables. And if save takes a path to the file, does it provide anything on top eg. serde-json, if you structure already implements Serialize
?
I don't see what exactly you want to do in drop
, but saving configuration is a fallible operation and there's no way to signal these from within destructors, so such pattern is generally discouraged (you can either panic, which is probably the wrong thing to do, or ignore the error, which is definitely not a robust design).
So, in general, the composition by the builder pattern seems nice (nicer than the fn merge(&mut self, …)
thing), but would need some details ironed out.
I also wonder what scope the library should have. For me it would make sense to split the parsing/merging/decoding of configuration from other features, like file change watching (both because one might not to want it, and because there's a million ways how you may want to trigger your config reload, or where it happens and how the changes are propagated back into the program; as the author of spirit, I know how much complexity there is :-| ).
It would need to derive to support adding the fields to support watching, etc.
Why? The way I imagine watching working, is when a fule changes, it just reloads the full config.
Save on close is just something a lot of other config frameworks do because in practice users will generally forget to save (or in something like GUI configuration not have to save on every user change).
Save on close doesn't necessarily require save on drop. And having the save method separate givea the developer more control, and the ability to handle errors.
I see, so the purpose of the derive is to store additional state in the config state? Couldn't the same thing be accomplished with a wrapper Config<T>
struct that implements Deref<Target=T>
and possibly DerefMut
?
You don't even need this in the crate either since an end user is defining the struct and can thus impl drop themselves if they want it.
Exactly. If that is desired behavior, the user of the library can just implement Drop
on their config struct themselves (or if a wrapper type is used, as I mentioned above, implement it on Config<MyConfig>
).
First, AFAIK, derive macros don't allow you to inject more fields. They allow you to add code, but outside of the struct (eg. make impl
blocks).
Second, why would I ever want, as the user of the library, to intermix the config structure (the data loaded from the config file) with the manager? Shouldn't the manager be some kind of object that does the tracking and provides me/contains/derefs to the actual config I've decided to use? Putting both into the same thing conflates two things into one. I can certainly want to pass a config structure to a function, but not pass the manager into the same function.
A little off-topic, but how about a macro that embeds the configuration provided at compile-time into your code? Essentially, during compile time (perhaps in CI), the configuration can be provided through the same merge-semantics (and combining multiple configurations) and provided as a constant struct, making value accesses hard-coded? Idk just a random idea. Maybe that combined with env vars during build time would be useful in a few use-cases (especially in embedded scenarios), considering runtime performance would be improved significantly, since the compiler can now optimize for constant values.
// load configuration from arguments
let settings: Settings = config::env::args(clap_matches).load();
+1 for this one. I found re-creating large parts of config.rs manually to achieve clear precedence rules flags > env > toml.
Adding to this issue, but I might also open a separate one. I've just found out about this crate, and although it looks like it covers part of what I need, I'm actually put off by the sheer number of mandatory dependencies. The support for the various serialization formats are behind Cargo features as one would expect, but it seems (or did I misunderstood something?) that the various sources are all mandatory. In particular, web/online ones like HTTP via reqwest
are almost a deal breaker; many applications either do not want any network connection, or do want absolute control over them due to privacy or security concerns (especially when dealing with user settings), or have special requirements in terms of how networking can be done. Same for disk access in some contexts. The result is that this crate depends on a long list of crates to provide various sources that cannot be disabled, for something that is not its main role (I'd assume it's a config management and serialization/deserialization library, not a networking one or a filesystem abstraction one). I've yet to decide if I'm going to use this, some part of it, or just rebuild the config management part (the core responsibility of this crate) just to be able to rip off all the clutter around the various sources I don't want/need.
I moved this issue to the project so we can consider it as well (see here for more details).
If you, as a commenter in this issue, want to contribute to the effort of re-thinking config-rs, please open issues for your ideas (see the "Help wanted" section here).
Two years later! - How're the plans progressing?
Hi, just wanted to add that allowing Humanized numbers would be great, maybe leveraging the parse_size crates parse_size function to convert 1KB into 1024 etc
This library has gotten quite stale over the last year and I've been loathe to come back to it mostly because I don't like it. It's complex to use, complex to develop, etc. I apologize for letting it atrophy.
I'd like to thank you everyone who has contributed or submitted a PR ( especially those who've done so in the last several months and have just sat there ). It's not that I don't want to merge in your work, it's that if I come back to this I'll want to do too much and I haven't had the time available to do that.
With that out of the way I want to explore how a configuration library could behave in Rust, optimally, using some example use cases as a jumping off point:
Thoughts? How would you want
config
to work?Are there any pain points in the current design you'd like to discuss?