What do we think of separate configuration files / general configuration improvements?

rwf2 / Rocket

A web framework for Rust.

https://rocket.rs

Other

24.4k stars 1.56k forks source link

What do we think of separate configuration files / general configuration improvements? #145

Closed mehcode closed 5 years ago

mehcode commented 7 years ago

I see that I can do what I want with rocket::custom but I'm wondering if @SergioBenitez and contributors would like this as well (as its still definitely early enough to change).

At a high level, what I would want is to be able to have N configuration files (a default and 1 per environment), instead of a single configuration file. Some reasons for this:

Looking at Rails and Django, it seems this kind of idea is widely used
Production/Staging configuration could be encrypted or generated from environment variables inside of Docker startup script
Most configuration options I can imagine apply to all environments with just some tweaking per environment (which is why something like config/default.toml is a useful concept.

We can support this a few ways (that I can see):

Allow specifying relative-from-cwd globs of configuration files.. with_config("config/*.toml") or with_config("Rocket.toml", "Rocket.production.toml"). This could default to "Rocket.toml" to keep compatibility.
Allow specifying a configuration path.. with_config("config"). This wouldn't be able to be defaulted to . as it would then pick up Cargo.toml.

My vote is for 2 (config path) and default it to config.

The actual format of the files (environment tables) can stay the same or environment can be taken from the filename. I'd prefer the former as it'd be more flexible.

Some more things of interest to configuration (but can be tackled later):

Ability to pull configuration from environment
Ability to pull configuration from etcd (or similar services)

Another note is that a (12-factor) configuration library would be of general use to Rust and would probably be best served as a crate that is then used by Rocket. I don't mind jumping on that if come to a consensus here. Just help me come up with a name. config is taken by an unfinished/unmaintained libconfig parser (maybe we can ask @filipegoncalves to rename it to libconfig or ask crates.io if unresponsive).

For reference I really like how https://github.com/lorenwest/node-config just works.

SergioBenitez commented 7 years ago

Thanks for the issue. I'm all about things that just work, and I'd like for Rocket's config feel that way. I want to address some of the issues you've raised and then talk about the outlook of config in Rocket more broadly.

Looking at Rails and Django, it seems this kind of idea is widely used

Besides security and best practices, Rocket doesn't take many queues from other web frameworks. While it's important to learn from the successes and failures of others, it's also important to try to approach problems with a breath of fresh air. With that in mind, I don't believe this is a compelling reason to do things one way or another.

Most configuration options I can imagine apply to all environments with just some tweaking per environment (which is why something like config/default.toml is a useful concept.

Rocket already has the global environment, which accomplishes something similar but in a sort-of reversed manner. Maybe global should be default instead, but this, I feel, is a bit confusing when there are already defaults that Rocket enforces. The defaults are chosen to be sane so that the Rocket.toml file is only the set of extra or overridden parameters, as special cases.

Production/Staging configuration could be encrypted or generated from environment variables inside of Docker startup script.

Configuration via environment variables has already landed in master (https://github.com/SergioBenitez/Rocket/commit/d4d5c5dd29b40e9c563197490adce503c1370be8) and will ship in 0.2. How common is encryption of configuration parameters? If only one or two parameters are being decrypted by some application environment, then setting the parameters via environment variables seems like the way to go.

I'm personally of the opinion that having all configuration in one file is a superior design; centralization of configuration - being able to see exactly how the application is configured in one place - is something I find particularly useful when developing applications. I would be remiss to 1) force users to have a config directory with several files, and 2) complicate the configuration story beyond a single file. At this moment, I don't feel you've presented a compelling reason to go the multi-file config route.

That being said, one possible argument for multi-file config is that you can easily exclude specific configurations from git source control. This helps, for instance, if you want to open-source an app that uses a session key but you don't want to share the one you're using with the rest of the world. Or, if you have database configuration that requires a username/password. Without multi-file configs, possible workarounds are to 1) commit the proper hunks of the config file, 2) use environment variables for the sensitive parameters, or 3) simply don't commit the Rocket.toml file. The first is asking for too much, but the second two seem like fair alternatives.

Anyway, to summarize in a different manner: we should be as discerning as possible when contemplating adding complexity. The reasoning needs to be compelling enough to admit the complexity. I'm wary that such reasons exist for this particular case, but as always, I'm very open to feedback.

mehcode commented 7 years ago

Besides security and best practices, Rocket doesn't take many queues for other web frameworks.

I'd argue that configuration handling falls under "best practices".

I'd further argue that if a framework is as large and as used as something like Rails or Django then the "best practices" defined within their communities are very well defined / make sense.

I'm not saying that this applies to all things. But if you look at configuration and how the communities of Django, Rails, Laravel, and Node all seem to agree that configuration should be broken out into multiple files depending on environment.. it's a bit telling.

Rocket already has the global environment, which accomplishes something similar but in a sort-of reversed manner. Maybe global should be default instead, but this, I feel, is a bit confusing when there are already defaults that Rocket enforces. The defaults are chosen to be sane so that the Rocket.toml file is only the set of extra or overridden, as special cases.

Currently configuration files are "Rocket" configuration files. That's fine. However.. currently you can define arbitrary configuration inside of Rocket.toml which I'm sure will be (ab)used for all sorts of configuration a typical server application needs. Connection parameters, random keys, etc. This is no longer a "Rocket" configuration. This is a "general" configuration.

I would strongly argue to remove Rocket.toml and do two things.

Add an optional with_config / config / configure method (or just have people use rocket::custom) to override rocket-specific configuration (there isn't much so eh).

rocket::custom(rocket::Config{
   })
  .mount("/", routes![...])
  .launch();

Let's work together to define a general application configuration library that rocket simply uses.

This hypothetical configuration library would be useful for dozens more use-cases than just Rocket. I'm sure we can come up with a solution that satisfies all parties here.

Here's an idea that should address your default 1-file concern.

The config crate reads a Config.toml file at CWD or it reads all *.toml files in a config directory. The format is identical to what you have now.

Perhaps you could configure the config create in the global (non-table) space of the Config.toml as well.

I'm personally of the opinion that having all configuration in one file is a superior design; centralization of configuration - being able to see exactly how the application is configured in one place - is something I find particularly useful when developing applications.

I've put dozens of web applications in production. What works best for me may not be what's best for you. But what is best for me is to keep default, development, staging, and test configuration files normally in git; and, keep a production configuration file, encrypted in git as well. In many projects I only have default and production. My applications are self-contained (all config is in the repo), easily build-able into a container (as long as you have the secret key), secure (most developers don't have access to production config), and run without the system administrator needing to know to add any env vars to the command line.

Without multi-file configs, possible workarounds are to 1) commit the proper hunks of the config file, 2) use environment variables exclusively, or 3) simply don't commit the Rocket.toml file. The first is asking for too much, but the second two seem like fair alternatives.

I don't like 1 because its strange. You'd need to use environment variables in a Dockerfile to append the rest of the config. Also what happens in development.. just weird.

2 is unwieldy and would lead to me making a configuration library like dotenv and just linking it together.

3 is awful for development again.

I want my configuration to be self-contained and secure as I explained above.

Anyway, to summarize in a different manner: we should be as discerning as possible when contemplating adding complexity.

Totally agree. When it comes down to it.. my proposal would only affect the users of Rocket by having them change Rocket.toml to Config.toml (and even that could probably be taken care of as a "possible file name" by the usage of the config crate inside of Rocket).

impowski commented 7 years ago

I would agree with some things, because having a multiple files for configuration would be kinda good, because for instance when we have a variable like DATABASE_URL which will connect with Diesel and use our DATABASE_URL it would be very useful to have a Rocket.test.toml so when I run cargo test it will grab DATABASE_URL from Rocket.test.toml, but at the same time I can just have a [test] inside of Rocket.toml and it will be the same.

Right now it's seems kinda useless, because we will need multiple config files if we will have more things to configure, like some contrib things or maybe something else which will require a lot of variables to configure, then yes.

I think it would be a good idea to accept this issue as an enhancement and put it a little bit away for some future releases when, because having one file is enough for now.

And I personally don't really like to think about idea of having a 4 .toml files with 5 lines inside of them instead of just 1 file with 20 lines.

mehcode commented 7 years ago

And I personally don't really like to think about idea of having a 4 .toml files with 5 lines inside of them instead of just 1 file with 20 lines.

I want to make it clear that this is not what I'm proposing. I'm proposing a combination of what is done now and what is done in other frameworks.

`Config.toml` or `config/default.toml` or ...

[development]
api_url = "http://localhost:8080"
api_key = "b"

[production]
api_url = "https://cool-api.io"

`Config.secret.toml` or `config/secret.toml` or `config/production.toml` or ...

[production]
api_key = "fdjlfu94uc89refkfkrf9rk9f0rw"

The idea is to merge configuration files together and still use the same tabular grouping we use today. This way one can be as segmented or as not as they want.

To be more general. I want rust-config to support a merged configuration from N sources. Where a source is a file, etcd, environment variables, etc (this is the same approach used by the popular go configuration library, viper)

mehcode commented 7 years ago

Right now it's seems kinda useless, because we will need multiple config files if we will have more things to configure, like some contrib things or maybe something else which will require a lot of variables to configure, then yes.

Most of my web services talk to several APIs, Redis, Postgres, RabbitMQ, etc. The configuration adds up.

impowski commented 7 years ago

@mehcode It's good that you made it a little bit clear. I actually forgot to mention that I really like an idea of encrypting config files. I guess that config thingy will made it, but not right now, especially if the project will grow faster.

mehcode commented 7 years ago

@SergioBenitez Library is about half-done but the Readme explains should enough

https://github.com/mehcode/config-rs

The idea is Rocket would use this library under-the-hood (and re-expose it at rocket::config).

The average user could simply ignore that config-rs is used and go about as normal.

A user like me could use config::merge to add in other configuration sources around what you would (the Rocket.toml).

mehcode commented 7 years ago

@SergioBenitez Are you a maybe on this? I'm about to start on a PR to rip out rocket::config and replace with this and am wondering what you're thinking about it.

SergioBenitez commented 7 years ago

@mehcode I am uneasy about the idea in general, and even more uneasy about using your library in particular. Don't get me wrong: this is not a statement about the quality of your code in any sense. Simply, configuration in Rocket is something that has to be right. No one wants to deal with confused configuration. This is why it is one of the most tested parts of Rocket. I would not be comfortable using anything that isn't tested thoroughly.

Furthermore, your library doesn't seem to expose hooks to give good error messages. This is actually something that the toml crate itself makes difficult. Consider, for example, what Rocket does now if you set port = "8000":

Error: 'development.port' key could not be parsed
    => in "/rocket/examples/config/Rocket.toml"
    => expected value to be an integer, but found string

Can I get these types of errors with your library?

In short: I should give up nothing when moving to any other solution.

mehcode commented 7 years ago

I am uneasy about the idea in general, and even more uneasy about using your library in particular. Don't get me wrong: this is not a statement about the quality of your code in any sense.

Fair enough. My goal here is to provide a configuration solution for Rust that is general enough to be used in as many situations as possible.

In short: I should give up nothing when moving to any other solution.

I can agree that you want to have the best possible configuration solution. I get that. But I'm sure you understand how much better it would be if there was a configuration solution for Rust (and not strictly Rocket). Big libraries like Diesel could expose a way to easily construct themselves from it. Currently your "extras" solution means a library has to be made ala rocket_diesel to provide that binding as its rocket-specific.

Now to answer some concerns.

Furthermore, your library doesn't seem to expose hooks to give good error messages. [...] Can I get these types of errors with your library?

I'm going to counter this with a question of my own. Why do you want type errors from your configuration?

Configuration can and should come from dozens of sources:

Files of various formats (many of which have no concept of types)
Remote sources: etcd or Consul (no idea on Consul but etcd has no concept of types)
Environment variables (again no concept of types here)

How do you have type errors over this while keeping the configuration source flexible?

For instance.. if I define DEBUG=1 in my environment.. as environment variables are strictly strings, according to your example I'd need to config::get_str("DEBUG") and deal with the "1" myself (where as in my opinion a smart configuration library would instead allow config::get_bool("DEBUG")).

This is not to say other configuration errors are not important. The library is barely a day old (and I don't expect you to merge the PR until the library is at least minimally complete).

The library currently has:

Configuration file not found — returned from config::merge only if the file is marked as required
TOML parse error in Rocket.toml — returned from config::merge
Key not found — config::get returns an Option<T>

Are there any other configuration type errors that should be thrown?

I'm not saying the configuration errors read well. They're essentially just error pass-through. That can easily be improved though.

SergioBenitez commented 7 years ago

Currently your "extras" solution means a library has to be made ala rocket_diesel to provide that binding as its rocket-specific.

This is simply wrong. You can include Rocket as a dependency and then use rocket::config methods. From what I've seen, this is no different that what I'd need to do with your library.

I'm going to counter this with a question of my own. Why do you want type errors from your configuration?

These are parse errors, where some strings are treated as "integers" and others as "strings" and still others as "bools" and so on.

For instance.. if I define DEBUG=1 in my environment.. as environment variables are strictly strings, according to your example I'd need to config::get_str("DEBUG") and deal with the "1" myself (where as in my opinion a smart configuration library would instead allow config::get_bool("DEBUG")).

Have you looked at the way Rocket handles configuration from the environment? That's not what happens now, and that's not what my example shows. Again, every string is parsed as a value of some type. That's the type you'd need to request. In your case, DEBUG=1 would be parse 1 as an integer value for DEBUG. You'd ask for an integer.

mehcode commented 7 years ago

This is simply wrong. You can include Rocket as a dependency and then use rocket::config methods. From what I've seen, this is no different that what I'd need to do with your library.

I don't mean to be snarky but this seems obvious. There are going to be dozens of web frameworks / libraries in Rust. Rocket is not the end all. It's just the life of such a large project. There are too many decisions made to satisfy everyone.

A configuration library, on the other hand, has a small enough surface area that it can feasibly support 90-99% of all users. Including a strong configuration library as a dependency is a much easier sell to a large project then including a dependency to a web framework.

Have you looked at the way Rocket handles configuration from the environment? That's not what happens now, [...]

No, because I didn't realize it did. The docs don't mention it (from what I can tell).

That's not what happens now, and that's not what my example shows.

Ok.. It sort of does seem like that in your example to me. port = "8000" is not being parsed as an integer but as a string. I'm assuming that this is because its in TOML which has blessed types.

These are parse errors, where some strings are treated as "integers" and others as "strings" and still others as "bools" and so on. In your case, DEBUG=1 would be parse 1 as an integer value for DEBUG. You'd ask for an integer.

This seems a bit restrictive to me. Taking a look at what you do now.

It feels a lot more flexible to allow DEBUG=1 to be boolean (old convention of env vars is for 1/0 to be true/false for instance) or LATITUDE=52 to be f64.

Another use case. Perhaps I'm dealing with a library that takes ports as strings because its dumb. I have some_lib_port = "9090" defined in my Rocket.toml. I then do SOME_LIB_PORT=80 in my environment. It feels (I have not tried this) that your library would raise a BadTypeError in my code that was working with config::get_str during development but now fails in production.

What do you lose if we allow the consumer to decide the type of a configuration value vs. the configuration itself?

Note that you still get what you call parse errors in my approach. If you do config::get_int("address") and the environment has ADDRESS=::1 then you get None from get_int currently. We could extend this to a Result that has a ParseError and a NotFoundError easily enough.

SergioBenitez commented 7 years ago

I don't mean to be snarky but this seems obvious. There are going to be dozens of web frameworks / libraries in Rust. Rocket is not the end all.

What's the point of saying this?

A configuration library, on the other hand, has a small enough surface area that it can feasibly support 90-99% of all users. Including a strong configuration library as a dependency is a much easier sell to a large project then including a dependency to a web framework.

I'm not sure how familiar you are with the Rust/Cargo ecosystem, but Cargo supports conditional compilation through features. The proper thing to do for some large project wishing to support Rocket in some way is to add a feature that includes Rocket as a dependency only when enabled. Since the user would only enable this feature when they're using Rocket, this would result in no additional dependencies in all cases.

No, because I didn't realize it did. The docs don't mention it (from what I can tell).

This is only in master. I provided a link in my initial comment to the commit adding this support. The commit also adds documentation. Here it is again: https://github.com/SergioBenitez/Rocket/commit/d4d5c5dd29b40e9c563197490adce503c1370be8.

Ok.. It sort of does seem like that in your example to me. port = "8000" is not being parsed as an integer but as a string. I'm assuming that this is because its in TOML which has blessed types.

That's right. I was using port = "8000" as if it was part of a TOML file.

It feels a lot more flexible to allow DEBUG=1 to be boolean (old convention of env vars is for 1/0 to be true/false for instance) or LATITUDE=52 to be f64.

There's nothing preventing the user from doing this for extras. They can simply try to get a bool, and if that fails, try to get an int and do the conversion.

Another use case. Perhaps I'm dealing with a library that takes ports as strings because its dumb. I have some_lib_port = "9090" defined in my Rocket.toml. I then do SOME_LIB_PORT=80 in my environment. It feels (I have not tried this) that your library would raise a BadTypeError in my code that was working with config::get_str during development but now fails in production.

It doesn't raise a type error, it just returns None. The user can handle this as they want.

What do you lose if we allow the consumer to decide the type of a configuration value vs. the configuration itself?

Rocket only enforces typing for its own parameters. Any other library can accept whatever they want, with the knowledge that strings will be parsed in a particular way.

Note that you still get what you call parse errors in my approach. If you do config::get_int("address") and the environment has ADDRESS=::1 then you get None from get_int currently. We could extend this to a Result that has a ParseError and a NotFoundError easily enough.

I've lost a lot of information here. Where is address coming from? What happened? Why isn't it an int? Rocket gives very particular error messages indicating where the error occurred, why, and what was expected.

SergioBenitez commented 7 years ago

The main argument towards allowing multiple files is to protect sensitive information from leaking in some way. As I stated before, one way to prevent this already is to set configuration parameters via environment variables. The opposition to this idea is that one would like to set parameters via a file.

Here's a solution that meets all of the requirements. Create a file with whatever name you'd like; let's call it secret.conf.env. Inside, set parameters as if they were environment variables:

ROCKET_PORT=80
ROCKET_SESSION_KEY="my_secret_key..."
ROCKET_SECRET="another secret"

Finally, simply load the variables from the file when launching the Rocket application:

export $(cat secret.conf.env) && ./my_rocket_app

How does this solution sound to folks?

shssoichiro commented 7 years ago

I have a couple of criticisms of this approach. First, I'd like my configuration to be local to the app only. That is, export applies the environment variables to any future programs in that shell. That being said, I'm sure there's an easy way to work around that by writing the above command a little differently.

The other is that this is an extra step, albeit a small step, for loading the custom configuration, and it's a step that is unique to Rocket. It will need to be documented and easily discoverable.

Although I don't think the above are major issues that should block this, my opinion is that it would be better to use the dotenv library to load environment variables from a .env file automatically when Rocket starts. This doesn't require any changes to the command line to run the program, and it's a convention that's widely used across many other programming languages and frameworks.

marcusball commented 7 years ago

I like where this discussion is going, but I think the first question that really needs to be answered is "what is the intended scope of this configuration?" I'm sensing that some of the views here so far have essentially been seeking to use the proposed configuration solution as the configuration for the entire application, not just logic related to Rocket. Is this the desired goal? Including business logic and things like secret keys seems to really blur the lines of what Rocket is trying to achieve.

If we do seek to create a solution that would be capable of supporting the configuration for an entire application, then I think this should be separated into a separate crate, be it rocket_config or a completely independent, non-branded option. Further, while I like what you've done so far @mehcode, I think we may want to actually make an empty repo and do a formal RPC-style approach to designing a library so we can actually collect desires and use-cases and then design around solid, well-agreed-upon feature and functionality.

If this proposition is limited to Rocket-related functionality, then I think I'd support the more simplistic approaches that have been brought up, namely keeping it all in a single file; I think I like the type-safe style more for this case.

mehcode commented 7 years ago

Two distinct things are getting conflated here. It's my fault as I keep getting them mixed up as well. This should probably be two issues.

First is config-rs. I'm working on a generic and configurable configuration library. The idea is that you should be able to express any configuration setup using its primitives.

A framework such as Rocket need not expose anything about it. The benefit Rocket would receive is easy support for numerous configuration formats, remote configuration (etcd, consul), and the general concept of layered configuration.

As long as we build up config-rs enough, there shouldn't be any reason why Rocket couldn't use it under-the-hood to build whatever configuration story it wants.

As an aside, the examples on the config-rs repository use the global configuration. The library also supports instanced configuration.

@marcusball If you're interested in the direction of config-rs, please come over and start slinging words around. The library is at a very early state right now and hasn't had its first release yet so things are very malleable.

Second is exactly how Rocket interacts with configuration (single file, multiple, some .env approach, etc.) and what is the scope of its configuration.

As to the scope question, currently, Rocket exposes the concept of extras in its configuration object. This map is all additional key/values that are not recognized by Rocket. I believe the intent is for frameworks such as diesel to tap into Rocket's configuration via feature flags and "auto-configure" themselves. To answer your question @marcusball, this would make it "whole application" configuration.

My use case is satisfied with the .env approach. It feels a bit "awkward" as its not self-contained in Rocket, but eh. My use case is for separating out sensitive production configuration and encrypting the file. The strange run command would only live in my Dockerfile so it's not like this would bother me every day.

The ideal way to do this by the way is:

env $(cat .env | xargs) ./target/release/example

env passes a list of key=value pairs to the final argument, the binary
cat .env reads the environment variables from the file which can be 1 key=value per line
| xargs joins the lines by a single space as env can't deal with the newlines

SergioBenitez commented 7 years ago

Thinking about this a bit more, does simply symlinking solve this problem? You can create as many configuration files as you'd like and then simply symlink the proper one to the directory where the application will run. This lets you split up files exactly as you'd like while still using the Rocket configuration format.

Edit: Actually, here's a solution everyone might be okay with: what if Rocket simply allowed you to change the path to the configuration file via an environment variable? Then you could have as many files as you'd like and change it via something like: ROCKET_CONFIG_FILE=configs/prod.toml at start-up. The only downside to this approach is that you may have to repeat parameters across configurations.

mehcode commented 7 years ago

After playing around with Rocket in a full setup for a bit, here's how I do it.

./Rocket.toml contains configuration for [development] and [staging]
./Rocket.production.toml contains only configuration for [production] (and is encrypted)
During a production deployment:
- ./target/release/<app> is copied as /srv/<app>/<app>
- ./Rocket.production.toml is copied as /srv/<app>/Rocket.toml

That setup works well enough for me. It's a bit annoying that if there is configuration between development and production that it must be repeated, but its not that bad.

On another note. I usually have local-only configuration as well. Like proxies for services that are on a IP whitelist. But environment variables work well for that. For instance: PROXY="http://localhost:3128" cargo run

SergioBenitez commented 5 years ago

Let's close this in favor of #852, which should resolve all of these issues.

rwf2 / Rocket

What do we think of separate configuration files / general configuration improvements? #145

Config.toml or config/default.toml or ...

Config.secret.toml or config/secret.toml or config/production.toml or ...

`Config.toml` or `config/default.toml` or ...

`Config.secret.toml` or `config/secret.toml` or `config/production.toml` or ...