rust-cli / team

CLI working group
https://rust-cli.github.io/book
MIT License

Config file management #7

Open spacekookie opened 6 years ago

spacekookie commented 6 years ago

(moderated summary by WG)

Context

Plan of Action

In-ecosystem resources

External inspiration

Challenges

@spacekookie's original post

In the first meeting, we discussed the task of file management on different platforms (particularly configurations) and how it can be made better.

@Eijebong summarised it like this

If I need a config file, I don't want to know that it should be ${XDG_CONFIG:${XDG_HOME}/.config:/home/${user}/.config on linux, %AppDir%/App/ on windows and something else on osx [...]

There is a crate for "determining system configuration" (app-dirs-rs), but it seems unmaintained and not up to date.
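To make the per-platform lookup concrete, here is a minimal, std-only sketch of the kind of resolution being asked for. The `Platform` enum, the `config_dir` function and the injected `env` lookup are all hypothetical names chosen for illustration, and the macOS and Windows locations (`~/Library/Application Support` and `%APPDATA%`) are assumptions rather than anything settled in this thread.

```rust
use std::path::PathBuf;

/// Target platform, passed explicitly so the rules can be exercised on any host.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Platform {
    Linux,
    MacOs,
    Windows,
}

/// Returns the per-user config directory for `app`, or `None` when the
/// environment variables needed for that platform are missing.
///
/// `env` is a lookup function; a real caller would pass
/// `|key| std::env::var(key).ok()`.
pub fn config_dir(
    platform: Platform,
    app: &str,
    env: impl Fn(&str) -> Option<String>,
) -> Option<PathBuf> {
    // Empty variables count as unset, as XDG requires for XDG_CONFIG_HOME.
    let var = |key: &str| env(key).filter(|value| !value.is_empty());
    let base = match platform {
        // XDG: $XDG_CONFIG_HOME, falling back to $HOME/.config.
        Platform::Linux => var("XDG_CONFIG_HOME")
            .map(PathBuf::from)
            .or_else(|| var("HOME").map(|home| PathBuf::from(home).join(".config")))?,
        // Assumed location; which macOS directory CLI tools should use is debatable.
        Platform::MacOs => PathBuf::from(var("HOME")?).join("Library/Application Support"),
        // Assumed location: the roaming application data folder.
        Platform::Windows => PathBuf::from(var("APPDATA")?),
    };
    Some(base.join(app))
}
```

Taking the environment as a parameter keeps the function pure, so the Linux rules can be checked without touching the real process environment.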

killercup commented 6 years ago

The configure crate by @withoutboats abstracts over configuration in general (as the name suggests), but doesn't seem to have an adapter for configuration files right now (except for Cargo.toml, but that's not the use case we have in mind I guess).

kbknapp commented 6 years ago

I think it'd be good to lay out some requirements for a crate(s) that would fill this gap:

I'd be OK with having multiple sub crates for various platforms, and then a "parent" crate that allows abstracting over the platform where all I do is specify a file name I want to load/save.

kbknapp commented 6 years ago

Bonus if the crate can allow me to search custom directories, or tweak the order (on platforms where applicable).
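The "just give it a file name, optionally with custom directories and a tweakable order" idea could look roughly like the sketch below. `find_config` and `search_order` are hypothetical helpers, not part of any existing crate, and the platform default directories are assumed to be supplied by some other component.

```rust
use std::path::PathBuf;

/// Returns the first `dir/file_name` that exists as a file, checking `dirs` in order.
pub fn find_config(file_name: &str, dirs: &[PathBuf]) -> Option<PathBuf> {
    dirs.iter()
        .map(|dir| dir.join(file_name))
        .find(|candidate| candidate.is_file())
}

/// Builds a search order in which caller-supplied directories are tried
/// before the platform defaults.
pub fn search_order(custom: &[PathBuf], defaults: &[PathBuf]) -> Vec<PathBuf> {
    custom.iter().chain(defaults).cloned().collect()
}
```

Because the order is just a `Vec<PathBuf>`, an application that wants a different priority can reorder or filter it before calling `find_config`.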

withoutboats commented 6 years ago

but doesn't seem to have an adapter for configuration files right now (except for Cargo.toml, but that's not the use case we have in mind I guess).

Nope, the configure crate's default "source" is definitely designed for use cases where the person configuring the application is also the author - such as network services. However, the intent is for libraries to use configure, so that the application author can have total control over the source of configuration.

A configuration source that integrates with configure and is designed for CLIs would be a great addition, and possibly one I'd be interested in upstreaming into configure proper.

yoshuawuyts commented 6 years ago

I'm thinking this issue might be part of a bigger topic.

It's probably uncommon for a single application to use all of these, but one or more should be common enough. It feels like these questions are part of the same problem; perhaps it might be useful to consider all of these questions as part of this discussion?

spacekookie commented 6 years ago

For the question of temporary files there is already a crate which seems to do its job quite nicely (though I've only used it in limited scenarios so far, maybe it can be improved!)

As for the rest…I think it would be pretty cool if we could create (or find and improve existing) crates that mirror the same behaviour for other configuration, essential and non-essential data files as well.

It should be as simple as saying Configdir::new("my_app_name") and being able to write and read configurations from it.
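As a rough illustration of that ergonomics goal, here is a std-only sketch. `ConfigDir` is a hypothetical type; unlike the one-argument `Configdir::new("my_app_name")` imagined above, this version takes the platform's config root as an explicit `base` parameter, since detecting it is assumed to be the job of another crate.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical handle to one application's configuration directory.
pub struct ConfigDir {
    root: PathBuf,
}

impl ConfigDir {
    /// `base` is the platform's config root; a real crate would detect it.
    pub fn new(base: impl AsRef<Path>, app_name: &str) -> Self {
        ConfigDir {
            root: base.as_ref().join(app_name),
        }
    }

    /// Full path of `file_name` inside the application's directory.
    pub fn path(&self, file_name: &str) -> PathBuf {
        self.root.join(file_name)
    }

    /// Writes `contents`, creating the directory on first use.
    pub fn write(&self, file_name: &str, contents: &str) -> io::Result<()> {
        fs::create_dir_all(&self.root)?;
        fs::write(self.path(file_name), contents)
    }

    /// Reads a file; `Ok(None)` means it does not exist yet.
    pub fn read(&self, file_name: &str) -> io::Result<Option<String>> {
        match fs::read_to_string(self.path(file_name)) {
            Ok(text) => Ok(Some(text)),
            Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(err) => Err(err),
        }
    }
}
```

The file format is deliberately left to the caller: the handle only moves strings in and out of the right directory.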

Edit Just as I hit "Comment" I found this crate here

killercup commented 6 years ago

Hi, @soc! The Rust CLI working group is talking about cross-platform configuration file management and your directories crate has come up. Looking at your GitHub profile, I see you have a Java directories package as well, so you seem to have some expertise in this area. Wanna chime in here? :)

soc commented 6 years ago

@killercup Sure, how can I help?

killercup commented 6 years ago

@soc awesome! We are currently doing some research about the status quo of crates that are useful when writing CLI tools, work cross-platform and are maintained. For example, we want to come up with a good story around how to easily configure a CLI tool—with config files, env vars, and CLI flags. This issue is focussing on the handling of config files. @kbknapp already listed some good requirements in https://github.com/rust-lang-nursery/cli-wg/issues/7#issuecomment-367085114.

Do you think directories is a good foundation here? What are your plans for it? Can we help you get it to 1.0? :)

(@spacekookie and @yoshuawuyts probably have more to say!)

soc commented 6 years ago

For example, we want to come up with a good story around how to easily configure a CLI tool—with config files, env vars, and CLI flags.

directories is intentionally focused solely on dealing with operating system defaults. The reasoning for this is not because I believe that other venues for configuration are not important, but to provide the most minimal, focused and stable API I can get away with.

For instance, when dealing with CLI flags, the first issue you have is that of style (-h and --help vs. -help; -xyz vs -x, -y, -z; key=value vs. key value; and that's just Linux/macOS ... Windows has its own, different rules with /h etc.). There is potentially a lot of complexity and moving parts involved when trying to provide a CLI interface that makes everyone happy.

Do you think directories is a good foundation here?

I do think that directories is a good foundation for dealing with the operating system standards part of your goals.

I believe that dealing with CLI flags should probably be done in a separate library, or in a way more specific to the individual application's needs, because dealing with CLI flags is very application-specific.

In the end individual applications already need to have some custom code anyway to deal with migrating from storing their data directly in $HOME to following the platform standards. Dealing with CLI flags will probably be the same.

That's why directories only tells developers which directories they should be using, but does not get involved with creating directories itself, or making decisions about the priority of multiple directories (for instance platform defaults vs. CLI flags vs. config files).

Application-specific code will be required to handle such issues, and I want directories to avoid getting involved in that: Often the cost of complexity to solve such issues in a general fashion in a library is way higher than dealing with it on the application side, especially when handling (legacy) applications with their own folder in $HOME – without breaking things for existing users.

Here is an example of an application that makes use of directories (the JVM version) and deals with migration compatibility, property files, and application-specific env vars: https://github.com/coursier/coursier/pull/676.

What are your plans for it? Can we help you get it to 1.0? :)

My plan is to declare it as stable as fast as possible. I think the main blockers are

soc commented 6 years ago

I have created tickets for the remaining issues I mentioned: https://github.com/soc/directories-rs/issues/1 and https://github.com/soc/directories-rs/issues/2.

soc commented 6 years ago

A more general note: There is a vast difference between selecting and standardizing on crates that provide certain functionality (like CLI parsing, config file parsing) and having one standardized way of handling application configuration:

With the former you probably get crates that do almost everything and allow configuration of almost everything.

With the latter, you want to be highly selective and make actual choices how things can be specified, and not allow a free for all in terms of decisions a developer can make.

killercup commented 6 years ago

As you've noticed, I've opened some issues at directories-rs. I'd hold off on releasing a 1.0 before there are some consumers of the crate.


There is a vast difference between selecting and standardizing on crates that provide certain functionality (like CLI parsing, config file parsing) and having one standardized way of handling application configuration

Absolutely. We already have some great libraries for CLI args, and I'd love to have an equally good story for dealing with config files. That is not one crate – it's several built on top of and complementing each other :)

(We'll hopefully see more concrete proposals for this in #6!)

spacekookie commented 6 years ago

I think the focus should be less on a config file format and more on an API to get to those files. As a developer I might still want to be able to choose a format, say json or toml or ini via whatever serde backend exists to read/write my configuration files. But I don't want to have to worry about where to put it.

Not sure why you brought up CLI parsing. Although thinking about it now, I'm not sure how clap.rs handles windows arguments :sweat_smile:

I haven't had a chance to play around with your crate yet but from the README it looks like it already exposes pretty much all the directory paths we might be interested in. At that point it becomes a question of making the API more ergonomic. i.e. maybe there could be a function to easily list configuration files for the given application (or None if there are none), etc

killercup commented 6 years ago

Not sure why you brought up CLI parsing.

I brought it up, sorry :)

So, I've been thinking about what an all-around config solution might look like. We should not implement such a thing right now, but discuss what needs to happen to get there!


Here's a small proposal that integrates ideas from clap (v3, this is future!) and configure to get the discussion going:

#[derive(Debug, Deserialize, Clap, Configure)]
#[config(prefix = "diesel")]
struct Args {
    #[clap(short = "q", long = "quiet")]
    quiet: bool,
    #[clap(long = "database-url")]
    database_url: String,
    #[clap(subcommands)]
    command: DieselCliCommand, // an enum defining subcommands with their own fields and attributes
}

fn main() {
    let args = Args::configure()
        .read_from(configure::adaptors::config_file::toml("diesel_cli.toml")) // Invokes serde
        .read_from(configure::adaptors::env_file()) //  dotenv
        .read_from(configure::adaptors::env()) // std::env
        .read_from(configure::adaptors::clap_auto_init()); // Clap incl. early exit on `-h` and stuff like that
}

You can then:

Is that approximately the direction in which you want to go? What needs to happen to get there?

TeXitoi commented 6 years ago

I think the CLI/conf/env story should be in another issue.

killercup commented 6 years ago

Sure, that was just for inspiration and to set some context. (If you have other use cases/ideas, please tell us :))

TeXitoi commented 6 years ago

I have a couple of requests in the structopt issues about that (no ideas, but people wanting something like https://github.com/rust-lang-nursery/cli-wg/issues/7#issuecomment-367673115)

kbknapp commented 6 years ago

Like @spacekookie said, I think it should focus on abstracting over platform specific issues and not on the format, or providing "key->value" style API.

As the application writer, I want to just specify a file name, and let this crate handle where to store it. I then worry about formats, reading/writing, etc.

Then later on someone could write a generic crate to abstract over this configure crate, using something like serde to give a key->value style API.

Here's how I see the crate structure playing out (note, the crate names are just generic and not referring to anything existing right now).

config

Screwtapello commented 6 years ago

At a former employer, I wrote a config file management library (in Python) that turned out to be popular with my fellow developers (because it was easy to add to an existing project) and with our operations staff (because all our tools worked the same way, and the configuration was flexible enough for most of our use-cases). It worked like this:

Pros:

Cons:

If I were to attempt something similar in Rust:

Screwtapello commented 6 years ago

I think the focus should be less on a config file format and more on an API to get to those files.

One reason to consider a standard config file format, or at least a standard config data model: on Windows, perhaps the standard configuration source could/should be the Registry, rather than the filesystem?

soc commented 6 years ago

After some research on that, it seems that most developers recommend and prefer files over the registry:

derekdreery commented 6 years ago

Since this thread is about the location of config files rather than their contents this may be a bit off topic, but here goes anyway:

Similar to how structopt works, I'd love to do

#[derive(Structconfig)]
pub struct Config {
    timeout: u8,
    #[structconfig(name="retries", default=3)]
    no_of_retries: u8,
    files: Vec<PathBuf>,
}

and have all the config stuff taken care of for me!

Edit:

I'd try to find a way to build it around a configuration schema object (following the model of serde and structopt) instead of tossing around raw config files

Didn't see that it had already been suggested.

derekdreery commented 6 years ago

@Screwtapello

How did your code deal with first run, if there wasn't a config file? Did it assume you wanted to use the defaults, or did it exit and prompt you to create a config file? (or did it walk you through creating the config file interactively?)

Screwtapello commented 6 years ago

@derekdreery

At first run, it would use the defaults. For the various tools we created, every config option always had a sensible out-of-the-box default. Things the program absolutely could not know without asking would generally be command-line arguments, not config options.

It's a big world, and I'm sure there's some potential config options that cannot possibly have a sensible default, but I can't think of one right now. If anyone has an example, I'd love to hear it.

epage commented 6 years ago

Another aspect of config management to consider is passwords. Looks like there is a keyring crate that could use some polish and advertising.

kalefranz commented 6 years ago

Hey everyone. I'm the current dev lead of conda, which is a cross-platform, system-level package manager. Currently written in python--but we're in the initial stages of considering transitioning key pieces to rust.

Just wanted to add to this discussion how we do configuration, because it's been powerful and has worked out very well. It's also very similar to what @Screwtapello described.

For each invocation of our executable, we build up a configuration context object from four sources of configuration information:

  1. hard-coded default values
  2. (potentially multiple) configuration files, including support for files in ".d" directories
  3. environment variables
  4. command line flags

These are linearized in a way that the configuration sources conceptually closest to the process invocation take precedence. That is, if a configuration parameter is provided as a CLI flag, but also provided in a configuration file, the CLI-provided value would win. I guess the insight here is that most CLI applications deal with at least one configuration file, environment variables, and CLI flags anyway, and we've just realized that they all represent basically the same type of information, and can be generalized and unified.
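The linearization described here can be sketched as a simple ordered merge. This is only an illustration of the precedence rule, not conda's implementation; `Layer`, `layer` and `merge_layers` are hypothetical names, and values are kept as plain strings for brevity.

```rust
use std::collections::HashMap;

/// One source of configuration: defaults, a file, the environment, or CLI flags.
pub type Layer = HashMap<String, String>;

/// Convenience constructor for a layer from string pairs.
pub fn layer(pairs: &[(&str, &str)]) -> Layer {
    pairs
        .iter()
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

/// Merges layers listed from lowest to highest precedence.
/// A key set in a later layer overwrites the same key from an earlier one.
pub fn merge_layers(layers: &[Layer]) -> Layer {
    let mut merged = Layer::new();
    for layer in layers {
        for (key, value) in layer {
            merged.insert(key.clone(), value.clone());
        }
    }
    merged
}
```

Calling `merge_layers(&[defaults, file, env, cli])` yields the CLI value for any key the CLI sets, then the environment's, then the file's, then the default.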

One capability that was especially important for us to add was the ability for sysadmins to lock down configuration for the entire system in "lower-level" read-only files. As we merge the sources of configuration information, we provide a flag sort of like the css !important that lets the lower-level value be the final value.
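One way to picture such a lock, purely as a sketch and not as a description of how conda does it: each value carries an `important` marker, and once a lower-level layer has set a marked value, higher layers can no longer replace it. `Entry` and `merge_locked` are hypothetical names.

```rust
use std::collections::HashMap;

/// A configuration value plus a marker that locks it against later layers.
#[derive(Clone, Debug, PartialEq)]
pub struct Entry {
    pub value: String,
    pub important: bool,
}

/// Merges layers listed from lowest to highest precedence.
/// Later layers normally win, except when an earlier layer marked the key
/// as important; in that case the earlier value is final.
pub fn merge_locked(layers: &[HashMap<String, Entry>]) -> HashMap<String, String> {
    let mut merged: HashMap<String, Entry> = HashMap::new();
    for layer in layers {
        for (key, entry) in layer {
            let locked = merged.get(key).map_or(false, |existing| existing.important);
            if !locked {
                merged.insert(key.clone(), entry.clone());
            }
        }
    }
    merged
        .into_iter()
        .map(|(key, entry)| (key, entry.value))
        .collect()
}
```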

I don't want to go into too much detail here. There's a blog post with more details, including how we deal with merging sequence and map-type configuration parameters. I did want to point all this out though as support for the usefulness of what @Screwtapello described.

Screwtapello commented 6 years ago

As we merge the sources of configuration information, we provide a flag sort of like the css !important that lets the lower-level value be the final value.

An alternative model that achieves the same goal is to have separate "config" and "override" files:

The advantage over an !important flag is that you don't need special syntax in your config-file format (and therefore serde, etc.) while the disadvantage is that you have nearly twice as many config locations to document, and for users to check when diagnosing surprising behaviour.

derekdreery commented 6 years ago

the disadvantage is that you have nearly twice as many config locations to document, and for users to check when diagnosing surprising behaviour

@Screwtapello you could mitigate this by being able to generate something like the following

# Configs - some config option
 1. There was no value at system-wide level. *value = default*
 2. Found value *newvalue* at user level *value = newvalue*
 3. There was no value at env/cli level *value = newvalue*
 4. There was no value at user override level *value = newvalue*
 5. Found value *newvalue2* at system override level *value = newvalue2*
 6. Final value for *config option* is *newvalue2*

soc commented 6 years ago

One option would be to have an API like

Config::from(system_overrides, commandline, environment, config_file, legacy_config_file, system)

Where people describe the order of settings they want to have and the library resolves settings in that order until a value is found.

I believe having some hard-coded, common-sense lookup scheme would be nice, but I fear that many applications would not fit well into it.

I think an additional bit that's important to get right is to track the origin of each setting, so that people don't end up with some_setting = "value", but some_setting = ("value", source) I think this would make it way more transparent to understand and debug where settings come from.
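A small sketch of both ideas together, ordered lookup plus origin tracking, under the assumption that every source has already been parsed into string key/value pairs. `Source`, `Config::from_ordered` and `Config::get` are hypothetical names for illustration only.

```rust
use std::collections::HashMap;

/// Where a setting came from.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Source {
    SystemOverrides,
    CommandLine,
    Environment,
    ConfigFile,
    LegacyConfigFile,
    System,
}

/// Settings grouped by source, in the lookup order chosen by the application.
pub struct Config {
    layers: Vec<(Source, HashMap<String, String>)>,
}

impl Config {
    /// The first layer in `layers` has the highest priority.
    pub fn from_ordered(layers: Vec<(Source, HashMap<String, String>)>) -> Self {
        Config { layers }
    }

    /// Returns the first value found for `key` together with its origin.
    pub fn get(&self, key: &str) -> Option<(&str, Source)> {
        self.layers.iter().find_map(|(source, values)| {
            values.get(key).map(|value| (value.as_str(), *source))
        })
    }
}
```

Returning the `Source` alongside the value is what lets an application print where a surprising setting came from.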

Screwtapello commented 6 years ago

I believe having some hard-coded, common-sense lookup scheme would be nice, but I fear that many applications would not fit well into it.

Do you have particular examples of applications that would not fit well? This lookup scheme is a superset of the lookup schemes of POSIX-tradition applications I can think of offhand, and resembles what I know of other platforms (macOS has system-wide /Library versus per-user ~/Library, Windows has HKEY_LOCAL_MACHINE\Software versus HKEY_CURRENT_USER\Software). Sure, not every existing piece of software follows exactly that scheme, but I imagine that's largely because they had to implement it themselves from scratch; if I could get that scheme for free from a library, I certainly would.

I think an additional bit that's important to get right is to track the origin of each setting, so that people don't end up with some_setting = "value", but some_setting = ("value", source)

I would love a library that provided a citation for every config setting (or every piece of input data in general), but as far as I know serde likes to throw away all that information once the deserialization is done. An alternative would be to make an opinionated library that supports only a single config file format, but tracks citations... which is doable, but would certainly be less popular.

soc commented 6 years ago

My concern is that every existing application is probably littered with special cases, migration schemes and special rules for how settings are handled. I'm not sure it is possible to encode all this into an API without making it extremely hard to use.

luser commented 6 years ago

There is a crate for "determining system configuration" (app-dirs-rs) but it seems unmaintained and not up to date

I've used app-dirs in a few projects and it was very close to what I want. It's unfortunate that it's unmaintained, but we should crib ideas from it for a replacement, if nothing else!

soc commented 6 years ago

@luser Is there anything in app-dirs that you feel is missing in directories or is done better?

spacekookie commented 6 years ago

@soc I think building a comprehensive API that handles config files in the (what's considered for that platform to be) "best practise" is the exact way you fix those corner cases though. A lot of people always assume they know it better or want to have configs somewhere else because…. Making it easier to follow the de facto standard way will reduce the number of applications that don't do it that way, essentially getting rid of the problem :wink:

soc commented 6 years ago

@spacekookie I totally agree on making it easier to follow the standard!

My concern is that if you build an API that deals with all the corner cases you are pretty much making the API worse for the "good citizens" of the ecosystem, e. g. an API that supports

ok, here is where the config dir should be, except when we have a legacy dir over here, or some custom settings over there etc.

is very different from an API that says

give me the name of your app, and I'll tell you the paths where you should store things

killercup commented 6 years ago

I can totally imagine that the just-give-me-a-config crate that internally uses all the best practices and handles common stuff is just a bunch of foundational crates used in a specific way. So, if you wanted to customize it, you should be able to do what the JGMAC crate (need to work on an abbreviation) does (by doing a good amount of copy-pasta). Maybe we can even write the JGMAC crate in a style that makes "ejecting" easy.

This is an easier to maintain system than trying to provide a super customizable API. You can think of it like a quicli for configs.

whmountains commented 6 years ago

IMO the JavaScript CLI ecosystem really gets this right. See https://github.com/davidtheclark/cosmiconfig

EDIT: This is a good way to handle build-tool CLIs, but not so much for global CLI.

luser commented 6 years ago

@luser Is there anything in app-dirs that you feel is missing in directories or is done better?

directories seems very sensible and looks like it has roughly everything I would want. My only quibble would be that it does not ensure the directories exist, but that's not a deal-breaker. Getting directories to a 1.0 release would be a good thing for this working group to push on.

soc commented 6 years ago

My only quibble would be that it does not ensure the directories exist, but that's not a deal-breaker.

Yes, that was an explicit design decision, because the decision of what to do when something doesn't exist is extremely application-specific:

For instance, if you run an application for the first time, the directory doesn't exist, and you absolutely want the application to create it.

But if you have an application that uses a legacy .app folder instead of following XDG, you probably want to keep using that old folder as-is and not create the new directories to avoid breaking things.
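The two cases can be handled in a few lines of application code, which is roughly the point being made. This is a sketch with a hypothetical `choose_config_dir` helper; how the legacy and standard paths are obtained is assumed to happen elsewhere.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Picks the directory an application should use for its configuration.
///
/// An existing legacy directory is kept as-is and nothing new is created.
/// Otherwise the platform-standard directory is created (first run) and returned.
pub fn choose_config_dir(legacy: &Path, standard: &Path) -> io::Result<PathBuf> {
    if legacy.is_dir() {
        return Ok(legacy.to_path_buf());
    }
    fs::create_dir_all(standard)?;
    Ok(standard.to_path_buf())
}
```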

luser commented 6 years ago

I've been using structopt in a few projects lately and I'd love to see a config file solution that works as nicely as it does. Maybe building on withoutboats' configure crate, something like:

#[derive(Deserialize, Configure, Default)]
struct Config {
 //...
}

fn work() -> Result<(), Error> {
  let config: Config = read_config()?;
}

Where read_config would use directories or something under the hood to locate an appropriate config file, and if not present use Default::default().
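A std-only sketch of that fallback behaviour follows. To stay self-contained it swaps the serde/configure derive for a hand-written `key = value` parser, which is an assumption made for illustration; the two fields and the explicit `path` argument are likewise hypothetical, and locating the file is left to something like directories.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Example settings; `Default` supplies the values used when no file exists.
#[derive(Debug, Default, PartialEq)]
pub struct Config {
    pub timeout: u8,
    pub retries: u8,
}

/// Parses `key = value` lines, ignoring blank lines and `#` comments.
pub fn parse_config(text: &str) -> Result<Config, String> {
    let mut config = Config::default();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("expected `key = value`, got `{}`", line))?;
        let key = key.trim();
        let value: u8 = value
            .trim()
            .parse()
            .map_err(|err| format!("bad value for `{}`: {}", key, err))?;
        match key {
            "timeout" => config.timeout = value,
            "retries" => config.retries = value,
            other => return Err(format!("unknown key `{}`", other)),
        }
    }
    Ok(config)
}

/// Reads the config at `path`, falling back to `Config::default()` if the
/// file does not exist. Other I/O errors and parse errors are reported.
pub fn read_config(path: &Path) -> Result<Config, String> {
    match fs::read_to_string(path) {
        Ok(text) => parse_config(&text),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(Config::default()),
        Err(err) => Err(err.to_string()),
    }
}
```

Only a missing file triggers the default; a malformed file is an error, so typos in a config are not silently ignored.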

Some other things it'd be nice to consider:

i30817 commented 6 years ago

Just butting in with my uninformed opinion:

Not only in .cfg files but even in scripts and custom cli environments (dosbox CLI or dosbox.conf autoconfig section for example) programmers will try to get the user to insert Paths in user readable and writable format. As soon as users try to write this to file, this will never work portably without major sacrifices to correctness, which leads to 'oh you want a portable app with data pre-configured? You can do that, but you have to edit the .cfg/script to change the path separator'.

relative_from_portable_cfg(path: &str) -> PathBuf

For my preferred 'major sacrifice to correctness' I'd like a method to read a user made relative path that assumes a single and only path separator for all platforms (/ because it's a forbidden filename character in both windows and unix) and returns None or terminates with an Error if it has even the possibility of not being relative (starts with /, \\ or [A-z]+: etc), so the user knows he fucked up.

I'm aware that unix relative paths can start with c: , but if you want to bring sanity to the 'user made portable paths' you can't accept everything possible.

Document that it's clearly for user made paths and leave it at that.
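A possible shape for such a function, as a sketch only. It returns a `Result` with a hypothetical `PortablePathError` instead of a bare `PathBuf`, and it is stricter than the description above in two assumed ways: a backslash is rejected anywhere in the input, and the drive-like `letters:` check is applied to every component, not just the first.

```rust
use std::path::PathBuf;

/// Reasons a user-written path is rejected as not portably relative.
#[derive(Debug, PartialEq)]
pub enum PortablePathError {
    Empty,
    Absolute,
    Backslash,
    DrivePrefix,
}

/// True for components that start like a Windows drive, e.g. `c:` or `c:foo`.
fn has_drive_prefix(part: &str) -> bool {
    let letters = part.chars().take_while(|c| c.is_ascii_alphabetic()).count();
    letters > 0 && part[letters..].starts_with(':')
}

/// Parses a user-written relative path that uses `/` as its only separator
/// and converts it to a `PathBuf` for the current platform.
pub fn relative_from_portable_cfg(path: &str) -> Result<PathBuf, PortablePathError> {
    if path.starts_with('/') {
        return Err(PortablePathError::Absolute);
    }
    if path.contains('\\') {
        return Err(PortablePathError::Backslash);
    }
    let mut out = PathBuf::new();
    for part in path.split('/') {
        if part.is_empty() || part == "." {
            continue; // tolerate `a//b` and `./a`
        }
        if has_drive_prefix(part) {
            return Err(PortablePathError::DrivePrefix);
        }
        out.push(part);
    }
    if out.as_os_str().is_empty() {
        return Err(PortablePathError::Empty);
    }
    Ok(out)
}
```

An application would then join the result onto the directory of the config file, so a config copied to another machine or OS keeps pointing at the same files.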

If you think this function isn't needed because you have a way to serialize portably and it's easy to roll your own; I'm very doubtful if most programmers will forbid the users from inserting their own paths on textual files and ofc, they tend to allow unrestrained paths when they allow it, so a function that makes 'just what they want' but restricts them to relative paths might make them and the users think about the data portability case.

Or maybe I'm too optimistic.

Dylan-DPC-zz commented 6 years ago

From the meeting:

@spacekookie :

Apart from that, I think most people are in agreement about creating a crate that "just gives out configs", abstracting a lot of the boilerplate away. But it's not clear yet how, or how to then allow tweaking the defaults without having to reinvent the wheel. Yea, directories-rs is just about the paths, pretty disconnected from any other workflow. A more comprehensive configuration crate would wrap around that then.

epage commented 6 years ago

@i30817 is your concern with reading/writing a config file on the same system or sharing config files across systems?

If sharing across systems, can you give more concrete details so we can flesh out what the requirements are for this?

i30817 commented 6 years ago

sharing across systems. Common usecase is placing user readable/writable cfg files which contain relative paths to some dir (of a portable app or app that can be made portable like firefox, or dolphin or retroarch) in a flash drive.

Supposedly users can use 'a/relative/path' syntax in windows. In practice, nearly all C and C++ programs/fileformat parsers blow up with this kind of relative path because they only ifdef _WIN32 '\' else '/' etc.

There are complications with forbidden characters that are legal on one platform but not in another (unix to windows mostly).

The vast majority of user made inputs to paths accepts both relative and absolute (when that is poison to moving systems). Which is kind of terrible, because the only way a user will get to have portable files is to use relative paths when possible but every cfg file and nearly all commands are determined to let the input be generalized to absolute paths.

It would be nice if the cfg format generalized a 'relative to the cfg dir' path function/standard requiring a user readable relative path with '/', and returns a (current platform) normalized absolute to the cfg (optionally with another absolute parent dir overload?).

In my mind this means early validation and early failure of corner cases and usecases that the user might have decided are 'necessary' (placing an absolute path, spaces without quotes, using a 'forbidden' character from OSes (besides '/' and closed quotes) on the 'relative path' etc).

I wouldn't actually mind if there is also another clearly marked function that parses 'exotic' relative or absolute paths for the current running platform without this harsh early failure though (ie things like c:\home/relativejoin or a/relative), it should help to prevent the ifdef idiocy above. But that's basically the stdlib job and a library for cfg should promote the portable case.

The only thing i'd regret is that this hack would also be useful for fileformats that insist on user readable paths like .cue for example, not only cfgs and might not be reinvented by several implementations like cue parsers in a attempt at 'generality'. But as long as PathBuf can deal with "a/relative with space" path on Windows, the very worst case is avoided for those formats because programmer laziness works for correctness here (unlike in C). But for cfg i believe you can do better, thus this request to make the 'default' path entries return a path relative to the cfg that is OS portable or fail hard (this could also be a constructor setting i guess).

Dylan-DPC-zz commented 6 years ago

We have opened a "RFC-lite" on confy with reference to this. Do leave your feedback.

sharkdp commented 6 years ago

The directories crate has been mentioned a few times in this ticket. I have recently started to use it in one of my command-line tools and now this issue came up: https://github.com/sharkdp/bat/issues/151. Do people here have any opinion on this? Should command-line tools on macOS use ~/Library/Preferences or ~/.config?

soc commented 6 years ago

@i30817 Interesting, I played with exactly this idea a few weeks ago:

i30817 commented 6 years ago

Yeah, well i reconsidered point 2 myself. I don't mind it being a warning during development, but i don't think it should be forced. Especially since most cli config paths are not controlled by devs but by the user and sometimes they themselves don't control what another tool/user named the files (imagine a file database that is using utf-8). Now i'm thinking that it should be an option to emit a warning about it, but not a 'kill the whole program' error.

soc commented 6 years ago

Yeah ... my approach comes with the expectation that developers are in control of their config file names. I think that's a reasonable assumption in 90% of the cases.

In general, I think there is no having the cake and eating it: either you rule out the various OS' filesystem craziness, or you allow it and have the complexity go through the roof.

I think there is value in having some crate that says "if you don't depend on the crazy parts of Windows path handling, this crate makes your life way easier".