SYNOPSIS

Somehow facilitate supporting configuration files.

The WHY

Many programs have a need to be configurable (typically via configuration files). Even some of the quick and dirty scripts could benefit from that kind of thing, as long as it's darn easy...

While certain aspects of program configuration (such as the file names and formats) would arguably be best treated by a CLI framework, certain other aspects (such as the need to grok the specs of command line options) clearly have an interplay with 'GetOpt' functionality.

If the options parser (GLM, in this case) does not provide simple mechanisms for supporting this sort of thing, then many wheels need to be re-invented by CLI frameworks, or worse, command scripts themselves.

POINTERS

See :

DISCUSSION

Should GLM:

a) provide direct support for configuration files?
b) limit itself to just facilitating the ordeal? -- how?

Obviously both approaches would have their merits and inconveniences.

TL;DR

For the time being, let's just keep this in mind but not implement anything until the refactoring is done.

We can decide later on whether we provide a) (direct support), which, depending on how it's done, could still be desirable.

a) Direct support

Direct support for configuration could go as far as seamlessly retrieving and merging default values from program configuration files found at predefined paths.

pros/cons

pros

p1) Easiest evolution pathway for script authoring/maintenance => Just add a couple settings in an existing script and off you go.
p2) It's as DRY as it gets

cons

c1) Risk of feature creep
c2) Risk of becoming too opinionated (OK for a CLI framework, not so much for a GetOpt provider)
c3) Risk of adding too many dependencies for GLM (e.g. at least something like Config::Any)
c4) Risk of worsening startup performance (depending on how this is done)

Granted, all but c1 above could be somewhat avoided/mitigated.

Details

The config handling (turned off by default) could happen during the call to GetOptions() (or one of its sibling functions).

By default, config file paths could be obtained via DWIM and a couple of sane conventions, and of course it should be possible to override the DWIM.

A similar approach could be considered for file "formats" (INI, TOML, YAML, ...), but then GLM would find itself needing to be unnecessarily opinionated about what constitutes a good config file format...

Something like Config::Any could eventually come to the rescue... But even in that case, GLM would need to make sure that the script author is able to exert full control if and when he/she so desires.

Finally, beyond the technical aspects of a file format, something that is often overlooked or easily dismissed is what could be called the semantic format (in terms of vocabulary and information structure).

In case of CLI option processing, "vocabulary" should not be much of a concern (in the end, each option has a specified name and aliases, and GLM could have knowledge of those).

However, the question of "structure" can't easily be dismissed... A piece of information (such as a default value for a CLI option) could potentially live at any depth or place in a config file.

What's worse is that quite often, those things may well fall outside any decision/control of the script author (e.g. due to historical reasons, org policies, ...)

Therefore, if GLM ends up providing such direct support for configuration, it needs to be really careful to:

avoid turning it on by default, if done within GetOptions(), so as to remain GoL conformant;
provide adequate settings for overriding its DWIM and/or conventions;
provide adequate facilities for the caller to easily do this sort of thing on their own;

Configuration file format

To avoid introducing another format into GLM, a config file can just be another set of command-line options and arguments. We can use Text::ParseWords's shellwords(), a core module which is already used by GoL. Suppose the configuration file contains:

--opt1 --no-opt3
--opt2 "value that has space"

shellwords() will parse that into:

('--opt1', '--no-opt3', '--opt2', 'value that has space')

GetOptionsFromArray would just unshift those elements into $array then proceed as normal. This will allow the user to override a configuration setting using a command-line option (e.g. --opt2=no-this-val-instead).
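A minimal sketch of that idea, using only Text::ParseWords and the stock Getopt::Long (not GLM). The option specs (opt1!, opt2=s, opt3!) and the variable names are assumptions made up for illustration.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Text::ParseWords qw(shellwords);

# Stand-in for the lines read from the configuration file.
my @config_lines = (
    '--opt1 --no-opt3',
    '--opt2 "value that has space"',
);

# Stand-in for what the user typed on the command line.
my @args = ('--opt2=no-this-val-instead');

# Config-file words go first, so that later (command-line) words win.
unshift @args, shellwords(@config_lines);

my %opt;
GetOptionsFromArray(\@args, \%opt, 'opt1!', 'opt2=s', 'opt3!')
    or die "bad options\n";

print "opt1=$opt{opt1} opt2=$opt{opt2} opt3=$opt{opt3}\n";
```

Because Getopt::Long processes the array from left to right, the value given on the command line replaces the one that came from the file.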

Additional issue 1: users will eventually want to add comments. I think we can allow shell-style comments and strip them before passing the text to shellwords().
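A conservative sketch of the comment stripping, assuming only whole-line comments are supported. The helper name words_from_text is hypothetical. Stripping has to happen before shellwords() sees the text, otherwise an apostrophe inside a comment would look like an unbalanced quote.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Hypothetical helper: turn the text of an args file into a list of words.
# Only whole-line comments (/^\s*#/) are dropped; a '#' after an option is
# left alone, because it may legitimately be part of a quoted value.
sub words_from_text {
    my ($text) = @_;
    my @lines = grep { !/^\s*#/ } split /\n/, $text;
    return shellwords(@lines);
}

my @words = words_from_text(<<'EOT');
# don't forget: these are only defaults
--opt1 --no-opt3
   # indented comments are dropped too
--opt2 "value with # inside"
EOT

print join('|', @words), "\n";
```

Supporting trailing comments (after an option on the same line) would need quote-aware parsing, which is why this sketch leaves them out.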

Additional issue 2: CLIs that support config files often have a --no-config command-line option (e.g. wget) to skip searching for and parsing the configuration. We will need a nice way to support this.
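One possible way to honour such a switch, sketched under the assumption that the decision is made by peeking at the argument list before any config file is read. The helper name wants_no_config is hypothetical, and this is not something GLM provides.

```perl
use strict;
use warnings;

# Hypothetical helper: true if '--no-config' appears before any '--'
# terminator in the given argument list.
sub wants_no_config {
    my (@argv) = @_;
    for my $arg (@argv) {
        last if $arg eq '--';
        return 1 if $arg eq '--no-config';
    }
    return 0;
}

my @argv = ('--verbose', '--no-config', 'input.txt');
print wants_no_config(@argv) ? "skip config\n" : "load config\n";
```

A peek like this knows nothing about option specs, so it would misfire if '--no-config' happened to be the value of another option. The switch would also still have to be declared in the option spec so that the real parser accepts it.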

Environment variable support

Aside from configuration files, environment variables are also often used to configure a script. See cpanm (with PERL_CPANM_OPT) for example. This could simply be implemented by splitting the environment variable value using Text::ParseWords' shellwords(), then feeding the resulting words into the array using unshift, so that they land after the configuration file's words but before the command-line options.

To integrate this into GLM, I vote for the cmdspec() mechanism described in #18. Example:
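The intended precedence (config file, then environment, then command line) can be sketched as follows with the stock Getopt::Long. The helper effective_args, the variable name FOO_OPT and the option specs are assumptions for illustration only.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Text::ParseWords qw(shellwords);

# Hypothetical helper: build the effective argument list.
# Precedence (lowest to highest): config file, environment, command line.
sub effective_args {
    my ($file_text, $env_value, @cmdline) = @_;
    my @from_file = shellwords(split /\n/, $file_text // '');
    my @from_env  = shellwords($env_value // '');
    return (@from_file, @from_env, @cmdline);
}

local $ENV{FOO_OPT} = '--level 2 --color=green';    # FOO_OPT is an assumed name
my @args = effective_args("--level 1\n--color=white\n", $ENV{FOO_OPT}, '--color=red');

GetOptionsFromArray(\@args, 'level=i' => \my $level, 'color=s' => \my $color)
    or die "bad options\n";
print "level=$level color=$color\n";
```

Here --level is taken from the environment (it overrides the file) and --color from the command line (it overrides both).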

GetOptions(
    cmdspec(
        config_files => ["$ENV{HOME}/foo.conf", "/etc/foo.conf"], 
        envs => ["FOO_OPT"],
        ...
    ),
    ...
);

And to avoid being opinionated, I vote that we do not heuristically search for some common config file locations or environment variable names, but just let users specify the paths and names.

Yupp! I had forgotten about this possibility... which is sometimes called an "args" (or "opts") file. It's not a full blown config solution, but we probably don't really need one at the GetOptions level.

So, it sounds pretty good to me. Just a few nitpicks, though. I would:

1) call this thing args-from-file, avoiding any relation to the name config;

2) also support args-from-string, i.e. an equivalent facility with strings (in addition to files), which would also alleviate the need to explicitly support parsing environment variables (i.e. ones that may contain more than one option/arg), since that could be done with args-from-string; but if absolutely needed, we could eventually add support for args-from-envvar.

3) also support --args-reset (or --reset-args), which would simply reset (discard) the arguments parsed so far, up to and including itself.

4) later, we can also support dedicated environment variables for the individual options themselves
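To make the --args-reset semantics from item 3 concrete, here is a small sketch that works on a plain list of arguments. The helper name apply_args_reset is hypothetical, and treating '--' as the point where scanning stops is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: drop everything up to and including the last
# '--args-reset' that appears before a '--' terminator.
sub apply_args_reset {
    my (@args) = @_;
    my $last_reset;
    for my $i (0 .. $#args) {
        last if $args[$i] eq '--';
        $last_reset = $i if $args[$i] eq '--args-reset';
    }
    return defined $last_reset ? @args[ $last_reset + 1 .. $#args ] : @args;
}

my @kept = apply_args_reset(qw(--color=white --args-reset --verbose --color=red));
print "@kept\n";
```

Since the switch is repeatable, only the last occurrence matters: whatever precedes it (for example defaults pulled in from files) is discarded.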

api-1) Just turn on some standard options, and tweak @ARGV


unshift @ARGV, (
    '--args-reset',          # repeatable. mentioned just to show the possibility; otherwise not needed here.
    '--args-from-file'   => "/etc/foo.args",
    '--args-from-file'   => "$ENV{HOME}/.foo.args",
    '--args-from-string' => "$ENV{FOO_OPT}",
);

GetOptions( cmdspec(
        # enable automatic support for some of the standard options
        standard_opts => { args_reset => 1, args_from_file => 1, args_from_string => 1 },
        ...
));

api-2) Same thing, but the tweak is done by cmdspec()

GetOptions( cmdspec(
        # equivalent to the `unshift @ARGV` above
        prepend_args      => [ '--args-from-file' => [ "/etc/foo.args", ... ], '--args-from-string' => ... ],
        standard_opts     => { args_reset => 1, args_from_file => 1, args_from_string => 1 },
        ...
));

api-3) Alternate form (less powerful, but easier)

This form does NOT require enabling --args-from... for the end user. But nothing forbids the script author from doing so...

api-3a) Just parse args from the given sources, without allowing the end user to do the same

GetOptions( cmdspec(
        prepend_args_from  => [ file => ..., file=> ..., string=> ... ],
        ...
));

api-3b) Same thing, but also allow end user to do the same

GetOptions( cmdspec(
        prepend_args_from  => [ file => ..., file => ..., string => ... ],
        standard_opts      => { args_reset => 1, args_from_file => 1, args_from_string => 1 },
        ...
));

api-3c) And perhaps we are allowed to have opinions, too ?

Here, the script author specifically asks GLM to use some sane defaults...
This is as DRY as it gets...

use Getopt::Long::More qw(GetOptions :constants);

GetOptions( cmdspec(
        prepend_args_from  => DWIM,
        standard_opts           => DWIM,  # or ALL, ...
        ...
));

and in GLM, we would have:

package Getopt::Long::More;
...
# exportable (or automatically exported) CONSTANTS
use constant ALL  => "!<=ALL=>!";    # or some other rare string
use constant DWIM => "!<=DWIM=>!";   # or some other rare string
...

General thoughts

I would vote for:

Allowing all of the above API choices... ultimately... but not necessarily all at once...
Having a single-char shortcut for --args-from-file, for example: -@
Allowing the usual automatic options (i.e. help and version) to also appear in standard_opts (in addition to and overriding the GoL style configuration, i.e. auto_help, etc)
Having an independent mechanism for supporting dedicated environment variables for the individual options themselves (with another optional DRY/DWIM goodness for enabling this feature)

Also, OK for:

Using Text::ParseWords' shellwords() ==> it may still have some glitches, but I would agree that it's the best tool available.
Allowing shell-style comments -- but for starters, only for lines that match /^\s*#/, no?

And here are some of the things I am NOT sure about:

The names of things mentioned above, i.e.
- --args-from-... OR --opts-from-...
- prepend_args ?
- standard_opts ? OR auto? OR auto_opts ? (similar to GoL) OR ?
The depth of prepend_args and standard_opts within the arguments to cmdspec() ...
- should they appear as shown above? i.e. :
  
  cmdspec( ..., standard_opts => {...}, ... )
- OR should they go into a level deeper:
  
  cmdspec( ..., configure => { standard_opts => {...}, ... } )
  
  If deeper, under which key? ( e.g. configure appears to be used for similar stuff by GL::Subcommand )
Would it be better to have :
- a) several specialized switches --args-from-file , --args-from-string, ...
- b) AND/OR just one standard switch: --args-from that does it all (with a syntax convention). This seems to be prettier/easier to use, but it could get pretty hairy trying to guess things. So, if we opt for this, it would be best to couple it with a syntax convention on the value.
One idea for such a convention could be something akin to a URI scheme:

  --args-from "::str::--foo bla --bar baz"  # string (UTF-8)
  --args-from "::env::FOO_ARGS"             # environment variable (UTF-8)
  --args-from "::file:///etc/foo.args"      # file
  --args-from "::bozo: bla-bla"             # =====> ERROR
  --args-from "::"                          # =====> ERROR
  --args-from "/etc/foo.args"               # file - (the default, unless matched by above)

  # (by default, file contents are also assumed to be in UTF-8)

For the string case, the reason I have not opted for the data: URI scheme is that, by default, its content is assumed to be US-ASCII (as per the standard). Also, I don't quite like the comma , in there for some reason...
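Here is one way the convention listed above could be recognised, as a sketch only. The helper name classify_args_from is hypothetical, and the exact prefixes (::str::, ::env::, ::file://) are read off the examples rather than taken from any agreed spec.

```perl
use strict;
use warnings;

# Hypothetical helper: classify an --args-from value.
# Returns (kind, payload), where kind is 'str', 'env' or 'file'.
sub classify_args_from {
    my ($value) = @_;

    # No leading '::' means a plain file path (the default).
    return (file => $value) unless $value =~ /^::/;

    return (str  => $1) if $value =~ /^::str::(.*)\z/s;
    return (env  => $1) if $value =~ /^::env::(\w+)\z/;
    return (file => $1) if $value =~ m{^::file://(/.*)\z}s;

    # Anything else starting with '::' ('::bozo: ...', '::') is an error.
    die "unrecognised --args-from value '$value'\n";
}

for my $value ('::str::--foo bla --bar baz', '::env::FOO_ARGS',
               '::file:///etc/foo.args', '/etc/foo.args') {
    my ($kind, $payload) = classify_args_from($value);
    print "$kind: $payload\n";
}
```

Reserving the leading '::' for schemes means a relative file whose name really starts with '::' would have to be written with a './' prefix.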

What do you think?

WORK DEPENDS ON

refactoring (#11)

If/when we allow --args-from-... or --args-reset at arbitrary places within @ARGV, we will need much more control...

Therefore, this would require the refactoring (#11) to have been completed;

And would almost certainly imply a systematic handling of all options with a wrapper CODE destination, as PERLANCAR had suggested.

Using "args-from" terminology instead of "config" sounds good to me.

--args-reset is surely more flexible than, say, --no-args-from-file and --no-args-from-env. But it does require a refactoring. And since I don't expect there to be that many "from" sources (file, env, what else?), for now I'm leaning toward individual --args-from-file and --args-from-env options (instead of the generic --args-from), as well as a way to disable them from the command line, e.g. --no-args-from-file and --no-args-from-env.

As for the way of enabling this feature, I prefer the GoL style of auto_help and auto_version. We can add auto_args_from_file and auto_args_from_env. To customize this at a per-subcommand level, we can use the 'configure' cmdspec() property, like in GL::Subcommand.

OK, "args-from" terminology it is, then.

As for the topics of --args-reset and the choice between --args-from (a single generic option) vs --args-from-XYZ (multiple specific options), I do see your point.

However, it's a bit more convoluted, I am afraid...

Yes, --args-reset would indeed require refactoring...

But so would --args-from, or any specific derivative thereof, in the case where we wish to properly support them as full-fledged options that are also capable of being passed by the end user (in addition to the script author).

The reason has to do with the ordering of options/arguments and the related set of behaviours that have come to be widely expected by users in the Unix world.

Options are read from left to right, and they are usually expected to override each other (if at all) in that particular direction.

In the example below, let's assume the --color is specified as taking a single value, as in color=s.

$ frobrinicate  --args-from-file "foo.args" --color=red --args-from-file "baz.args"

$ cat foo.args
--color=white

$ cat baz.args
--color=yellow

The question is:

If frobrinicate wants to abide by the widely expected conventional behavior (of left-to-right overridable options), what value should it use for --color at the end of the day?

If we expect that value to be yellow, then we need to realize that --args-from-file is quite similar (in terms of its challenges) to the case of --args-reset.

BTW, the shift in expectations may somewhat (but not entirely) be attributed to the shift in terminology: --config vs --args-from.

Anyhow, in both cases, the option handler needs to know its own position in @ARGV as well as being capable of performing surgery on it.

One tempting method to handle this kind of thing is to pre-process @ARGV. That method would have its merits (as well as inconveniences)... But why not...
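A rough sketch of such a pre-processing pass, applied to the --color example above. It uses the stock Getopt::Long for the final parse; the helper expand_args_from_file and its reader callback are hypothetical, and the file contents are held in memory to keep the example self-contained.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Text::ParseWords qw(shellwords);

# Hypothetical helper: replace every "--args-from-file FILE" (or
# "--args-from-file=FILE") by the words found in FILE, at the same position.
# $read is a code ref that returns the text of a file, given its name.
# Nothing after a "--" terminator is expanded; nested includes are not handled.
sub expand_args_from_file {
    my ($read, @in) = @_;
    my @out;
    while (@in) {
        my $arg = shift @in;
        if ($arg eq '--') {
            push @out, $arg, @in;
            last;
        }
        my $file;
        if ($arg eq '--args-from-file') {
            die "--args-from-file needs a value\n" unless @in;
            $file = shift @in;
        }
        elsif ($arg =~ /^--args-from-file=(.*)\z/s) {
            $file = $1;
        }
        else {
            push @out, $arg;
            next;
        }
        my $text = $read->($file);
        die "cannot read '$file'\n" unless defined $text;
        push @out, shellwords(grep { !/^\s*#/ } split /\n/, $text);
    }
    return @out;
}

# In-memory stand-ins for foo.args and baz.args.
my %files = ('foo.args' => "--color=white\n", 'baz.args' => "--color=yellow\n");

my @args = expand_args_from_file(
    sub { $files{ $_[0] } },
    '--args-from-file', 'foo.args', '--color=red', '--args-from-file', 'baz.args',
);

GetOptionsFromArray(\@args, 'color=s' => \my $color) or die "bad options\n";
print "color=$color\n";
```

Because each file is spliced in at the position of its switch, the last --color (from baz.args) wins, which matches the left-to-right expectation. One of the inconveniences is visible too: the pass knows nothing about option specs, so a literal '--args-from-file' given as the value of another option would be misread.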

Yeah, even if we call it --config, as long as the value is in the command-line options itself, we currently will still need to do preprocessing of @ARGV to be able to add the contents of config at the beginning.

If we don't want to wait for refactoring, we'll need to specify args-from-* from cmdspec() ~~or Configure()~~ (Configure() only allows flags).

Yupp.. I am currently working on a draft proposal for the refactoring. If it goes well, we may not have to wait too long :-)

perlancar / perl-Getopt-Long-More

More Stuff: Somehow facilitate supporting program Config #31