Open tabulon opened 4 years ago
Configuration file format
To avoid introducing another format into GLM, a config file can just be another set of command-line options and arguments. We can use shellwords() from Text::ParseWords, a core module which is already used by GoL. Suppose the configuration file contains:
--opt1 --no-opt3
--opt2 "value that has space"
shellwords() will parse that into:
('--opt1', '--no-opt3', '--opt2', 'value that has space')
GetOptionsFromArray would just unshift those elements into $array and then proceed as normal. This allows the user to override a configuration setting with a command-line option (e.g. --opt2=no-this-val-instead).
Additional issue 1: users will eventually want to add comments. I think we can allow shell-style comments and strip them before passing the text to shellwords().
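A minimal sketch of the idea so far, assuming a hypothetical helper named args_from_text() and assuming that only whole-line comments are stripped (inline comments would need more care, since a # may legitimately appear inside a quoted value). The option names are the ones from the example above; nothing here is an existing GLM API.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: turn the text of an "args file" into a list of words.
sub args_from_text {
    my ($text) = @_;
    my @words;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/;          # skip whole-line shell-style comments
        push @words, shellwords($line);    # honours quotes, like a shell would
    }
    return @words;
}

# Stand-in for the contents of a configuration file.
my $config = <<'EOT';
# sample configuration
--opt1 --no-opt3
--opt2 "value that has space"
EOT

my @args = ('--opt2=no-this-val-instead');   # what the user typed
unshift @args, args_from_text($config);      # file words first, command line last

my %opt;
GetOptionsFromArray(\@args, \%opt, 'opt1!', 'opt2=s', 'opt3!')
    or die "bad options\n";

print "opt2 = $opt{opt2}\n";   # the command-line value wins
```

Because the words from the file are placed before the user's own arguments, the usual "last one wins" behaviour of Getopt::Long gives the command line priority without any extra merging logic.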
Additional issue 2: CLIs that support a config file often have a --no-config command-line option (e.g. wget) to skip searching for and parsing the configuration. We will need a nice way to support this.
Environment variable support
Aside from configuration files, environment variables are also often used to configure a script. See cpanm (with PERL_CPANM_OPT) for example. This could simply be implemented by splitting the environment variable's value with Text::ParseWords' shellwords(), then feeding the resulting strings into the array using unshift, so that they land after the configuration file's options but before the command-line options.
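The precedence described above (configuration file, then environment variable, then command line) could be sketched as follows. FOO_OPT is the variable name used in the example further down; the option names color and verbose, and the literal values, are assumptions made purely for illustration.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);
use Getopt::Long qw(GetOptionsFromArray);

# Set here only so that the sketch is reproducible; normally the user sets it.
$ENV{FOO_OPT} = '--color=blue --verbose';

my @config_words = ('--color=white');                 # stand-in for words read from a config file
my @env_words    = shellwords($ENV{FOO_OPT} // '');   # empty list when the variable is unset
my @args         = ('--color=red');                   # what the user typed

# Lowest priority first: config file, then environment, then command line.
unshift @args, @config_words, @env_words;

my %opt;
GetOptionsFromArray(\@args, \%opt, 'color=s', 'verbose!')
    or die "bad options\n";

print "color = $opt{color}, verbose = $opt{verbose}\n";
```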
To integrate this into GLM, I vote for the cmdspec() mechanism described in #18. Example:
GetOptions(
cmdspec(
config_files => ["$ENV{HOME}/foo.conf", "/etc/foo.conf"],
envs => ["FOO_OPT"],
...
),
...
);
And to avoid being opinionated, I vote that we do not heuristically search for some common config file locations or environment variable names, but just let users specify the paths and names.
Yupp! I had forgotten about this possibility... which is sometimes called an "args" (or "opts") file. It's not a full blown config solution, but we probably don't really need one at the GetOptions level.
So, it sounds pretty good to me. Just a few nitpicks, though. I would:
1) call this thing args-from-file, avoiding any relation to the name config;
2) also support args-from-string, i.e. an equivalent facility with strings (in addition to files), which would also alleviate the need to explicitly support parsing environment variables (i.e. ones that may contain more than one option/arg), since that could be done with args-from-string; if absolutely needed, we could eventually add support for args-from-envvar;
3) also support --args-reset (or --reset-args), which would simply reset (discard) the arguments parsed so far, up to and including itself;
4) later, we can also support dedicated environment variables for the individual options themselves.
unshift @ARGV, (
'--args-reset', # repeatable. mentioned just to show the possibility; otherwise not needed here.
'--args-from-file' => "/etc/foo.args",
'--args-from-file' => "$ENV{HOME}/.foo.args",
'--args-from-string' => "$ENV{FOO_OPT}",
);
GetOptions( cmdspec(
# enable automatic support for some of the standard options
standard_opts => { args_reset =>1, args_from_file => 1, args_from_string=> 1 }
...
));
GetOptions( cmdspec(
# equivalent to the `unshift @ARGV` above
prepend_args => [ '--args-from-file' => [ "/etc/foo.args", ... ], '--args-from-string' => ... ],
standard_opts => { args_reset =>1, args_from_file => 1, args_from_string=> 1 }
...
));
This form does NOT require enabling --args-from... for the end user.
But nothing forbids the script author from doing so...
GetOptions( cmdspec(
prepend_args_from => [ file => ..., file=> ..., string=> ... ],
...
));
GetOptions( cmdspec(
prepend_args_from => [ file => ..., file=> ..., string=> ... ],
standard_opts => { args_reset =>1, args_from_file => 1, args_from_string=> 1 }
...
));
Here, the script author specifically asks GLM to use some sane defaults...
This is as DRY as it gets...
use Getopt::Long::More qw(GetOptions :constants);
GetOptions( cmdspec(
prepend_args_from => DWIM,
standard_opts => DWIM, # or ALL, ...
...
));
and in GLM, we would have:
package Getopt::Long::More;
...
# exportable (or automatically exported) CONSTANTS
use constant ALL => "!<=ALL=>!"; # or some other rare string
use constant DWIM => "!<=DWIM=>!"; # or some other rare string
...
I would vote for:
--args-from-file, for example: -@
(help and version) to also appear in standard_opts (in addition to and overriding the GoL style configuration, i.e. auto_help, etc)
Also, OK for:
Text::ParseWords' shellwords() ==> it may still have some glitches, but I would agree that it's the best tool available.
/^\s*#/, no?
And here are some of the things I am NOT sure about:
The names of things mentioned above, i.e.
--args-from-... OR --opts-from-...
prepend_args?
standard_opts? OR auto? OR auto_opts? (similar to GoL) OR ?
The depth of prepend_args and standard_opts within the arguments to cmdspec() ...
should they appear as shown above? i.e.:
cmdspec( ..., standard_opts => {...}, ... )
OR should they go a level deeper:
cmdspec( ..., configure => { standard_opts => {...}, ... } )
If deeper, under which key? (e.g. configure appears to be used for similar stuff by GL::Subcommand)
Would it be better to have:
separate options: --args-from-file, --args-from-string, ...
OR a single --args-from that does it all (with a syntax convention)?
This seems to be prettier/easier to use; but it could get pretty hairy trying to guess things.
So, if we opt for this, it would be best to couple it with a syntax convention on the value.
One idea for such a convention could be something akin to a URI scheme:
--args-from "::str::--foo bla --bar baz" # string (UTF-8)
--args-from "::env::FOO_ARGS" # environment variable (UTF-8)
--args-from "::file:///etc/foo.args" # file
--args-from "::bozo: bla-bla" # =====> ERROR
--args-from "::" # =====> ERROR
--args-from "/etc/foo.args" # file - (the default, unless matched by above)
# (by default, file contents are also assumed to be in UTF-8)
For the string case, the reason I have not opted for the data:, URI is that, by default, it is assumed to be US-ASCII (as per the standard). Also, I don't quite like the comma (,) in there for some reason...
What do you think?
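To make the proposed convention concrete, here is a rough sketch of how the value of a generic --args-from option might be classified. The function name classify_args_source() and the returned kind/payload pair are hypothetical; only the ::str::, ::env:: and ::file:// prefixes, the two error cases and the plain-path default come from the list above. Encoding handling (UTF-8) is left out.

```perl
use strict;
use warnings;

# Hypothetical classifier for the value of a generic --args-from option.
# Returns a (kind, payload) pair, or dies on an unknown or empty scheme.
sub classify_args_source {
    my ($value) = @_;
    return (str  => $1) if $value =~ /\A::str::(.*)\z/s;     # literal string of args
    return (env  => $1) if $value =~ /\A::env::(.+)\z/s;     # name of an environment variable
    return (file => $1) if $value =~ m{\A::file://(.+)\z}s;  # explicit file path
    die "unrecognized --args-from source: '$value'\n"
        if $value =~ /\A::/;                                  # e.g. "::bozo: bla-bla" or "::"
    return (file => $value);                                  # default: treat as a file path
}

for my $value ('::str::--foo bla --bar baz', '::env::FOO_ARGS',
               '::file:///etc/foo.args', '/etc/foo.args', '::bozo: bla-bla', '::') {
    my ($kind, $payload) = eval { classify_args_source($value) };
    print defined $kind ? "$kind: $payload\n" : "ERROR: $@";
}
```

One consequence of this convention is that a file whose path really starts with :: can only be named through the explicit ::file:// form.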
If/when we allow --args-from-... or --args-reset in arbitrary places within @ARGV, we will need much more control...
Therefore, this would require the refactoring (#11) to have been completed, and would almost certainly imply a systematic handling of all options with a wrapper CODE destination, as PERLANCAR had suggested.
Using "args-from" terminology instead of "config" sounds good to me.
--args-reset is surely more flexible than, say, --no-args-from-file and --no-args-from-env. But it does require a refactoring. And since I don't expect there will be that many "from" sources (file, env, what else?), for now I'm leaning toward individual --args-from-file and --args-from-env options (instead of the generic --args-from), as well as a way to disable them from the command line, e.g. --no-args-from-file and --no-args-from-env.
As for the way to enable this feature, I prefer the GoL style of auto_help and auto_version. We can add auto_args_from_file and auto_args_from_env. To customize this at a per-subcommand level, we can use the 'configure' cmdspec() property, like in GL::Subcommand.
OK, "args-from" terminology it is, then.
As for the topic of --args-reset and the choice between --args-from (a single generic option) vs --args-from-XYZ (multiple specific options), I do see your point.
However, it's a bit more convoluted, I am afraid...
Yes, --args-reset would indeed require refactoring...
But so would --args-from, or any specific derivative thereof, in the case where we wish to properly support them as full-fledged options that can also be passed by the end user (in addition to the script author).
The reason has to do with the ordering of options/arguments and the related set of behaviours that have come to be widely expected by users in the Unix world.
In the example below, let's assume that --color is specified as taking a single value, as in color=s.
$ frobrinicate --args-from-file "foo.args" --color=red --args-from-file "baz.args"
$ cat foo.args
--color=white
$ cat baz.args
--color=yellow
The question is:
If frobrinicate wants to abide by the widely expected conventional behavior (of left-to-right overridable options), what value should it use for --color at the end of the day?
If we expect that value to be yellow, then we need to realize that --args-from-file is quite similar (in terms of its challenges) to the case of --args-reset.
BTW, the shift in expectations may somewhat (but not entirely) be attributed to the shift in terminology: --config vs --args-from.
Anyhow, in both cases, the option handler needs to know its own position in @ARGV and be capable of performing surgery on it.
One tempting method to handle this kind of thing is to pre-process @ARGV. That method would have its merits (as well as inconveniences)... But why not...
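As a rough illustration of the pre-processing route, the sketch below expands each --args-from-file in place, so that left-to-right overriding falls out of the normal parse. The helper name expand_args_from_file() is hypothetical, and a small hash stands in for the two files from the example so that the sketch runs without touching the disk. It deliberately ignores the --args-from-file=FILE form, option abbreviations, nested args files, and the case where another option's value happens to be the literal string --args-from-file; those are some of the inconveniences of this method.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);
use Getopt::Long qw(GetOptionsFromArray);

# Stand-in for the files in the example above.
my %fake_file = (
    'foo.args' => "--color=white\n",
    'baz.args' => "--color=yellow\n",
);

# Hypothetical pre-processor: replace each "--args-from-file FILE" pair
# with the words found in FILE, at the same position in the list.
sub expand_args_from_file {
    my @in = @_;
    my @out;
    while (@in) {
        my $arg = shift @in;
        if ($arg eq '--') {                 # leave everything after "--" untouched
            push @out, $arg, @in;
            last;
        }
        if ($arg eq '--args-from-file') {
            my $name = shift @in;
            die "--args-from-file requires a value\n" unless defined $name;
            die "cannot read $name\n" unless exists $fake_file{$name};
            push @out, shellwords(split /\n/, $fake_file{$name});
            next;
        }
        push @out, $arg;
    }
    return @out;
}

my @args = expand_args_from_file(
    '--args-from-file', 'foo.args', '--color=red', '--args-from-file', 'baz.args',
);

my %opt;
GetOptionsFromArray(\@args, \%opt, 'color=s') or die "bad options\n";
print "$opt{color}\n";   # yellow: the last occurrence, from baz.args, wins
```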
Yeah, even if we call it --config, as long as the value is in the command-line options itself, we currently will still need to do preprocessing of @ARGV to be able to add the contents of config at the beginning.
If we don't want to wait for refactoring, we'll need to specify args-from-* from cmdspec() or Configure() (Configure() only allows flags).
Yupp.. I am currently working on a draft proposal for the refactoring. If it goes well, we may not have to wait too long :-)
SYNOPSIS
Somehow facilitate supporting configuration files.
The WHY
Many programs have a need to be configurable (typically via configuration files). Even some of the quick and dirty scripts could benefit from that kind of thing, as long as it's darn easy...
While certain aspects of program configuration (such as the file names and formats) would arguably be best treated by a CLI framework, certain other aspects (such as the need to grok the specs of command line options) clearly have an interplay with 'GetOpt' functionality.
If the options parser (GLM, in this case) does not provide simple mechanisms for supporting this sort of thing, then many wheels need to be re-invented by CLI frameworks, or worse, command scripts themselves.
POINTERS
See :
DISCUSSION
Should GLM:
Obviously both approaches would have their merits and inconveniences.
TL;DR
For the time being, let's just keep this in mind but not implement anything until the refactoring is done.
We can decide later on whether we provide a (direct support), which, depending on how it's done, could still be desirable.
a) Direct support
Direct support for configuration could go as far as seamlessly retrieving and merging default values from program configuration files found at predefined paths.
pros/cons
pros
cons
Granted, all but c1 above could be somewhat avoided/mitigated.
Details
The config handling (turned off by default) could happen during the call to a function in the GetOptions().
By default, config file paths could be obtained via DWIM and a couple of sane conventions, and of course it should be possible to override the DWIM.
A similar approach could be considered for file "formats" (INI, TOML, YAML, ...), but then GLM would find itself needing to be unnecessarily opinionated about what constitutes a good config file format...
Something like [Config::Any] could eventually come to the rescue... But even in that case, GLM would need to make sure that the script author is able to exert full control if and when he/she so desires.
Finally, beyond the technical aspects of a file format, something that is often overlooked or easily dismissed is what could be called the semantic format (in terms of vocabulary and information structure).
In case of CLI option processing, "vocabulary" should not be much of a concern (in the end, each option has a specified name and aliases, and GLM could have knowledge of those).
However, the question of "structure" can't easily be dismissed... A piece of information (such as a default value for a CLI option) could potentially live at any depth or place in a config file.
What's worse is that quite often, those things may well fall outside any decision/control of the script author (e.g. due to historical reasons, org policies, ...).
Therefore, if GLM ends up providing such direct support for configuration, it needs to be really careful to:
caller
to easily do this sort of thing on their own;