Closed agalileev closed 3 years ago
As far as I can tell, the --site
parameter would not make sense anymore, because you can't have multiple site-specific options or even site-specific sections in the config. That's quite significant feature loss. If you can't have site-specific sections, maybe it would be better to implement at least site-specific config files.
That's quite significant feature loss.
Is it? Eventually I have found the use case for --config
(e.g., when you want to test the new config without replacing the one on the standard config filepath), but --site
is looking quite useless.
[DEFAULT]
section including site-specific ones. This approach is straightforward. You do not need anymore "top_level_arg" manipulations in your code which results actually to nothing (no better usability, no better config management, etc.)The old --site
functionality cannot be restored (because configparser
does not support such config file syntax), so I am ready to implement the functionality of multiple config files management if you are sure that this is really necessary, but it looks like an overkill to me.
Do you use wiki-scripts on non-ArchWiki sites?
Yes - for example a wiki on localhost which I use for testing some stuff...
Hm, ok, I shall do it. And about testing - do you make any non-standard (I mean non-unittest
) testing?
Depends on what you consider non-standard. There are "standard" unit tests for some submodules using pytest, some more involved tests which actually require internet connection to ArchWiki (they are currently disabled and probably broken) and some very involved feature tests for the database synchronization part (they spawn a separate PostgreSQL process and a web server hosting a local MediaWiki instance) which never worked reliably and are also disabled. Besides that, most of the features were implemented by just running the scripts in interactive mode and observing the behaviour.
Hi. After hard painful thoughts I came up with an idea how to cross --config
, --site
and working with multiple config files. The two options (config and site) have been merged with the following implications:
~/.config/wiki-scripts
).$config_dir/<name>.conf
.--config
option: ./script.py -c <name>
.For example, you have the following config files in your config_dir:
localhost.conf
very-test-config.conf
archwiki1.conf
archwiki2.conf
Now you can choose a config for a script just with ./script.py -c very-test-config
, ./script.py -c archwiki1
, etc.
The benefits of such approach:
/long/configuration/file/path
in --config
option. The config names are much more convenient. If you type the name which is not valid the FileNotFoundError
will be raised (see ws.config.argtype_configfile()
function for details).ws.config
has been largely rewritten without any testing, so it potentially has the bugs. On this stage I just want your opinion about the concept.
Nice, I agree that merging --config
with --site
makes sense, but how do you determine the default <name>
? I don't think it should be anything related to Arch wiki, as that would be inconvenient for people who want to use wiki-scripts primarily with some other wiki. I think it could be just default
, so people would either use default.conf
only or they could make it a symlink to their primary config file. What do you think?
I think it would also be good to interpret the value for --config
two ways: first check if there is a <name>.conf
in the config directory (the way you're doing now), but if it fails, check if it is a path to an existing file. Using relative paths is pretty short and convenient with tab completion. This way it would be possible to e.g. test the example config from the repo with --config examples/sample.conf
without having to copy anything or doing something weird about the config path where <name>
is looked up.
Ok, I have implemented two suggested changes. Unfortunately, I did not manage to stop in time and made some other changes :). I decided to split functionality of argtype_*
between the type=
converters and action=
actions. It seems to me that it will be right if type=
will do only value conversion, and action=
will do further transformations and validity checks.
New types:
argtype_configfile
- converts a string to the configurations file path.argtype_path
- converts a string to the absolute filesystem path.New actions (they are registered and can be used with aliases shown below):
'check_path'
- check if the string is a path to a valid filesystem entry.'check_dirname'
- check if the "dirname" part of a path string is an existing directory.As for argtype_comma_list_choices
, I brutally dealt with it. :) It looks like a nasty hack, but this task can be done much simpler. The only precondition is to use the whitespace as the option values separator:
# in command-line
$ ./toc.py <some options> --toc-languages en ru fr <other options>
# in configuration file
[toc]
toc-languages = en ru fr
See toc.py
changes in cb26980 for more details.
It seems that I changed everything that could be changed in ws/config.py
. :) If you have any suggestions or feature requests - I am ready to deal with them. "Please, give me that task. I hope that is the task!" (c) BlackTeaHamburger. If no then I proceed to testing, debugging and writing docs.
Hi. I am debugging ws/config.py right now by running example scripts. And yes, I've already fixed the quotes. :) Some of your requests have already been resolved, but I will comment on every of them.
I move on to writing tests. Since I haven't worked with Pytest before, this may take a while. If you have new ideas on how I can make the world a better place, just let me know. :)
Hi. I have wrote a bunch of tests for ws.config
module. Two notes:
argtype_config()
was refactored. Config filename transformation code was moved to a separate private _get_config_filepath
function for better testing.getArgParser()
and object_from_argparser()
functions properly. They are too complex.Sorry, I won't be able to review your work this week. I'll try to get to it next week, otherwise feel free to ping me if nothing happens...
No problem, I have an interesting task right now that consumes all my time. Tell me when you will ready to resume.
@agalileev Sorry for the delay, I'll review your work today.
Hi. I have a problem with tests after 08e6df3f. All tests for _get_config_filepath()
are (expectedly) failed now. As I had told before, the only purpose of moving this piece of code to a separate private function was a better testing. We can test string-to-path-conversion functionality separately from file-existance-checking one. It is much more simpler than to test the relatively complex argtype_config()
function entirely.
Oh, I forgot about the tests - just fixed it. I'll try to add some tests also for the getArgParser
and object_from_argparser
functions. I also noticed some other small problems that need more investigation, otherwise it looks pretty good.
Hi, I've added some tests for the object_from_argparser
function. Unfortunately, 2 tests are failing - please see if you can fix it.
In 2e0bcbb I have fixed your changes in 08e6df3. I do now know why the absent default config must result in args.config=None
. If the file does not exist (either default or custom) the error must be raised. Instead I add a special case --config NONE
(NONE string is case insensitive) if you want to work without config file deliberately. Now the argtype_config()
workflow is:
-c NONE
was specified, then args.config=None
is set.-c <any_string>
was specified, then corresponding file is checked for existance.--config
option was specified, then default config file is checked.I mark one test as xfail
because it is checking non-existent functionality now. A few tests for argtype_config()
must be added, I shal do it tomorrow.
About object_from_argparser()
tests failing. I have found, that we must parse option only ones. Otherwise anyone option (either global from ws.config
or local from klass.set_argparser()
) with required=True
parameter may break the parsing. The problem is that we need to make initial parsing for --config
option, then find a config file and make parsing again. I found a not very pretty workaround to parse sys.argv
for --config
manually. I am not sure that it is the best possible choice, but it looks like argparse has no suitable functionality for this case. See 8a04e49 for details.
I also add allow_abbrev=False
to ArgumentParser
defaults.
The behaviour in https://github.com/lahwaacz/wiki-scripts/commit/08e6df3f83e9a795a06689af0fdf494f5f2cba8d is intentional, please don't change it. If somebody wants to use a very simple script, e.g. print-namespaces.py
, all that is needed is to supply a couple of options (e.g. --api-url
) and it does not make sense to force the user to specify --config NONE
or to create the default config file.
As for the other commit - it might be better to create a separate ArgumentParser
instance, define just --config
and use it for initial parsing. This should ensure that the initial parsing behaves like argparse
- now if you pass a wrong file to --config
, you'll get an exception traceback instead of a brief error message.
The behaviour in 08e6df3 is intentional
This is a mistake. Imagine a situation, when the user runs a script and wants to use the default config. He does not specify --config
option. But for some reason default.conf
is absent or is a broken link and the user do not know about it. Normally, the error should be raised, but... nothing is happend. args.config
just quietly set to None
and the script proceed! Actually, it can even work without fail, produce some (possibly mistaken) output and make some (possibly mistaken) edits on Wiki. And all this time the user has no any sign that something goes wrong! Now imagine how much effort it will take to find the cause of the error when it will revealed?
If you want to work without config file you must specify this explicitly. I have seen two possible implementations for this behaviour in another applications: either --no-config
option or a special case for standard --config
option (like --config NONE
).
it might be better to create a separate ArgumentParser instance
Yes, it can be a solution. I need to make some investigation.
No, it is not a mistake. It should work without forcing the user to create a default config file - that just doesn't make sense. If you want, you can raise the exception when default.conf
is a broken symlink. To explicitly skip an existing config, I'd like --no-config
more than --config NONE
.
I argee that creating a config as a precondition to a new user may look a bit weird, especially if he creates only one script that has no need in it. But your changes will result in undefined behavior if the --config
option is not specified. In fact, there are two possible behaviours for every script without --config
: to use default.conf
or do not use any config at all. To determine which of them will take place you must check manually your default.conf
every time you run any such script.
For example, you have three scripts to run in series:
./script1.py <options> # uses default.conf
./simple-script.py <options> # does not use config file
./script2.py <options> # uses default.conf again
You need default.conf
for script1
and script2
, but what will you do with it when ./simple-script
will be runned? Delete default.conf
before and create it after? Do you really want to do such kind of work? And what if one day you delete default.conf
for simple-script
and forgot to restore it before script2
is runned?
Actually, 08e6df3 kills the "default config" idea as it was planned initially. Are you afraid that user will have to type an additional --config NONE
option? Okay, but now user must type explicit --config archwiki
(for example) for every script that intend to use config file to guarantee right behaviour.
I'd like --no-config more than --config NONE
This can be implemented easily with parser.add_mutually_exclusive_group()
for --config
and --no-config
options, but we need to work out the final concept of --config
option first.
I think, we can use a more verbose error message as a possible solution, for example:
file does not exist or is a broken link: '<path_to_config>'. Create the config file or use '--no-config' option.
This a) explains the new user what he have to do; b) do not produce dangerous behaviour.
There is no undefined behaviour in the case I intend: if --config
is not specified, the behaviour depends on whether the default config exists or not, but the behaviour is well defined in either case.
Let me rephrase again. The intention is to have a shared config for all scripts, which supports sections for per-script options. So to specify options only for specific scripts, they need to be in a specific section in the config file - there is no danger of unintentionally changing the behaviour of other scripts. If you put them in the [DEFAULT]
section, I think the section name is good enough to indicate that such options affect all scripts. This is more or less the same behaviour as there is now, I don't want to change that.
the behaviour depends on whether the default config exists or not
But this is exactly a problem! Look:
$ ./script.py --cache-dir <some_dir> --api-url <some_url>
A question: "Does this script use defautl.conf
?"
An answer: "I do not know. May be yes, and may be not. It depends on whether the default config exists or not. May be there are another options in klass.set_argparser()
that must be read from default config, but if there is no default.conf
on filesystem they wil be set to default values, which are possibly mistaken for this run. We have no guarrantee that the script will behave as we intend because we do not now is there default.conf
on filesystem or not".
Another example:
$ ./script1.py --cache-dir <...> --api-url <...>
$ ./script2.py --cache-dir <...> --api-url <...>
A question: "Can you determine with this two strings (without checking the filesystem for a file), what script is working with default.conf
, and what is working without any config?"
An answer: "No, I cannot"
Do you see? This is what I mean "undefined behaviour". If we have two separate logics on "--config
is not specified" case then one command can do two different things depending on default.conf
presence. This is the source of big problems in the future. You will have to constantly check for the file if you have two types of scripts and want to guarantee no mistakes.
And now look for an alternative:
$ ./script1.py --no-config --cache-dir <...> --api-url <...>
$ ./script2.py --cache-dir <...> --api-url <...>
Here we can say with confidence:
--cache-dir
and --api-url
are the only two options in use.default.conf
file; --cache-dir
and --api-url
is taken from the command line, and if there are any options set in klass.set_argparser()
, they will be taken from the config file. (And if the default.conf
does not exist the error will be raised to warn the user).This is what I can call "well-defined behaviour" :)
If the user can't remember if their config file exists or not, they can't expect anything from the configuration system. If you want a nicer way to check which values apply for a given script invokation, you can try to reimplement ConfigArgParse's print_values function and bind it to an option such as --print-values
. But I don't find this feature necessary.
As for "constantly checking for the file": of course the user will never do this, they will use the defaults first (which will either succeed or fail), then they might pass a cli option or create a config file for the script to change the default. But this is not a problem, because if you configure the script in the proper config file section, it will not affect other scripts in any way. Again, it is the user's responsibility to write the config in such a way that they don't get surprised in the future.
Please remember that this PR is about replacing third-party modules, not about major reimplementation of the configuration system.
Okaaay, I roll changes back to 08e6df3. Just for the record — this is a mistake :(
I need more time to fix "args.config parsing", but for now all tests are passing.
In our discussion it sounded indistinct, so I will clarify — do we need in --no-config
option implemented or this will be a major reimplementation of the configuration system too?
In our discussion it sounded indistinct, so I will clarify — do we need in
--no-config
option implemented or this will be a major reimplementation of the configuration system too?
I don't consider this to be major, so I'll leave it up to you.
Hi. I have tried some ideas how to handle args.config parsing without an error. The core problem was that if we use two fully separate parsers than the options only from the first of them gets in --help
output — but if we use the single parser, any option with required=True
can break the parsing.
I have found that the problem can be solved if to inject config parser into the main parser with the parents=
parameter. I have no idea how (and why) this works, but it works... The tradeoff is that getArgParser()
has the interface changed. It returns a tuple of two now. The first element is the main parser, and the second is a new "parent" parser with the only --config
/--no-config
options group.
If the script uses only object_from_argparser()
it will work without changes. If the script uses getArgParser()
it must be changed. The string
argparser = ws.config.getArgParser()
must be replaced with
argparser, _ = ws.config.getArgParser()
I have changed examples/parse-config.py
script for example.
May be I have something missed, but all tests are passed.
Hi, good job. Actually we don't need to use parents=
, only add_help=False
for the auxiliary parser with the config arguments.
I also moved some things around and split the config file parsing from object_from_argparser
into a separate function parse_args
, because I realized that there are scripts which use plain getArgParser
without object_from_argparser
and the config file would not be parsed by those scripts.
There is still (at least) one behavioral difference, though: config files should be allowed to contain unrecognized arguments/variables, but unrecognized arguments specified on the command line should be reported as errors. I've added two test cases for this, one of them fails.
I have added the check for unrecognized options in cli_args. But there is one case which is not covered by this logic: when the multiple short options are joined together with a single -
prefix. argparse
manages such cases well, but checking these options manually will result in a big piece of code.
Moreover, such options can produse a separate error message. Consider parser with only defined -a
option which is not taken explicit value from the command line (e.g., action="store_true"
). Parsing args, remainder = ap.parse_known_args(["-abc"])
produces an error:
error: argument -a: ignored explicit argument 'bc'
It does not perceive b
and c
as options at all and tries to treat bc
as an explicit value for -a
option.
Theoretically, the problem can be solved with two-step parsing. We can parse config_args
with the parse_known_args()
which do not check for unrecognized options, and the parse cli_args
with parse_args()
(which do). It will be "a right way". But here we will face with the old problem: any required=True
option with high probability breaks parsing and the whole script.
I am not be able to investigate this problem until next Tuesday because I have some jobs I must to do (including monthly syncing of AW articles). After that I will be ready to fight this PR until death, its or mine :).
But there is one case which is not covered by this logic: when the multiple short options are joined together with a single - prefix. argparse manages such cases well, but checking these options manually will result in a big piece of code.
I think it can be handled quite easily, since short options are not allowed in config files, so any short option remaining after the parsing means an error. It is just a matter of figuring out the right error message (and checking that short options in the config file are actually not possible).
I am not be able to investigate this problem until next Tuesday
Take your time, there is no rush - I'm quite busy these days too...
Hi. It looks like I understand how to work with multioptions. Consider such case:
_, remainder = ap.parse_known_args(["-abc"], namespace=args)
argparse
sees multioption block, he tries to recognize the separate options it consists. If the first of them (-a
) was defined earlier argparse
makes parsing as usually, with assignments or raising errors. We cannot do something here because this is argparse
's internal logic.-a
was not defined argparse
places the whole block to remainder
as unrecognized. Here we can catch it with if item.startswith("-") and item in cli_args
statement and make some error handling.I have also added the check for short options in a config file and some tests.
Hi, Jakub. Do you have any update for this pull request? I will have a lot of free time this month, so if you have any ideas or suggestions I can work on them.
Hi, sorry I did not find time for this in the last month. I'll try to look at it at the weekend.
Ok, look. The main purpose that I intended with this PR was to get practice in Python programming on real project, and more or less, the goal has been achieved. If you are not very enthusiastic at the thought of merging this PR (for whatever reason), then there is no need to continue. We can simply close it to save our time for other jobs. :)
No, just closing the PR would be a waste of our already spent effort. It's not about enthusiasm, it is a good idea and I actually like your changes and the general direction this PR took, but I simply do not have as much free time these days as I had in the summer.
I think that the code is mostly finished, but before merging we should write some brief documentation and/or a migration guide for other users (they are not many but we should still be nice to them). Or at least make a "beta" release or ask them for testing it or something like that. What I mean to say here is that the last step is more than just clicking the merge button.
Ok, then I shall wait. Tell me if I can do something to help.
This took much longer than I'd like to admit, but it is finally merged! I apologize for letting you wait so long...
I've released version 1.3 just before the merge, so we can polish the details and documentation for the next version (which will be probably 2.0).
Oh my!.. This day has finally come. It looks like I shall drink vodka for a month or two...
Hi. I made some changes for #66.
ConfigArgParse
was replaced withargparse
, andconfigfile
- withconfigparser
.After some thoughts, I decided to limit the changes and do not rework the class / function interfaces where possible. I don't know how to make tests on changes properly. Are you using unittests or something else? How do you make the "production" testing?
And yes, the syntax of the config file had to be changed to become compatible with the
configparser
module. An example can be seen in the fileexamples/sample.conf
.