lahwaacz / wiki-scripts

Framework for writing bots, maintenance scripts or performing data analysis on wikis powered by MediaWiki
http://lahwaacz.github.io/wiki-scripts/
GNU General Public License v3.0
27 stars 12 forks source link

ws.config: new argument parser without third-party modules #66

Closed agalileev closed 3 years ago

agalileev commented 4 years ago

Last days I have investigated the ws.config module and I came to the conclusion that it is unreasonably complicated and bloated. Moreover, two third-party modules, ConfigArgParse (which is used as the git submodule) and configfile, can be easily replaced with standard library modules argparse and confgparser.

I have written a simple prototype for the new argument parser (see https://github.com/agalileev/wiki-scripts/commit/16fc4da90c945f1e69b158451c53586d9efa1d3d). The script named parse_conf.py is in the example directory, and the config file config.ini is in the same place. The content of the config file is not related to wiki-scripts. It has the standard .ini files markup. The workflow of the script is as follows:

  1. Parse the [DEFAULT] config file section with the configparser parser. Here will be stored the project defaults (project_name, api_url, index_url, etc).
  2. Parse the script-related config file section. This section must have the common [<script>] name, but in this example for demonstration purposes the [topsecret] section was used. Parameters from the script-related section overrides the [DEFAULT] ones.
  3. Convert parsed data to dict object.
  4. Parse command-line arguments with argparse and convert it to dict too.
  5. Merge two dicts (cli argumets override the parameters from a file).
  6. Wrap the resulting dict into the argparse.Namespace object to access the arguments in dot notation.
  7. ...
  8. PROFIT! We have a Namespace object with both file and cli arguments. It can be passed to (for example) object_from_argparser() function.

All of the above will allow massively clean up the ws.config module:

  1. ConfigFileParser class is not needed anymore. I have found only one instantiation call of it through the project. The class can be deleted after fixing the call.
  2. Defaults class is internal for the ws.config module and can be deleted, because the defaults can be stored in the [DEFAULT] config file section. This approach is much better than hardcoding the values in the program.
  3. getArgParser() function is the hard place because it is called in a lot of places. But its main purpose is to merge two parser objects, file and cli, and this functionality is not needed anymore. In addition, it uses the ConfigArgParse module we want to get rid of.
  4. object_from_argparser() function will be updated.

I am ready to implement these changes in a few weeks. :-)

lahwaacz commented 4 years ago

Note that configfile supports subsections and inheritance of options from parent sections. While you could probably hack this with the standard module too, it is much easier to use an existing module - especially handling interpolation and inheritance correctly is nontrivial.

As for ConfigArgParse, I agree with your assessment - it is unreasonably complicated and bloated. There is an oldish bundled version with some custom modifications (I don't know exactly what they do, it has probably something to do with the ConfigFileParser class to interface it with configfile...)

If you can simplify this so that we can remove ConfigArgParse, that would be great. But note that your script is too simple as it has a hardcoded config file path. In reality you need to (partially) parse the command line first to get the path from --config. Also, the result should not be a simple merge, but a filter should be applied on the config file so that only options defined in argparse are read from the config file.

agalileev commented 4 years ago

configparser supports subsections, inheritance and interpolation too, these two modules are quite similar. But configparser is a standard module, while configfile is third-party and is used in ws.config only. Replacement allows us to delete one dependency.

As I understand, there is only one ConfigArgParse feature used in ws.config: injection of ConfigFileParser object into ArgumentParser object to parse them together with parse_args() function. If this functionality will be replaced as we plan, ConfigArgParse will no longer be needed.

And I have two comments on --config.

  1. To parse the command-line arguments before the config file parameters, you need only to swap two blocks of code. :-)
  2. Do we really need such flexibility? An ability to choose an arbitrary location for the config file seems redundant. The common practice in the great number of projects is to hardcode a bunch of 2 or 3 standard places for config. Vim stores configs in /etc/vimrc, ~/.vimrc and ~/.vim/ for example. I think that the better way is to store wiki-scripts config only in project root (if it will be the only one file) or in project root subdirectory (in the case of a set of files).
lahwaacz commented 4 years ago

As far as I know, configparser supports inheritance only from the [DEFAULT] section, but it does not support arbitrary subsections like configfile does.

And I just realized an important thing that ConfigArgParse does -- if you consider command-line options defined like this:

p.add_argument("--log-level", action="store",
            choices=LOG_LEVELS.keys(), default="info",
            help="the verbosity level for terminal logging")
p.add_argument("-d", "--debug", action="store_const",
            const="debug", dest="log_level",
            help="shorthand for '--log-level debug'")
p.add_argument("-q", "--quiet", action="store_const",
            const="warning", dest="log_level",
            help="shorthand for '--log-level warning'")

Then just merging two dicts will not work, because configfile or configparser alone does not know how to translate debug = true or quiet = true into log-level = debug or log-level = warning. It would also not check if the value of log-level is valid.

Anyway, my config usually looks like this. If you don't break it, I will consider any changes :wink: