lebrice / SimpleParsing

Simple, Elegant, Typed Argument Parsing with argparse
MIT License
401 stars 50 forks source link

json parsing of a boolean considers "false" to be True #107

Closed stevebyan closed 2 years ago

stevebyan commented 2 years ago

Describe the bug When parsing a json config file, SimpleParsing considers any string value for a boolean to be True. This is quite confusing to my users, who inadvertently write their json configs with quotes around the truth value, e.g. "light_switch": "false". They are surprised when the dataclass field light_switch ends up containing the value True.

To Reproduce

import enum
from typing import *
from pathlib import Path
from dataclasses import dataclass

from simple_parsing import ArgumentParser, field, choice
from simple_parsing.helpers import Serializable

@dataclass
class CfgFileConfig:
    """Config file args"""
    load_config: Optional[Path] = None          # load config file
    save_config: Optional[Path] = None          # save config to specified file

@dataclass
class MyPreferences(Serializable):
    """Booleans treat strings as true when loading json"""
    light_switch: bool = True # turn on light if true

def main(args=None):

    cfgfile_parser = ArgumentParser(add_help=False)
    cfgfile_parser.add_arguments(CfgFileConfig, dest="cfgfile")
    cfgfile_args, rest = cfgfile_parser.parse_known_args()

    cfgfile: CfgFileConfig = cfgfile_args.cfgfile

    file_config: Optional[Config] = None
    if cfgfile.load_config is not None:
        file_config = MyPreferences.load(cfgfile.load_config)

    parser = ArgumentParser()

    # add cfgfile args so they appear in the help message
    parser.add_arguments(CfgFileConfig, dest="cfgfile") 
    parser.add_arguments(MyPreferences, dest="my_preferences", default=file_config)

    args = parser.parse_args()

    prefs: MyPreferences = args.my_preferences
    print(prefs)

    if cfgfile.save_config is not None:
        prefs.save(cfgfile.save_config, indent=4)

if __name__ == '__main__':
    main()

Expected behavior I expect either a liberal parsing of the json file:

$ cat > bool-bug.json << EOF
{
    "light_switch": "false"
}
EOF
$ python issue.py --load_config bool-bug.json
MyPreferences(light_switch=False)

or an error:

$ python issue.py --load_config bool-bug.json
usage: issue.py [-h] [--load_config [Path]] [--save_config [Path]]
                   [--light_switch bool]
issue.py: error: argument --light_switch: Boolean value expected for argument, received string '"false"'

I expect an error when the json file contains some string value other than "true" or "false":

$ cat > bool-bug.json << EOF
{
    "light_switch": "blarg"
}
EOF
$ python issue.py --load_config bool-bug.json
usage: issue.py.py [-h] [--load_config [Path]] [--save_config [Path]]
                   [--light_switch bool]
issue.py: error: argument --light_switch: Boolean value expected for argument, received string '"blarg"'

Actual behavior SimpleParsing treats any string value for the json field to be True.

$ cat > bool-bug.json << EOF
{
    "light_switch": "false"
}
EOF
$ python issue.py --load_config bool-bug.json
MyPreferences(light_switch=True)
$ cat > bool-bug.json << EOF
{
    "light_switch": "blarg"
}
EOF
$ python issue.py --load_config bool-bug.json
MyPreferences(light_switch=True)

SimpleParsing could also treat non-string json values other than true or false more gracefully:

$ cat > bool-bug.json << EOF
{
    "light_switch": blarg
}
EOF
$ python issue.py --load_config bool-bug.json
Traceback (most recent call last):
  File "/Users/smb/Work/GIT/playground/issue.py", line 47, in <module>
    main()
  File "/Users/smb/Work/GIT/playground/issue.py", line 30, in main
    file_config = MyPreferences.load(cfgfile.load_config)
  File "/Users/smb/Work/GIT/playground/venv/lib/python3.9/site-packages/simple_parsing/helpers/serialization/serializable.py", line 237, in load
    return cls.load_json(
  File "/Users/smb/Work/GIT/playground/venv/lib/python3.9/site-packages/simple_parsing/helpers/serialization/serializable.py", line 304, in load_json
    return cls.load(
  File "/Users/smb/Work/GIT/playground/venv/lib/python3.9/site-packages/simple_parsing/helpers/serialization/serializable.py", line 269, in load
    return cls._load(
  File "/Users/smb/Work/GIT/playground/venv/lib/python3.9/site-packages/simple_parsing/helpers/serialization/serializable.py", line 282, in _load
    d = load_fn(fp, **kwargs)
  File "/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 21 (char 22)

Desktop (please complete the following information):

Additional context Note that due to shell quoting, --light_switch "false" as a command line argument gives the correct result, and --light_switch '"false"' gives the proper error, though the message would be more helpful if it indicated the value was a string rather than a boolean:

$ python issue.py --light_switch '"false"'
usage: issue.py [-h] [--load_config [Path]] [--save_config [Path]]
                   [--light_switch bool]
issue.py: error: argument --light_switch: Boolean value expected for argument, received '"false"'
lebrice commented 2 years ago

Hey there @stevebyan, thanks for posting!

I think in this case you'd want to overwrite/register a new custom decoding function for bool fields, and you'd probably want to use something like the str2bool utility that is used with args from the command line.

Can you take a look at the simple_parsing.helpers.serialization.decoding module? I think doing something like register_decoding_fn(bool, str2bool) might just do it. If that works well, we could also use that as the default too.

Thanks again, and let me know if that helps!

lebrice commented 2 years ago

Hey @stevebyan , sorry this took so long. This has been fixed now, it was a lot easier than I imagined in the end!