lebrice / SimpleParsing

Simple, Elegant, Typed Argument Parsing with argparse
MIT License
410 stars 52 forks source link

The library should handle unknown arguments coming from a configuration file #246

Closed oyarsa closed 1 year ago

oyarsa commented 1 year ago

Is your feature request related to a problem? Please describe. When using a configuration file and parse_known_args, I get an error saying the dataclass constructor does not contain a given parameter. My expectation is that parse_known_args would handle these cases. For example, if I have the following code and configuration file:

# config.py
from dataclasses import dataclass
import simple_parsing

@dataclass
class Config:
    param1: int

args = simple_parsing.parse_known_args(Config, add_config_path_arg=True)
# config.yaml
param1: 10
param2: 20

The program fails with this error:

❯ python config.py --config_path config.yaml
Traceback (most recent call last):
  File "~/config.py", line 10, in <module>
    args = simple_parsing.parse_known_args(Config, add_config_path_arg=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.venv/lib/python3.11/site-packages/simple_parsing/parsing.py", line 1058, in parse_known_args
    parsed_args, unknown_args = parser.parse_known_args(args, attempt_to_reorder=attempt_to_reorder)
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.venv/lib/python3.11/site-packages/simple_parsing/parsing.py", line 343, in parse_known_args
    parsed_args = self._postprocessing(parsed_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.venv/lib/python3.11/site-packages/simple_parsing/parsing.py", line 575, in _postprocessing
    parsed_args = self._instantiate_dataclasses(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.venv/lib/python3.11/site-packages/simple_parsing/parsing.py", line 843, in _instantiate_datacla
sses
    value_for_dataclass_field = _create_dataclass_instance(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/.venv/lib/python3.11/site-packages/simple_parsing/parsing.py", line 1131, in _create_dataclass_i
nstance
    return constructor(**constructor_args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: Config.__init__() got an unexpected keyword argument 'param2'

Describe the solution you'd like Unexpected arguments coming from a configuration file should be ignored.

Describe alternatives you've considered I'm currently using the following decorator:

def filter_kwargs(cls: type) -> type:
    if not is_dataclass(cls):
        raise TypeError("filter_kwargs should only be used with dataclasses")

    original_init = cls.__init__  # type: ignore

    def new_init(cls: type, **kwargs: dict[str, Any]) -> None:
        filtered_kwargs = {
            f.name: kwargs[f.name] for f in fields(cls) if f.name in kwargs
        }
        return original_init(cls, **filtered_kwargs)

    cls.__init__ = new_init  # type: ignore
    return cls

However, I believe this should be handled by the library.

lebrice commented 1 year ago

Hello there @oyarsa , thanks for posting!

I'd like to understand your use-case better. What's the context where you'd want to allow extra unused values in the configuration file?

oyarsa commented 1 year ago

My use case is to re-use configuration files between similar-but-not-same scripts. Specifically, I'm working on machine learning scripts, and they tend to have a lot of parameters in common, but some are specific to one program or the other.

For example, I might have separate training and inference scripts. The inference script will share some settings, but not all, with the training script. It's useful to be able to share the same configuration between them.

lebrice commented 1 year ago

Ok thanks for clarifying. I'll think about this a little bit, and get back to you.

oyarsa commented 1 year ago

Once you have a good design for this, I can implement it. My decorator works, but doesn't type well.

lebrice commented 1 year ago

Oh btw, you can do this:

from typing import TypeVar
T = TypeVar("T")
def filter_kwargs(cls: type[T]) -> type[T]:
    if not is_dataclass(cls):
        raise TypeError("filter_kwargs should only be used with dataclasses")

    original_init = cls.__init__  # type: ignore

    def new_init(cls: type, **kwargs: dict[str, Any]) -> None:
        filtered_kwargs = {
            f.name: kwargs[f.name] for f in fields(cls) if f.name in kwargs
        }
        return original_init(cls, **filtered_kwargs)

    cls.__init__ = new_init  # type: ignore
    return cls

Perhaps this will help with the typing a little bit?

oyarsa commented 1 year ago

That works better, but having the type: ignore for accessing and setting cls.__init__ is awkward. There must be a better way.

Anyway, this is good enough for now.