omni-us / jsonargparse

Implement minimal boilerplate CLIs derived from type hints and parse from command line, config files and environment variables
https://jsonargparse.readthedocs.io
MIT License
314 stars 42 forks source link

print_config not work correctly when union dataclass-like class #495

Closed pjgao closed 2 months ago

pjgao commented 4 months ago

🐛 Bug report

To reproduce

I have these classes,both A and B are inherited from BaseC is inherited from pydantic.BaseModel (dataclass-like class):

from typing import Union
from abc import ABC

from jsonargparse import CLI
from pydantic import BaseModel

class Base(ABC):
    def __init__(self,base='Base') -> None:
        self.base = base

class A(Base):
    def __init__(self, a: str = "a",**kwargs):
        self.a = a
        super().__init__(**kwargs)

class B(Base):
    def __init__(self, b: str = "b",**kwargs):
        self.b = b
        super().__init__(**kwargs)

class C(BaseModel):
    c:str = "c"
  1. when type hint is params:Base

     def func(params:Base) -> None:
         print('type is:', type(params))
         print('params is:', params)
    
     CLI(func, as_positional=False)

    we get:

     (base) linux@DESKTOP:/mnt/d/projects$ python test.py -h
     usage: test.py [-h] [--config CONFIG] [--print_config[=flags]] [--params.help CLASS_PATH_OR_NAME] --params PARAMS
    
     <function func at 0x7f1a06a92160>
    
     optional arguments:
       -h, --help            Show this help message and exit.
       --config CONFIG       Path to a configuration file.
       --print_config[=flags]
                             Print the configuration after applying all other arguments and exit. The optional flags customizes the output and are one or more keywords separated by comma. The supported flags are: comments, skip_default, skip_null.
       --params.help CLASS_PATH_OR_NAME
                             Show the help for the given subclass of Base and exit.
       --params PARAMS       (required, type: <class 'Base'>, known subclasses: __main__.Base, __main__.A, __main__.B)
    
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config 
     params: null
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params Base
     params:
       class_path: __main__.Base
       init_args:
         base: Base
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params A
     params:
       class_path: __main__.A
       init_args:
         a: a
         base: Base
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params B
     params:
       class_path: __main__.B
       init_args:
         b: b
         base: Base
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params C
     usage: test.py [-h] [--config CONFIG] [--print_config[=flags]] [--params.help CLASS_PATH_OR_NAME] --params PARAMS
     error: Parser key "params":
    
       - Expected a config path but C either not accessible or invalid
       - Expected a dot import path string: C

print_config gives config example rightly when we feed —params right class (A/B/Base)

  1. when type hint is params:C (dataclass-like class)

     def func(params:C) -> None:
         print('type is:', type(params))
         print('params is:', params)
    
     CLI(func, as_positional=False)

    we get:

     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config 
     params:
       c: c
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params C
     usage: test.py [-h] [--config CONFIG] [--print_config[=flags]] [--params CONFIG] [--params.c C]
     error: Parser key "params":
     Unable to load config 'C'
       - Parser key "params": Unable to load config "C"
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params A
     usage: test.py [-h] [--config CONFIG] [--print_config[=flags]] [--params CONFIG] [--params.c C]
     error: Parser key "params":
     Unable to load config 'A'
       - Parser key "params": Unable to load config "A"
     (base) linux@DESKTOP:/mnt/d/projects$ cat a.yaml
     params:
         c: c
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --config a.yaml
     type is: <class '__main__.C'>
     params is: c='c'

    In contrast to subclasses, which requires the user to provide a class_path, in some cases it is not expected to have subclasses. In this case the init args are given directly in a dictionary without specifying a class_path. This is the behavior for standard dataclasses, final classes, attrs’ define decorator, and pydantic’s dataclass decorator and BaseModel classes

    dataclass-like-classes — jsonargparse documentation

    • print_config gives c:c rightly when --params feeds nothing

    • --params feeds any classes (A/B/C)raise exception

  2. when type hint is Union[A,B,C],now I want know how to config params if isinstance(params,C) by using print_config

     def func(params:Union[A,B,C]) -> None:
         print('type is:', type(params))
         print('params is:', params)
    
     CLI(func, as_positional=False)

    we get

     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config 
     params: null
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params A
     params:
       class_path: __main__.A
       init_args:
         a: a
         base: Base
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params B
     params:
       class_path: __main__.B
       init_args:
         b: b
         base: Base
     (base) linux@DESKTOP:/mnt/d/projects$ python test.py --print_config --params C
     usage: test.py [-h] [--config CONFIG] [--print_config[=flags]] [--params.help CLASS_PATH_OR_NAME] --params PARAMS
     error: Parser key "params":
    
       - Expected a config path but C either not accessible or invalid
       - Does not validate against any of the Union subtypes
       Subtypes: (<class '__main__.A'>, <class '__main__.B'>, <class '__main__.C'>)
       Errors:
         - Expected a dot import path string: C
         - Expected a dot import path string: C
         - Type <class '__main__.C'> expects a dict or Namespace
       Given value type: <class 'str'>
       Given value: C

​ no one gives c:c config example!

Expected behavior

print_config can print params's config

Environment

mauvilsa commented 4 months ago

Thank you for reporting this issue and the detailed reproduction code!

After a look at it, I don't see any bug. The behavior is working as expected. When the type is params:C there are two ways to provide the nested parameters from command line:

When a union is used, like params:Union[A,B,C], a given value is attempted to be parsed by each type from left to right. If the given value is A (shorthand for {"class_path":"__main__.A"}), then type A parses successfully and there is no attempt on B or C. If the given value is B, then parsing with A fails, B succeeds and C is not attempted. To provide a value for C it is the same as if there were no union, i.e. as --params '{"c": "x"}' or --params.c x, in which case A and B fail to parse, and C succeeds.

Even though this is not a bug, it could be considered a feature request. This is closely related to #287. I do have in the roadmap to provide a way to choose which classes should work as subclasses and which not. With this, C would be configured to work as a subclass base, and be required to use class_path for it or its subclasses. And --params C would work as you expected.

pjgao commented 4 months ago

Thanks for your kindly reply! The main starting point of the problem is that when the dataclass type hint are nested in multiple layers, it is difficult to manually write the configuration file.

from dataclasses import dataclass
from typing import Union

from jsonargparse import CLI

@dataclass
class DataClass1:
    a: str
    b: int

@dataclass
class DataClass2:
    c: DataClass1
    d: str

@dataclass
class DataClass3:
    e: DataClass2
    f: int
  1. when type hint is: DataClass3

     def func(x: DataClass3):
         pass
    
     CLI(func)

    print_config

     (base) linux@DESKTOP-SSJTOLO:/mnt/d/projects/push$ python test.py --print_config
     x:
       e:
         c:
           a: null
           b: null
         d: null
       f: null

    😀Wow! Now I can know which params should have automatically!

  2. when type hint is: Union[str, DataClass3]

     def func(x: Union[str, DataClass3]):
         pass
    
     CLI(func)

    print_config

     (base) linux@DESKTOP-SSJTOLO:/mnt/d/projects/push$ python test.py --print_config
     x: null
     (base) linux@DESKTOP-SSJTOLO:/mnt/d/projects/push$ python test.py -h
     usage: test.py [-h] [--config CONFIG] [--print_config[=flags]] x
    
     <function func at 0x7f5d41942160>
    
     positional arguments:
       x                     (required, type: Union[str, DataClass3])
    
     optional arguments:
       -h, --help            Show this help message and exit.
       --config CONFIG       Path to a configuration file.
       --print_config[=flags]
                             Print the configuration after applying all other arguments and exit. The optional flags customizes the output and are one or more keywords separated by comma. The supported flags are: comments, skip_default, skip_null.

    😪OOps. Must write params one by one manually.

Can dataclass should work as subclasses solved this confusion?

mauvilsa commented 4 months ago

Can dataclass should work as subclasses solved this confusion?

Yes, this would solve it.

mauvilsa commented 2 months ago

Closing this in favor of #287.