mit-ll-responsible-ai / hydra-zen

Create powerful Hydra applications without the yaml files and boilerplate code.
https://mit-ll-responsible-ai.github.io/hydra-zen/
MIT License
339 stars 15 forks source link

Infer from type-annotations for auto-config support #294

Open addisonklinke opened 2 years ago

addisonklinke commented 2 years ago

When builds(..., populate_full_signature=True) is used on an object that expects recursive instantiation, the example config from a script's --help doesn't provide the user with any insight about the nested fields (and their potential defaults). To demonstrate, I've adapted this Hydra example to use Zen for the config creation

import hydra
from hydra.core.config_store import ConfigStore
from hydra.utils import instantiate
from hydra_zen import builds
from omegaconf import DictConfig

class Driver:
    def __init__(self, name: str = 'Bob', age: int = 16) -> None:
        self.name = name
        self.age = age

class Car:
    def __init__(self, driver: Driver):
        self.driver = driver

    def drive(self) -> None:
        print(f'Car driven by {self.driver.name}')

@hydra.main(config_path=None, config_name='config')
def my_app(cfg: DictConfig) -> None:
    car = instantiate(cfg)
    car.drive()

if __name__ == '__main__':

    cs = ConfigStore.instance()
    cs.store(name='config', node=builds(Car, populate_full_signature=True))
    my_app()

The help output is

_target_: __main__.Car
driver: ???

The correct CLI override syntax would be something like driver='{name:Bob, age:25}', but a user cannot discern this without reading the source code which is rather unfortunate. As an alternative, I can switch the type-hint in Car to have a default like so

class Car:
    def __init__(self, driver: Driver = builds(Driver, populate_full_signature=True):

This provides a much more informative help output which lets the user see that they must provide driver.name, but driver.age is optional and defaults to 16

_target_: __main__.Car
driver:
  _target_: __main__.Driver
  name: ???
  age: 16

Would it be reasonable to propagate the populate_full_signature kwarg from builds(Car) through to the recursive Driver parameter automatically? It seems oddly redundant to have to specify it both with cs.store() and then again in the type-hints. Especially if the signature of Car required multiple nested objects

jgbos commented 2 years ago

Interesting use case. The configuration for Car cannot be instantiated because there is no default value provided for Car.

>>> instantiate(builds(Car, populate_full_signature=True))
...
MissingMandatoryValue: Missing mandatory value: driver
    full_key: driver
    object_type=Builds_Car

This is why the --help prints driver: ???.

If we provide a default for driver, driver: Driver = Driver(), then we get

>>> instantiate(builds(Car, populate_full_signature=True))
...
HydraZenUnsupportedPrimitiveError: Building: Car ..
 The configured value <__main__.Driver object at 0x000001AE702572B0>, for field `driver`, is not supported by Hydra -- serializing or instantiating this config would ultimately result in an error.

Consider using `hydra_zen.builds(<class '__main__.Driver'>, ...)` create a config for this particular value.

I'm thinking the best approach is to configure driver as well:

if __name__ == "__main__":
    cs = ConfigStore.instance()
    cs.store(
        name="config",
        node=builds(
            Car,
            driver=builds(Driver, populate_full_signature=True),
            populate_full_signature=True,
        )
    )
    my_app()

Here the output would be what you expect:

_target_: __main__.Car
driver:
  _target_: __main__.Driver
  name: Bob
  age: 16
jgbos commented 2 years ago

Oh, just playing around with this idea. I'm guessing you'd want swapable configs for Driver (and Car). So you might want to do this instead:

if __name__ == "__main__":
    cs = ConfigStore.instance()
    cs.store(
        name="default", group="car", node=builds(Car, populate_full_signature=True)
    )
    cs.store(
        name="default",
        group="car/driver",
        node=builds(Driver, populate_full_signature=True),
    )
    cs.store(
        name="config",
        node=make_config(
            hydra_defaults=["_self_", {"car": "default"}, {"car/driver": "default"}]
        ),
    )
    my_app()

Here the --help provides:

_target_: __main__.Car
driver:
  _target_: __main__.Driver
  name: Bob
  age: 16
addisonklinke commented 2 years ago

The configuration for Car cannot be instantiated because there is no default value provided for driver

To clarify, I understand why this fails from a technical standpoint. My question is more around the use-ability of displaying driver: ??? with the error output Missing mandatory value: driver. This doesn't give any hints about what kind of value(s) should be provided to satisfy driver (even though we could determine that for the user from code inspection)

Therefore I'm left with the decision to either

  1. Use driver=builds(Driver, populate_full_signature=True) as the default value. This enables a more user friendly CLI interface, but makes the code base fairly verbose since we're already using builds(Car, populate_full_signature=True)
  2. Keep the code-base concise (by avoiding repetitive builds(..., populate_full_signature=True), but this has the use-ability downsides I explained above

Neither option is completely satisfactory, so I was hoping this might spark additional brainstorming. Perhaps there is no win-win solution though, in which case I would opt for the first approach since I think use-ability is more important than conciseness

rsokl commented 2 years ago

I agree that it would be nice to have the Driver information be auto-populated in your config, and that the Pythonic way to make this happen is via type annotations.

so I was hoping this might spark additional brainstorming

Indeed you have! I had never considered leveraging type-annotations to help populate a hierarchical config (e.g. where a Driver config is a component of the larger Car config). This is could be quite useful from an auto config perspective.

Use driver=builds(Driver, populate_full_signature=True) as the default value.

I wouldn't want hydra-zen to lead people down this sort of design path, where you are designing a class that only works properly in the context of Hydra, via recursive instantiation. Something like the solution you proposed would be much nicer.

I am currently working on my write up for https://github.com/mit-ll-responsible-ai/hydra-zen/discussions/293, but I will be thinking about your proposal in the meantime, and will get back to you ASAP.

Thanks for sharing your thoughts and ideas on this!

jgbos commented 2 years ago

Sorry, my comments were mostly me thinking out loud on the issue. I was definitely intrigued and frustrated that there didn't seem to be an approach to provide configuration information about this type of interface. It's been ingrained in me that I have to provide configs.

Indeed you have! I had never considered leveraging type-annotations to help populate a hierarchical config (e.g. where a Driver config is a component of the larger Car config).

I think this would be very powerful.