Load dependencies - Githubissues

kalekundert commented 3 years ago

Sometimes, it would be convenient to specify that one config should be loaded after another. For example, you could have an app where the path to a config file is specified from the command line, and the config associated with that file shouldn't be loaded until after the command line.

I don't know that this is really an issue anymore, since the order in which configs are consulted can now be set on a per-param basis (not sure if that was the case when I opened this issue). Here's how you'd handle the CLI arg/file path example above:

Create a parameter for the file path that gets its value from the command line, and maybe other sources.
Create other parameters that depend of the file
When one of the file-dependent parameters is accessed:
- If the command line has already been loaded, the path may come from there.
- Otherwise, the path may come from another source.
- If no paths are available, the file config won't generate a value and will indicate what happened in its logs.
If the CLI config is loaded after the file config, the file config will need to be reloaded. Maybe it would be useful to have a reload trigger for this case, but even that wouldn't generally be necessary, because loading is typically done before any attribute access.

The real point of loading things is to hide configs that aren't wanted. For example, command-line parsers need to be explicitly enabled, otherwise the whole class would never be useable in any other script (as the command line argument would be different, and the class would crash trying to load them).

Note that an alternative to this is to have the configs silently fail if they can't load themselves. But this can hide errors that we want to see. For example, if we explicitly enable DocoptConfig, we want to see if there was an error parsing our docstring.

So what then is the use case for having load dependencies? The purpose of loading a config is basically to say: this config makes sense in whatever context I'm currently in. These kinds of declarations are already well-served by the current syntax, and are not prone to inter-config dependencies. I can't think of a case where one config should imply another.

I'm going to close this issue. I think it was based on an old paradigm, and I can no longer think of any use cases for it.

Actually, I think my original application was a good example of a real problem. I just ran into it again: I was making an app that used DocoptConfig to get a path to a JSON file from the command-line, then used JsonConfig to get information from that file. JsonConfig autoloads by default and DocoptConfig doesn't, so there are only two ways to get this to work:

Manually call reload(JsonConfig) after load(DocoptConfig). The problem with this approach is action-at-a-distance: the dependency between JsonConfig and DocoptConfig is expressed in __config__ (because JsonConfig.setup(path_getter=...) depends on a parameter set by DocoptConfig), but the handling of this dependency is in main.
Set JsonConfig.autoload = False, put DocoptConfig after JsonConfig in __config__, then load() both at the same time. The problem with this approach is that it's complicated, and still has a weaker form of the action-at-a-distance issue from above.

Both of these solutions have drawbacks, and neither would be obvious to a casual user. A reload trigger (which I mentioned above) would be much better, because it would clearly and succintly specify what needs to happen. I also want to emphasize that the focus here is dependencies in the load process. These sorts of dependencies will typically be expressed in the __config__ variable. This has nothing at all to do with the order in which configs are used by parameters.

I also think I should load configs in the order they appear in __config__, rather than the reverse of that. I know I reversed it so that the "most important" configs could come first, but I'm starting to find it easier to think of __config__ as simply being the order configs will be loaded in. And it was never really "most important", it was "most dependent", which is a different thing. So the "most important" mindset is kind-of an incorrect crutch to begin with.

On the need for reload triggers but not load triggers: loading a config says, "I'm running this code in a context where the given config is meaningful". I can't really see a use for triggers in this sense. Reloading a config says, "something that could affect how this config works has changed, and I need to account for that". The use for triggers is much more clear in that sense.

kalekundert commented 3 years ago

Possible syntax:

class ConfigA:
    # A list of functions that all must return True in order for the config to 
    # be loaded.
    load_deps = [
        after_explicit,
    ]

class ConfigB:
    load_deps = [
        after(ConfigA),
    ]

def after_explicit(obj, explicit):
    return explicit  # i.e. was `load()` called explicitly, or is this `init()`.

def after(config_cls):
    def f(obj, explicit):
        for bc in model.get_bound_configs(obj):
            if isinstance(bc.config, config_cls):
                return bc.is_loaded
    return f

kalekundert commented 3 years ago

It would also be useful to be able to specify that a config should be loaded upon being accessed for the first time.

In general, configs have to be loaded immediately, because they're allowed to access other attributes of the object while they're loading. If the loading is deferred until first access, the object may have configs defined that shouldn't be yet.

In principle this shouldn't be a problem as long as the config doesn't actually access any attributes of the object during loading. Perhaps I could add an "ok to defer" flag to the config class. But most classes do access some attributes (and there's no way to say a priori which attributes are "safe"), and the benefit of this feature would be pretty marginal, so overall I don't think it's worth implementing.

kalekundert commented 3 years ago

Or that a config should be loaded upon accessing a certain parameter.

kalekundert commented 2 years ago

Another possible syntax. Pretty similar to above, but with some cosmetic tweaks:

class MyObj:
    __config__ = [
        ConfigA.setup(load_trigger=None),
        ConfigB.setup(load_trigger=manual),
        ConfigC.setup(load_trigger=ConfigB),
        ConfigD.setup(load_trigger=[ConfigB, ConfigC]),
    ]

Some notes:

The load_trigger setting could also be specified as a class variable, but I expect that it will be more often useful to specify it in the object itself.
The trigger can have the following values:
- falsey: load immediately.
- callable: gets list of wrapped configs, returns bool indicating whether or not to load.
- manual is a callable that always return False
- Config: load once all configs of the given class have been loaded.
- I can dramatically simplify the code by only counting configs that appear prior to the one in question in the load order. This makes some sense, as later configs can't be accessed during load() anyways. But configs can be checked in different orders in the params themselves, so this might not always work.
- iterable: values may be any of the above, all must be True in order to load.
This blurs the line between init() and load(). One thing that makes init() unique is that it only selects configs with autoload=True. This isn't unsurmountable, though. I could redefine "autoload" as "triggered with no other configs", and select such configs by passing an empty wrapped configs list.

kalekundert / byoc

Load dependencies #11