martius-lab / cluster_utils

https://martius-lab.github.io/cluster_utils/
Other
8 stars 0 forks source link

Use a centrally defined data structure for settings #41

Open luator opened 10 months ago

luator commented 10 months ago

Currently there is no explicit definition of the structure of the configuration. The config file is just read to a dictionary and parts of it are even passed around as kwargs (like foo(**params["foo_settings"])). While this is quite flexible, it also makes it almost impossible to understand the complete set of parameters that can be configured.

Therefore I propose to refactor the code to load the settings into a clearly defined data structure (e.g. a dataclass or namedtuple). Advantages are:

If the request is accepted, it still needs to be decided how to implement it. One option could be to move from smart_settings to variconf (disclaimer: I wrote it), but I'm not sure if omegaconf has an import feature, so maybe that wouldn't work without loosing functionality. In the easiest case, we could stick with smart_settings and transfer everything to a NamedTuple-structure after loading.

luator commented 10 months ago

I am all for this! The kwargs make it so difficult to track all the different config settings. Bonus points if we can integrate with the command line interface and print help strings for the different settings.

On the implementation side, personally I would not be against switching away from smart settings, but this definitely needs a wider discussion. If it is based on omegaconf, imports probably could be handled with a custom resolver. There are also some config features like evals (implemented either in smart-settings or in cluster_utils) which would need to be supported in the new system.

It would probably also break backwards compatibility.

By Maximilian Seitzer on 2023-11-27T09:19:09 (imported from GitLab)

luator commented 10 months ago

Jonas suggested to use Hydra. I never used it but it's also based on omegaconf and people with ML background may already be familiar with it, so might be worth to take a look at.

By Felix Widmaier on 2023-11-28T11:55:58 (imported from GitLab)

luator commented 10 months ago

I strongly oppose Hydra. I think it would be a very bad idea to base cluster_utils on it.

(if needed, I can also write a more substantiated explanation of why I think that. The most basic version is that Hydra is not targeted at the type of application cluster_utils is.)

By Maximilian Seitzer on 2023-11-28T12:00:26 (imported from GitLab)

luator commented 8 months ago

I did a small proof of concept to confirm that the include feature can be implemented via resolvers in omegaconf:

def _include_yaml(file):
    return OmegaConf.load(file)

def main():
    OmegaConf.register_new_resolver("include_yaml", _include_yaml)

    conf = OmegaConf.load("main_cfg.yml")
    OmegaConf.resolve(conf)
    print(OmegaConf.to_yaml(conf))

Then you can use foo: "${include_yaml:extra_values.yml}" in the config file.

This only works for yaml as is, but could easily be extended to support different formats by checking the file extension.

Of course it would break compatibility with old configs, as the syntax for the includes changes.

By Felix Widmaier on 2024-01-08T12:05:22 (imported from GitLab)

mseitzer commented 5 months ago

One more UX improvement in this direction: it would be nice to reject unknown configuration values. Right now, when I mistype an optional parameter, I am not warned and my code silently executes differently than intended.

This should be fully solved by parsing the configuration into a structured format.