vedal opened this issue 1 year ago
Sorry for all the questions. Please feel free to ignore them.
No worries. Ask as much as you need. Anyway, the questions might be helpful for other people too, and might lead to improvements in the code or the documentation.
I wonder if it's possible to define, and then override/extend, yaml defaults inside other yamls.
Not possible in a single config file. There are several possibilities:
- Giving multiple config files from the command line, later ones overriding earlier ones: --config=defaults.yaml --config=config.yaml
- Giving a config file as the value for a group of options and overriding individual keys: --data=data_defaults.yaml --data.batch_size=4
- Giving a config file as the value for a group of options and then a second config for the same group: --data=data_defaults.yaml --data=data.yaml
- Adding defaults.yaml to default_config_files and then giving any override command line arguments, without needing to specify the default config (see the sketch below).
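For the last option, a rough sketch of how the parser could be set up (the file name and the batch_size option are only illustrative):

from jsonargparse import ArgumentParser, ActionConfigFile

# defaults.yaml, if it exists, is loaded first; anything given on the command line overrides it
parser = ArgumentParser(default_config_files=["defaults.yaml"])
parser.add_argument("--config", action=ActionConfigFile)  # optional extra config files
parser.add_argument("--data.batch_size", type=int, default=8)

cfg = parser.parse_args()  # e.g. cli.py --data.batch_size=4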
Some time ago I started a branch to improve the documentation regarding the overrides but did not finish it. What I had written is:
Override order
--------------
Final parsed values depend on different sources, namely: source code, command
line arguments, :ref:`configuration-files` and :ref:`environment-variables`.
Values are overridden based on the following precedence:
1. Defaults defined in the source code.
2. Existing default config files in the order defined in
``default_config_files``, e.g. ``~/.config/myapp.yaml``.
3. Full config environment variable, e.g. ``APP_CONFIG``.
4. Individual key environment variables, e.g. ``APP_OPT1``.
5. Command line arguments in order left to right (might include config files).
Depending on the parse method used (see :class:`.ArgumentParser`) and how the
parser was built, some of the options above might not apply. Parsing of
environment variables must be explicitly enabled, except if using
:py:meth:`.ArgumentParser.parse_env`. If the parser does not have an
:class:`.ActionConfigFile` argument, then there is no parsing of a full config
environment variable or a way to provide a config file from command line.
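As a rough sketch of how these sources can come together (the APP prefix and the opt1 option are illustrative, not from any particular project):

from jsonargparse import ArgumentParser, ActionConfigFile

# default_env=True enables parsing of individual environment variables such as APP_OPT1;
# the ActionConfigFile argument additionally enables the full config variable APP_CONFIG
parser = ArgumentParser(
    env_prefix="APP",
    default_env=True,
    default_config_files=["~/.config/myapp.yaml"],
)
parser.add_argument("--config", action=ActionConfigFile)
parser.add_argument("--opt1", default="from source code")

# precedence: source code < ~/.config/myapp.yaml < APP_CONFIG < APP_OPT1 < command line
cfg = parser.parse_args()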
Why do you want to have the input to the cli in a single config? What value does it bring?
In my view there isn't a difference with respect to specifying a couple of command line arguments. What I do find important is that once an experiment has been run, it is possible to know what was run, i.e. automatic logging of the config like in https://pytorch-lightning.readthedocs.io/en/latest/cli/lightning_cli_advanced.html#automatic-save-of-config. But this should not depend on having a single input config file.
Currently jsonargparse.CLI does not provide a way to implement an automatic save of the config. It can only be done by manually creating and running the parser.
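For reference, a rough sketch of what the manual approach could look like (the function and file names are made up for illustration):

from jsonargparse import ArgumentParser, ActionConfigFile

def train(batch_size: int = 8, learning_rate: float = 1e-3):
    ...  # hypothetical training entry point

parser = ArgumentParser()
parser.add_argument("--config", action=ActionConfigFile)
parser.add_function_arguments(train)

cfg = parser.parse_args()
parser.save(cfg, "run_config.yaml", overwrite=True)  # the "automatic" save, done manually
train(cfg.batch_size, cfg.learning_rate)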
@vedal did my comment answer your questions? Can this be closed now?
@mauvilsa yes, this definitely was a thorough answer to my question, and I appreciate it a lot.
Why do you want to have the input to the cli in a single config? What value does it bring?
I did not answer your question, however, as I needed to work a bit with the CLI to see how it would fit my needs.
In my current setup, I run exploratory experiments with different models and datasets, with configs divided per dataset (data_i.yaml) and per model (model_i.yaml).
I also like to use the default Lightning folder structure for storing checkpoints, logs and hparams. For this, I need to override the logger experiment name in a config file (let's call it "experiment1.yaml") in the following way:
trainer:
  logger:
    init_args:
      name: experiment_name
along with some small changes to data_i and model_i, which could also go in experiment1.yaml:
model:
  batch_size: 4
data:
  batch_size: 4
So, in the end, I'd end up using 4-5 configs for each experiment.
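Concretely, the invocation ends up looking something like this (file names purely illustrative):
cli.py --config trainer_defaults.yaml --config experiment1.yaml --data data_i.yaml --model model_i.yaml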
The alternative I imagined, where I also keep track of which experiments are related (they share the same experiment yaml), is the following:
trainer:
  logger:
    init_args:
      name: experiment_name
data: data_i.yaml
model: model_i.yaml
model:
  batch_size: 4
data:
  batch_size: 4
Phew, that was super-long and probably boring, I'm sorry about that. You're right that it can all be defined on the CLI (except probably the experiment name hack; it's a pain!). Your suggestion might well be the simplest way to solve this, without making the mess that hydra-configs can become (with overrides left and right).
I think what I struggle with is the separation of an experiment_name from the dataset/model choice that went into it.
Maybe an option for keeping track of which parameters went into each experiment, without checking each output yaml individually, is to log them to tensorboard as hparams... I haven't tried that yet.
One question however: do you usually always print_config before every experiment, as a kind of "--dry-run" to check that all hparams are ok?
Anyway, thank you again for making the best config system I've been able to find.
Same here, really looking forward to a way that could "inherit" from another config file and modify the arguments, or even compose multiple config files inside a single config file (like the example of Hydra provided in this issue).
If you are new to jsonargparse, it is best to familiarize yourself with its override order. Just because something can be done in a certain way in Hydra does not mean that the same should be implemented in jsonargparse. There is no point in adding yet another way to do something that already has alternatives. For a new feature to be added, a compelling motivation should be clear, and currently, in my view, there isn't one.
@function2-llx in your case, why must it be a single config that "inherits" other configs? The example in the description is the same as doing cli.py --data=data_defaults.yaml --data=data.yaml or the already mentioned alternatives. The point of a CLI is that you provide arguments to it. There is not much reason to limit yourself to a single config argument.
Also note that using a config for a group of settings inside another config is possible, e.g.
data: data_i.yaml
model: model_i.yaml
Though, without the need of that additional config, from command line it would be the same as
cli.py --data=data_i.yaml --model=model_i.yaml
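For context, a rough sketch of what such a cli.py could look like, assuming jsonargparse.CLI over a function with dataclass groups (all names are placeholders):

from dataclasses import dataclass
from jsonargparse import CLI

@dataclass
class DataConfig:
    batch_size: int = 8
    num_workers: int = 0

@dataclass
class ModelConfig:
    hidden_dim: int = 128

def main(data: DataConfig = DataConfig(), model: ModelConfig = ModelConfig()):
    ...  # hypothetical experiment entry point

if __name__ == "__main__":
    # accepts individual keys (--data.batch_size=4), a config per group
    # (--data=data_i.yaml) as discussed above, and a full --config file
    CLI(main)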
do you usually always print_config before every experiment, as a kind of "--dry-run" to check that all hparams are ok?
I do use --print_config extensively, though mostly for debugging, not every time before running a command. This is partly because lately I mostly enable other people to run experiments rather than running them myself. I do check what other people do, but for that the automatic save of the config that LightningCLI has is enough. Anyway, my impression is that other people do commonly use print_config.
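For reference, a typical use would be something like (config name as in the earlier example):
cli.py --config experiment1.yaml --print_config
which prints the fully parsed config and exits without running anything.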
@rusmux as explained here, multiple arguments are the alternative. From what I understand, the only motivation is that sometimes it is inconvenient to use multiple arguments. This is a valid motivation. Though note that this feature is considerably complex, which might make this motivation not enough.
Sorry for all the questions. Please feel free to ignore them.
I wonder if it's possible to define, and then override/extend, yaml defaults inside other yamls. This is supported in Hydra through default configs.
The reason I'd like to have this is to have an entire experiment defined inside a yaml, both the default and "override" values. An alternative would be specifying several --config arguments on the command line. Example: say I want to use most values from default_data_config.yaml but change batch_size.
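In Hydra, I'd write an experiment config roughly like this (a sketch from memory of Hydra's defaults list; the exact group layout is a guess):

# experiment config (Hydra-style sketch)
defaults:
  - data: default_data_config
  - _self_

data:
  batch_size: 4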
This way, data will be populated by the defaults and batch_size will be overridden. The "_self_" entry at the bottom of the defaults list means that the configs in the current file override the defaults. I noticed that referencing yamls inside other yamls is supported in jsonargparse, but I could not find anything about overrides.