Open Queuecumber opened 3 years ago
Hi @Queuecumber! There is a lot to unpack here :).
First of all, this project is still under development, with the only person doing the heavy lifting now is @romesco. We welcome help from anyone with vested interest in this project to help speed it up.
You are raising good points.
Shouldn't this be something like optimizer = hydra.utils.instantiate(cfg.adadelta, params=model.paramters)
Absolutely, I believe (@romesco, correct me if I am wrong) that the current tutorial is guiding the user incrementally and much more is panned in the subsequent sections.
Hydra 1.0 has some limitations for how Structured Configs can be used as schema for yaml files. Hydra 1.1 will be much more powerful on that area, and also around recursively instantiating objects (happy to get into details about 1.1 but I don't want to derail this comment). Since 1.1 is still work in progress, this repository is targeting Hydra 1.0 for now.
Finding a good way to leverage those configs for each of the use cases (Using the configs here in-place of yaml files and using them to validate the user's yaml files) with Hydra 1.0 is something we will need to figure.
Config registration is also something we are figuring out. We will likely provide a function to register the configs in a standard location. (but that standard location may not be suitable for some use cases with Hydra 1.0). Take a look at the structured configs tutorial in he Hydra website, in particular the last 3 pages about schema to get a feeling of the limitations in Hydra 1.0.
You are very welcome to join the effort (We have stream in the Zulip chat for this project).
OK this makes a lot of sense
I noticed after posting that the existing schema seem to cover only optimizers and schedulers. Also I was targeting Hydra next because it has so many cool features I want to use.
Recursively instantiating objects is something I would love to see, as of now I had to provide custom handlers for things like ConcatDataset
and Compose
transforms to handle that.
As a quick POC, I was able to write something the following:
import inspect
from pkgutil import iter_modules
import hydra_configs.torch
from hydra.core.config_store import ConfigStore
def register_hydra():
search_modules = [("hydra_configs.torch", hydra_configs.torch)]
classes = set()
while search_modules:
p, m = search_modules.pop()
for name, obj in inspect.getmembers(m):
if inspect.isclass(obj) and name.endswith("Conf"):
reg_name = name.replace("Conf", "").lower()
path = obj.__module__.replace("hydra_configs.torch.", "")
path = reversed(path.split("."))
group = ""
for mod in path:
if mod.lower() != reg_name:
group = mod
break
classes.add((group, reg_name, obj))
if "__path__" in m.__dict__:
for submodule in iter_modules(m.__path__, prefix=f"{p}."):
c = __import__(submodule.name, fromlist="dummy")
search_modules.append((submodule.name, c))
cs = ConfigStore.instance()
for group, name, node in classes:
cs.store(group=group, name=name, node=node)
which allows me to do:
defaults:
- optim: adam
- lr_scheduler: cosineannealinglr
without any further YAMLs and get
optim:
_target_: torch.optim.adam.Adam
params: ???
lr: 0.001
betas:
- 0.9
- 0.999
eps: 1.0e-08
weight_decay: 0
amsgrad: false
lr_scheduler:
_target_: torch.optim.lr_scheduler.CosineAnnealingLR
optimizer: ???
T_max: ???
eta_min: 0
last_epoch: -1
as the DictConfig
I have a feeling there's a better way to do it, and the heuristic to get the group name isn't great.
I still need to review the limitations of the structured configs, although I will say for my use case I don't exactly need validation it's more about convenience of having the names and default parameters defined.
If there's anything I can do to help let me know I'm happy to help get this project, as well as the lightning one, going
@Queuecumber , can you join the Hydra chat?
@Queuecumber I think your intuition about how to use this project is very well aligned with our roadmap.
@omry Yes, the idea with the tutorial series is to build up to the 'ideal use case' with incremental changes. For example, we purposefully limit the flexibility of the optimizer instantiation in the Basic
example to make it clear why one might want to use hydra.utils.instantiate
to begin with. In the Intermediate
example, this is changed out for the more flexible approach.
Let me pick up the Intermediate Tutorial and we can discuss more on that PR. Once it's done, let's plan for the next 'use case feature'. Let's discuss in more detail on the zulip chat. Just @ rosario on there if I'm AFK.
I have done something similar in a recent project using Hydra/PyTorch and I'm evaluating if it makes sense for me to switch to this (I'm trying to simplify the code by replacing as much as I can with 3rd party libraries), but I'm not entirely sure if it works for my use case.
One question I had immediately after reading the tutorial is the section on instantiating from the configs:
Shouldn't this be something like
optimizer = hydra.utils.instantiate(cfg.adadelta, params=model.paramters)
, that way the user could plug in whatever optimizer they wanted to the config? (Would love @omry 's feedback on this too because maybe I'm misunderstanding it). Or more flexibly:So when I did this I had a bunch of YAMLs that defined the various options like (folder stucture):
Then the user can put in their experiment config:
and then my code uses
hydra.utils.instantiate
to make whatever the user wants.So what I would really love to do is replace all the yaml files I wrote with the configs from this repo and keep only the experiment configs. Is this possible to do?
One issue I see with that is that I would need to register, potentially, all of the possible configs this project provides.