Open macio232 opened 2 years ago
This would be amazing for my current project.
Hey, have you considered using hydra? DVC has Hydra integration that should handle many of the use cases for Gin. Usually we try not to be opinionated about tooling, but in this case it was too hard to have meaningful support for complex configuration while being framework-agnostic.
gin is different enough from hydra, through its direct binding of configs to code, and with the ability to pass configurable references, that moving between these frameworks is a pretty significant lift. For the same reason, I fully understand the challenges in supporting multiple tools like this from the DVC side.
What I would really like to be able to do, at a minimum, is simply track certain gin config params associated with an experiment in Iterative Studio. This would allow filtering to comparable experiments. My two thoughts on this were:
1. extract a dict of relevant configurables from the gin config and dump to yaml as a params file as the first stage of a pipeline. Don't think this will work because DVC doesn't seem to support writing of params in the middle of the pipeline.
2. write out the same extracted params as metrics (which they aren't really, but would at least allow tracking). Don't think this will work either because DVC only supports numeric metrics (at least according to the docs, haven't tried it myself).
Otherwise, I think I'm left wrapping the whole pipeline with a script outside of the DVC CLI, where the params are parsed and dumped to a file prior to running dvc repro. Am I missing any other ways to do this?
Thanks for the clarification!
1. extract a dict of relevant configurables from the gin config and dump to yaml as a params file as the first stage of a pipeline. Don't think this will work because DVC doesn't seem to support writing of params in the middle of the pipeline
This should be possible using "top-level" params like this:
    stages:
      dump_gin:
        cmd: python dump_gin.py
        deps:
          - dump_gin.py
        outs:
          - gin_params.yaml
      train:
        cmd: python train.py
        deps:
          - gin_params.yaml
        outs:
          - model.pkl
    params:
      - gin_params.yaml
You could also log parameters directly from your code with dvclive.log_param().
2. write out the same extracted params as metrics (which they aren't really, but would at least allow tracking). Don't think this will work either because DVC only supports numeric metrics (at least according to the docs, haven't tried it myself).
Support for string metrics will be available next release, but it seems like option 1 is closer to what you want.
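For the dump_gin stage in option 1, here is a minimal sketch of what dump_gin.py might contain. It is a simplified stand-in and not gin's own parser: it only reads single-line bindings of the form configurable.param = value, keeps non-literal values such as @references or %MACROS as plain strings, and writes simple YAML using only the standard library. The function names are hypothetical. In a real project you would more likely let gin load the file with gin.parse_config_file, read the values you care about with gin.query_parameter, and dump them with a YAML library.

```python
import ast
import json


def parse_gin_bindings(text):
    """Collect simple `configurable.param = literal` bindings into a nested dict.

    Assumption: single-line bindings only. Values that are not Python
    literals (e.g. @references, %MACROS) are kept as raw strings.
    """
    params = {}
    for raw in text.splitlines():
        # Naive comment stripping: this would also cut a '#' inside a string value.
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue  # blank lines, `import ...` and `include ...` statements
        key, value = (part.strip() for part in line.split("=", 1))
        if "." not in key:
            continue  # macro definitions such as `LR = 0.1` are ignored here
        configurable, param = key.rsplit(".", 1)
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            parsed = value
        params.setdefault(configurable, {})[param] = parsed
    return params


def to_yaml(params):
    """Render the nested dict as YAML; JSON scalars and lists are valid YAML."""
    lines = []
    for configurable, bindings in sorted(params.items()):
        lines.append(f"{json.dumps(configurable)}:")
        for name, value in sorted(bindings.items()):
            lines.append(f"  {name}: {json.dumps(value, default=str)}")
    return "\n".join(lines) + "\n"


def dump_params(gin_path, out_path):
    """Read a gin config file and write the extracted params file."""
    with open(gin_path, encoding="utf-8") as src:
        params = parse_gin_bindings(src.read())
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(to_yaml(params))
```

Calling dump_params("config.gin", "gin_params.yaml") at the end of the script (config.gin being a hypothetical path) would produce the file that the train stage and the top-level params entry refer to.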
Thanks @dberenbaum, that's exactly the approach I tried out after posting the other day. It seems to work locally, but I'm not picking up anything besides the default params.yaml in Studio yet.
I'm wondering if this is just because these changes are in a feature branch that I haven't yet merged to the default branch? I can't tell how exactly Studio picks up available params/metrics columns when it imports the project, but perhaps it relies on the default branch?
My dvc.yaml also specifies a directory of params instead of the single file example you give above, since I'm splitting out params by stage. Should I expect Studio to parse this correctly?
My dvc.yaml also specifies a directory of params instead of the single file example you give above, since I'm splitting out params by stage. Should I expect Studio to parse this correctly?
No, a directory of params isn't supported right now, unfortunately. I can open an issue to track it. Are you able to try tracking the individual params files?
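As a rough sketch of that workaround, the top-level params entry would list each per-stage file explicitly instead of the directory. The file names below are hypothetical:

```yaml
# Top-level params in dvc.yaml: one entry per file rather than a directory.
params:
  - params/dump_gin.yaml
  - params/train.yaml
```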
Ah, got it, I will try with individual files. Thanks for opening #9452. A little more documentation around these aspects of Studio integration would be nice. I also found by trial and error that a directory of plots is supported, but not stage level plots, only top level.
Ah, got it, I will try with individual files. Thanks for opening #9452. A little more documentation around these aspects of Studio integration would be nice.
Yup, you are right about that. In this case top-level params directories aren't supported in DVC either, and we need to clarify that. If it's supported in DVC, it should work in Studio.
I also found by trial and error that a directory of plots is supported, but not stage level plots, only top level.
Stage-level plots should be supported in Studio, so if you have more details, we can look into it.
Stage-level plots should be supported in Studio, so if you have more details, we can look into it.
This was for a directory of .png plots. dvc plots show would pick them up just fine when listed under a stage, but they never showed up in Studio until I moved them to a top level plots entry.
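For reference, the two layouts being compared look roughly like this in dvc.yaml. The stage name, command, and plots/ directory are hypothetical; only the placement of the plots key matters here.

```yaml
# Layout 1: directory of plots listed under the stage
stages:
  evaluate:
    cmd: python evaluate.py
    plots:
      - plots/
---
# Layout 2: same directory declared as a regular output, plus a top-level plots entry
stages:
  evaluate:
    cmd: python evaluate.py
    outs:
      - plots/
plots:
  - plots/
```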
That looks to me to be working when trying out a simple example. If it's not working for you and you can reproduce it, could you please open an issue in https://github.com/iterative/studio-support/issues?
This should be possible using "top-level" params like this:
    stages:
      dump_gin:
        cmd: python dump_gin.py
        deps:
          - dump_gin.py
        outs:
          - gin_params.yaml
      train:
        cmd: python train.py
        deps:
          - gin_params.yaml
        outs:
          - model.pkl
    params:
      - gin_params.yaml
Finally figured out that I needed to add cache: false and git track gin_params.yaml for this to work. dvc params diff doesn't seem to be able to identify params files in the cache?
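Concretely, the adjusted pipeline would look something like the following. This is a sketch based on the example earlier in the thread, with only the cache setting added to the dump_gin output:

```yaml
stages:
  dump_gin:
    cmd: python dump_gin.py
    deps:
      - dump_gin.py
    outs:
      # cache: false keeps the file out of the DVC cache so it can be tracked by Git
      - gin_params.yaml:
          cache: false
  train:
    cmd: python train.py
    deps:
      - gin_params.yaml
    outs:
      - model.pkl
params:
  - gin_params.yaml
```

After running the pipeline, gin_params.yaml is then committed alongside dvc.yaml and dvc.lock (for example with git add gin_params.yaml).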
Thanks for your patience in debugging the problem. I can confirm that behavior. I'll open a separate bug report for it.
It would be nice to use DVC along with https://github.com/google/gin-config to handle parameters.