`dvclive.log`: Support to remap given `name` to `target`

iterative / dvclive

📈 Log and track ML metrics, parameters, models with Git and/or DVC

https://dvc.org/doc/dvclive

Apache License 2.0


`dvclive.log`: Support to remap given `name` to `target` #121

Closed daavoo closed 1 year ago

daavoo commented 3 years ago

It could be convenient (I'm not sure how much) to let the user configure a remap from the names passed to dvclive.log to some given targets. Internally, it could be implemented easily.

For example (not sure how to configure it, just using dvclive.init here to show the idea) the following code:

    import dvclive

    dvclive.init(remap={"acc": "accuracy", "val/accuracy": "validation/accuracy"})
    dvclive.log("acc", 0.9)

This would generate dvclive/accuracy.tsv instead of dvclive/acc.tsv (the current behavior).
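To make the idea concrete, here is a minimal sketch of the lookup described above. It is a toy stand-in, not the dvclive implementation: the `RemapLogger` class, its `remap` option, and the two-column TSV layout are all assumptions made for illustration.

```python
from pathlib import Path


class RemapLogger:
    """Toy metrics logger (hypothetical, not part of dvclive)."""

    def __init__(self, path="dvclive", remap=None):
        self.path = Path(path)
        # Hypothetical option: maps names passed to log() onto target names.
        self.remap = dict(remap or {})
        self.step = 0

    def target_for(self, name):
        # Names without a remap entry are kept unchanged.
        return self.remap.get(name, name)

    def log(self, name, value):
        out = self.path / f"{self.target_for(name)}.tsv"
        # Targets such as "validation/accuracy" need their parent directory.
        out.parent.mkdir(parents=True, exist_ok=True)
        is_new = not out.exists()
        with out.open("a") as fobj:
            if is_new:
                # Assumed column layout, chosen only for this sketch.
                fobj.write("step\tvalue\n")
            fobj.write(f"{self.step}\t{value}\n")
        return out
```

With `remap={"acc": "accuracy"}`, a call to `log("acc", 0.9)` in this sketch writes to `accuracy.tsv` under the configured path, while a name with no entry (for example `"loss"`) keeps its original file name.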

I'm not sure how relevant this use case is, but I found the need when working on a repository where different ML frameworks are used for training on different branches. Having this remap functionality would make it easier to compare metrics generated by different dvclive<>ML framework integrations.

In addition, sometimes I just don't like the names chosen by the ML framework and end up adding this remap myself in a subsequent stage.

pared commented 3 years ago

I like that idea! @dberenbaum what do you think?

dberenbaum commented 3 years ago

I don't see any issue with it. I'm not sure how valuable it is, since switching ML frameworks seems more likely to be an issue for internal testing than for most users, but having control over metrics names seems nice to have.

Does this belong in dvclive.init() or in the integration? Would you ever use it like the above example where you manually call dvclive.log("acc", 0.9) but actually want a different name saved?

pared commented 3 years ago

@dberenbaum I guess this is more for the integrations, where sometimes you have predefined names (like keras). Or am I wrong here @daavoo?

daavoo commented 3 years ago

> @dberenbaum I guess this is more for the integrations, where sometimes you have predefined names (like keras). Or am I wrong here @daavoo?

The use case would definitely be focused on integrations, given that if you own the code with explicit calls to dvclive.log you could just change the name in the call.

What I'm not sure about is whether the feature should be implemented on the dvclive side (adding the arg to dvclive.init) or on each integration (adding the arg to DVCLiveCallback).
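For comparison, the second option (handling the remap inside each integration) might look roughly like the sketch below. `RemappingCallback`, its `log_fn` parameter, and the `on_epoch_end` hook are hypothetical names chosen for illustration; real framework callbacks differ in their hook names and signatures.

```python
class RemappingCallback:
    """Hypothetical integration callback that renames metrics before logging."""

    def __init__(self, log_fn, remap=None):
        # log_fn stands in for whatever function actually records a metric.
        self.log_fn = log_fn
        # Assumed option: framework metric name -> desired target name.
        self.remap = dict(remap or {})

    def on_epoch_end(self, epoch, logs):
        for name, value in logs.items():
            # Fall back to the framework's own name when no remap is given.
            self.log_fn(self.remap.get(name, name), value)


# Usage: collect the (name, value) pairs instead of writing files.
recorded = []
callback = RemappingCallback(
    lambda name, value: recorded.append((name, value)),
    remap={"acc": "accuracy"},
)
callback.on_epoch_end(0, {"acc": 0.9, "loss": 0.1})
```

In this arrangement every integration would have to accept and apply the mapping itself, which is the duplication a single dvclive-side option would avoid.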

daavoo commented 3 years ago

I think that, for small features like this, it would be more beneficial to have the implementation on the dvclive side so we don't depend on sending PRs to each "external integration" repository.

dberenbaum commented 3 years ago

Makes sense. The reason I raised the question is that dvclive.init() already has several arguments, and it's easy to keep adding more. How can dvclive functionality continue to be extended without weighing down dvclive.init()?

pared commented 3 years ago

@dberenbaum I would keep the most important parameters visible (e.g. the init path), put the rest into kwargs, and provide more info in the docs/docstrings for the method.
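A rough sketch of that signature style, assuming a stand-alone `init` function rather than the real dvclive.init. The option names and defaults in `_KNOWN_OPTIONS` are illustrative only; the point is that `path` stays visible while everything else is validated from `**kwargs`.

```python
# Illustrative option names and defaults; not the actual dvclive.init arguments.
_KNOWN_OPTIONS = {"remap": None, "resume": False, "summary": True}


def init(path="dvclive", **kwargs):
    """Return a config dict; only `path` is an explicit parameter.

    Keyword options (documented here instead of in the signature):
        remap:   dict mapping logged names to target names.
        resume:  whether to continue from a previous run.
        summary: whether to write a summary file.
    """
    unknown = set(kwargs) - set(_KNOWN_OPTIONS)
    if unknown:
        # Reject typos early, since **kwargs would otherwise swallow them.
        raise TypeError(f"init() got unexpected options: {sorted(unknown)}")
    return {"path": path, **_KNOWN_OPTIONS, **kwargs}
```

One trade-off of this style is that editors and `help()` no longer show the optional arguments in the signature, so the docstring has to carry that information.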

dberenbaum commented 3 years ago

Sounds good. I don't have any objection to this idea, just something to keep in mind.

daavoo commented 3 years ago

Opened #128 to continue the discussion about arguments, leaving this issue for the remap feature.