iterative / studio-support

❓ DVC Studio Issues, Question, and Discussions
https://studio.iterative.ai

DVCLive metrics don't show up in Studio's column selection #85

Closed haimat closed 1 year ago

haimat commented 1 year ago

I have a project "A" that uses DVCLive to update Studio during training - this works fine. Now I copied over that whole project, let's call it "B", and added it to Studio as well. However, even though the dvclive folder has been created in "B" when I first ran the experiment, none of the metrics from the DVCLive section are shown in Studio.

I can see the live experiment running in Studio, including the correct parameters for it. Only the DVCLive columns are missing, and with them any live updates from the training. I checked the column selection for that project in Studio, but the DVCLive metrics don't show up there either.

How can I add these metrics to Studio?

shcheklein commented 1 year ago

@haimat I think we have an internal ticket for this, it's being worked on. We'll post an update here.

For now, could you please try to add a commit with the newly generated dvc.yaml and metrics? (so that Studio has it as a base with all the columns, etc).

daavoo commented 1 year ago

> newly generated dvc.yaml and metrics

To clarify, this refers to what is inside your dvclive folder

haimat commented 1 year ago

@shcheklein @daavoo These files are already in Git, from my local experiments:

image

Also a re-import of the project in Studio did not help :(

dberenbaum commented 1 year ago

@haimat What dvclive version do you have? Do you set dvcyaml=False? There should be a dvc.yaml file inside your dvclive folder, and that is what tells Studio (and all other DVC-supported tools) what to show as metrics, plots, etc.

haimat commented 1 year ago

@dberenbaum There is indeed no such /dvclive/dvc.yaml file - but in the first project, where DVCLive works well with Studio, there is such a file. I can't recall creating it manually, though. How can I tell DVCLive to create this file for me?

dberenbaum commented 1 year ago

You can call live.make_dvcyaml(), but it should be getting created automatically. Could you provide an example of your code? It can be heavily redacted as long as it includes the DVCLive parts.
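A minimal sketch of what forcing the file's creation from a training callback could look like. It assumes a DVCLive version that writes `dvc.yaml` into the `dvclive` folder, as described in this thread. `log_epoch` and `RecordingLive` are hypothetical helpers introduced only for illustration; `log_metric`, `next_step`, and `make_dvcyaml` are the `Live` methods mentioned in this thread, and the metric values are placeholders.

```python
def log_epoch(live, metrics):
    """Log one epoch of metrics, then force dvc.yaml to be (re)written.

    `live` is expected to be a dvclive.Live instance; any object with the
    same three methods works, which keeps this sketch runnable offline.
    """
    for name, value in metrics.items():
        live.log_metric(name, value)
    live.next_step()
    # DVCLive should write dvc.yaml on its own; this call forces it.
    live.make_dvcyaml()


class RecordingLive:
    """Hypothetical stand-in for dvclive.Live that only records calls."""

    def __init__(self):
        self.calls = []

    def log_metric(self, name, value):
        self.calls.append(("log_metric", name, value))

    def next_step(self):
        self.calls.append(("next_step",))

    def make_dvcyaml(self):
        self.calls.append(("make_dvcyaml",))


recorder = RecordingLive()
log_epoch(recorder, {"map50": 0.36208, "recall": 0.33933})
print(recorder.calls)
```

With the real library, the same helper would presumably be called inside `with Live() as live:` and passed `live` instead of the recorder.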

haimat commented 1 year ago

@dberenbaum Sure, here you go - this is our model training script that uses DVCLive, stripped of all non-DVCLive-related sections:

import os
import sys

import yaml
from dvclive import Live
from PIL import Image

def main():
    """Train the YOLO model."""
    DVC.log_params(params["training"])
    model = YOLOv8(...)
    model.train(...)

def yolo_cb_model_save(trainer):
    """YOLOv8 callback - Training and validation epoch finished."""
    metrics = get_yolo_trainer_metrics(trainer)
    for metric_name, value in metrics.items():
        DVC.log_metric(metric_name, value)
    DVC.next_step()

def yolo_cb_train_end(trainer):
    """YOLOv8 callback - Model training and validation has ended."""
    for f in os.listdir(train_folder):
        img_path = os.path.join(train_folder, f)
        if os.path.isfile(img_path) and (img_path.endswith(".jpg") or img_path.endswith(".png")):
            img = Image.open(img_path).convert("RGB")
            DVC.log_image(f, img)

if __name__ == "__main__":
    if os.path.isfile("params.yaml"):
        with open("params.yaml", "r") as stream:
            params = yaml.safe_load(stream)
            with Live(report="md") as DVC:
                main()
            sys.exit(0)
    else:
        raise RuntimeError("No such file params.yaml")

dberenbaum commented 1 year ago

Thanks @haimat! I don't see anything obvious in that script. What do you get from pip freeze | grep dvclive?

haimat commented 1 year ago

> Thanks @haimat! I don't see anything obvious in that script. What do you get from pip freeze | grep dvclive?

@dberenbaum We are running via CML on our own GitHub runner, so always the latest version of DVCLive. But as said there is no dvc.yaml file under dvclive.

dberenbaum commented 1 year ago

How do you commit the experiment? Is it possible that this file is generated but doesn't get committed?

haimat commented 1 year ago

@dberenbaum Hmm... it's really strange. In my first DVC project I got this /dvclive/dvc.yaml file, and I can monitor that project in Studio just fine. But I don't know why this file has not been generated in the new project.

What is the supposed way to create that file - or should that come automatically at the first run of DVCLive?

dberenbaum commented 1 year ago

> What is the supposed way to create that file - or should that come automatically at the first run of DVCLive?

It should come automatically, but you can also force creation of it by calling DVC.make_dvcyaml().

> @dberenbaum We are running via CML on our own GitHub runner, so always the latest version of DVCLive. But as said there is no dvc.yaml file under dvclive.

How are you checking in GitHub that the file is/isn't generated? Can you try to run locally or outside of GitHub? It may be easier to debug that way.

haimat commented 1 year ago

> It should come automatically, but you can also force creation of it by calling DVC.make_dvcyaml().

Thanks, I will put that somewhere into our code to make sure the file is being created in the future.

> How are you checking in GitHub that the file is/isn't generated? Can you try to run locally or outside of GitHub? It may be easier to debug that way.

We always run the experiment locally (= outside of GitHub) first, but the file has not been generated there either.

haimat commented 1 year ago

I don't get it, there is still a bug. Even though I explicitly call dvc_live.make_dvcyaml() now at every training epoch, I still don't see the metrics in Studio. In the first project, where everything works, I get this selection in Studio:

image

However, in the new project I only get this in Studio:

image

Where are the metrics from the live experiment, and how can I add them to Studio? Without them I don't get much out of the "Live" part in "DVCLive" 😉

dberenbaum commented 1 year ago

Are you getting the dvclive/dvc.yaml file generated at least?

If so, it may be a bug that we are currently working to fix. Does the parent commit that's already pushed to GitHub have all those dvclive outputs included (edit: does it have both the dvclive/dvc.yaml file and metrics in dvclive/metrics.json)?

haimat commented 1 year ago

@dberenbaum I have manually added the dvclive/dvc.yaml file as a copy from the other project, but that also didn't help. It seems the dvclive/metrics.json file is missing - but I cannot tell, since it's running on the GitHub runner.

dberenbaum commented 1 year ago

@haimat Could you please take a look at the parent commit that is used to kick off the GitHub runner rather than anything generated by the GitHub runner? Does that parent commit have both dvclive/dvc.yaml and dvclive/metrics.json?
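One way to answer that question is sketched below, assuming the parent commit is checked out locally. `check_dvclive_outputs` is a hypothetical helper, not part of DVC or DVCLive; it only looks for the two files named above and reports whether `metrics.json` holds any values.

```python
import json
import tempfile
from pathlib import Path


def check_dvclive_outputs(repo_root):
    """Return a list of problems found with the DVCLive files in a checkout."""
    folder = Path(repo_root) / "dvclive"
    problems = []
    for name in ("dvc.yaml", "metrics.json"):
        if not (folder / name).is_file():
            problems.append(f"missing dvclive/{name}")
    metrics_path = folder / "metrics.json"
    if metrics_path.is_file():
        try:
            metrics = json.loads(metrics_path.read_text())
        except json.JSONDecodeError:
            problems.append("dvclive/metrics.json is not valid JSON")
        else:
            if not metrics:
                problems.append("dvclive/metrics.json contains no metrics")
    return problems


# Demo on a throwaway directory that mimics an incomplete dvclive folder.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dvclive").mkdir()
    (Path(tmp) / "dvclive" / "metrics.json").write_text("{}")
    demo_problems = check_dvclive_outputs(tmp)
print(demo_problems)
```

An empty result from `check_dvclive_outputs(".")` in the repository root would mean both files are present and `metrics.json` contains at least one entry.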

haimat commented 1 year ago

@dberenbaum Interesting ... from a previous commit there was a valid dvclive/metrics.json file, it looked like this:

{
    "map95": 0.20235,
    "map50": 0.36208,
    "precision": 0.64727,
    "recall": 0.33933,
    "step": 0
}

Then from another commit after a local experiment I got an update where everything was the same, just the metrics/values changed. However, now after the last PR from your CML bot this file has been emptied on GitHub:

image

{}

Might this be related to our issue here? The CML command in our GitHub workflow looks like this:

cml pr --squash --skip-ci .

Called from the root folder. That should include the whole /dvclive folder, right?

dberenbaum commented 1 year ago

> That should include the whole /dvclive folder, right?

Yes it should.

> this file has been emptied on GitHub

So dvclive/metrics.json exists but is empty?

haimat commented 1 year ago

@dberenbaum Well, as said - it was filled correctly, but after the CML commit it was empty. Why would it ever become empty, or is this a bug?

dberenbaum commented 1 year ago

If the file exists but is empty in the CML commit, it is likely happening in DVCLive. CML or other components would be unlikely to modify the contents of that file. What else is in the dvclive folder in that commit?

haimat commented 1 year ago

@dberenbaum In the last commit from CML, which emptied the metrics.json file, these were all the files that were touched:

image

In the last commit, which triggered the GitHub training via CML, only the project's main params.yaml file was updated with the new training parameters.

dberenbaum commented 1 year ago

Okay, so everything looks fine to me except that dvclive/metrics.json is empty, which explains why Studio isn't showing the metrics (it would be reading the values in that file).
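To make the difference concrete, the sketch below compares the two versions of `dvclive/metrics.json` quoted earlier in this thread. `lost_metrics` is a hypothetical helper; it says nothing about how Studio itself parses the file and only shows which entries disappeared between the two versions.

```python
import json


def lost_metrics(before_text, after_text):
    """Return metric names present in the earlier file but gone from the later one."""
    before = json.loads(before_text)
    after = json.loads(after_text)
    return sorted(set(before) - set(after))


# Contents of the file before and after the CML commit, as quoted in this thread.
earlier = (
    '{"map95": 0.20235, "map50": 0.36208, '
    '"precision": 0.64727, "recall": 0.33933, "step": 0}'
)
later = "{}"
print(lost_metrics(earlier, later))
# → ['map50', 'map95', 'precision', 'recall', 'step']
```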

I'm guessing that for some reason yolo_cb_model_save isn't being called. If you comment out or drop the DVCLive logging in that part of the callback, I think it will give you similar results to what you are seeing in that CML commit.

haimat commented 1 year ago

@dberenbaum I don't understand, why do you think that yolo_cb_model_save isn't being called? If that were the case, other steps in that callback would not have been performed either - but they have, so this callback is being called for sure.

What exactly do you recommend? Is this a bug in DVCLive, or am I calling it wrong?

dberenbaum commented 1 year ago

> @dberenbaum I don't understand, why do you think that yolo_cb_model_save isn't being called? If that were the case, other steps in that callback would not have been performed either - but they have, so this callback is being called for sure.

If log_metric or next_step were called, I wouldn't expect an empty dvclive/metrics.json file, so I thought that might be the problem.
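One possible way to confirm whether those calls actually happen during a run is to wrap the logger in a counting proxy and print the counts at the end of training. This is a sketch under assumptions: `CountingProxy` and `FakeLive` are hypothetical names, and the fake object stands in for the real `Live` instance so the example runs without DVCLive installed.

```python
from collections import Counter


class CountingProxy:
    """Wrap an object and count how often each of its methods is called."""

    def __init__(self, target):
        self._target = target
        self.counts = Counter()

    def __getattr__(self, name):
        # Only reached for attributes not defined on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def counted(*args, **kwargs):
            self.counts[name] += 1
            return attr(*args, **kwargs)

        return counted


class FakeLive:
    """Hypothetical stand-in for the Live object used in the script above."""

    def log_metric(self, name, value):
        pass

    def next_step(self):
        pass


live = CountingProxy(FakeLive())
live.log_metric("map50", 0.36208)
live.next_step()
print(dict(live.counts))
# → {'log_metric': 1, 'next_step': 1}
```

The proxy does not forward the context-manager protocol, so in a real script it would have to wrap the object obtained inside the `with Live(...)` block rather than replace the `with` statement.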

> What exactly do you recommend? Is this a bug in DVCLive, or am I calling it wrong?

With only the info from your GitHub Actions workflow, it's hard for me to say what to recommend since I still don't know the cause and can't reproduce it. Are there any notable changes to your DVCLive code from what you posted above?

Also, could you make a couple changes to your workflow to debug and then post the logs after you run it?

shcheklein commented 1 year ago

Closing this. A lot of things have changed, so hopefully it's fixed now. We also have an official YOLO callback and documentation. @haimat if you have time - please check it out and try again.