@haimat I think we have an internal ticket for this, it's being worked on. We'll post an update here.
For now, could you please try to add a commit with the newly generated `dvc.yaml` and metrics (so that Studio has it as a base with all the columns, etc.)?
> newly generated dvc.yaml and metrics

To clarify, this refers to what is inside your `dvclive` folder.
@shcheklein @daavoo These files are already in Git, from my local experiments:
Also a re-import of the project in Studio did not help :(
@haimat What DVCLive version do you have? Do you set `dvcyaml=False`? There should be a `dvc.yaml` file inside your `dvclive` folder, and that is what tells Studio (and all other DVC-supported tools) what to show as metrics, plots, etc.
@dberenbaum There is indeed no such `/dvclive/dvc.yaml` file, but in the first project, where DVCLive works well with Studio, there is such a file. I can't recall creating it manually, though. How can I tell DVCLive to create this file for me?
You can call `live.make_dvcyaml()`, but it should be created automatically. Could you provide an example of your code? It can be heavily redacted as long as it includes the DVCLive parts.
@dberenbaum Sure, here you go - this is our model training script that uses DVCLive, stripped of all non-DVCLive-related sections:
```python
import os
import sys

import yaml
from dvclive import Live
from PIL import Image


def main():
    """Train the YOLO model."""
    DVC.log_params(params["training"])
    model = YOLOv8(...)  # model construction redacted in the original post
    model.train(...)  # training arguments redacted in the original post


def yolo_cb_model_save(trainer):
    """YOLOv8 callback - Training and validation epoch finished."""
    # get_yolo_trainer_metrics is a helper redacted from the post;
    # the original passed `yolo_trainer`, which is not defined in this function
    metrics = get_yolo_trainer_metrics(trainer)
    for metric_name, value in metrics.items():
        DVC.log_metric(metric_name, value)
    DVC.next_step()  # advance one DVCLive step per epoch


def yolo_cb_train_end(trainer):
    """YOLOv8 callback - Model training and validation has ended."""
    # train_folder is defined in the redacted part of the script
    for f in os.listdir(train_folder):
        img_path = os.path.join(train_folder, f)
        if os.path.isfile(img_path) and img_path.endswith((".jpg", ".png")):
            img = Image.open(img_path).convert("RGB")
            DVC.log_image(f, img)


if __name__ == "__main__":
    if os.path.isfile("params.yaml"):
        with open("params.yaml", "r") as stream:
            params = yaml.safe_load(stream)
        # DVC and params become module-level globals used by main() and the callbacks
        with Live(report="md") as DVC:
            main()
        sys.exit(0)
    else:
        raise RuntimeError("No such file params.yaml")
```
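The helper `get_yolo_trainer_metrics` is redacted from the script above, and if it ever returns an empty dict, the loop in `yolo_cb_model_save` logs nothing even though the callback itself runs. Below is a hypothetical version of that helper, assuming an Ultralytics-style trainer whose `metrics` attribute is a dict keyed like `metrics/mAP50(B)`. Both the source key names and the short names they map to (taken from the `metrics.json` contents posted further down this thread) are assumptions, not the poster's actual code.

```python
# Hypothetical mapping: assumed Ultralytics-style trainer metric keys -> the
# short column names seen in this thread's dvclive/metrics.json.
KEY_MAP = {
    "metrics/precision(B)": "precision",
    "metrics/recall(B)": "recall",
    "metrics/mAP50(B)": "map50",
    "metrics/mAP50-95(B)": "map95",
}


def get_yolo_trainer_metrics(trainer):
    """Hypothetical helper: pick the known metrics off a trainer object.

    Assumes `trainer.metrics` is a dict. Returns {} when the attribute is
    missing or when none of the expected keys are present, in which case
    the logging loop in the callback would log nothing for that epoch.
    """
    raw = getattr(trainer, "metrics", None) or {}
    return {short: float(raw[key]) for key, short in KEY_MAP.items() if key in raw}
```

Printing the return value once per epoch inside the callback is a cheap way to rule out a key-name mismatch as the reason nothing reaches `metrics.json`.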
Thanks @haimat! I don't see anything obvious in that script. What do you get from `pip freeze | grep dvclive`?
@dberenbaum We are running via CML on our own GitHub runner, so we always get the latest version of DVCLive. But as I said, there is no `dvc.yaml` file under `dvclive`.
How do you commit the experiment? Is it possible that this file is generated but doesn't get committed?
@dberenbaum Hmm... it's really strange. My first DVC project has this `/dvclive/dvc.yaml` file, and I can monitor that project in Studio just fine. But I don't know why this file has not been generated in the new project.
What is the intended way to create that file, or should it be created automatically on the first run of DVCLive?
It should come automatically, but you can also force its creation by calling `DVC.make_dvcyaml()`.
> @dberenbaum We are running via CML on our own Github runner, so always the latest version of DVCLive. But as said there is no dvc.yaml file under dvclive.

How are you checking in GitHub that the file is/isn't generated? Can you try to run locally or outside of GitHub? It may be easier to debug that way.
> It should come automatically, but you can also force creation of it by calling `DVC.make_dvcyaml()`.

Thanks, I will put that somewhere into our code to make sure the file is being created in the future.
> How are you checking in GitHub that the file is/isn't generated? Can you try to run locally or outside of GitHub? It may be easier to debug that way.

We always run the experiment locally (i.e., outside of GitHub) first, but the file was not generated there either.
I don't get it, there is still a bug. Even though I now explicitly call `dvc_live.make_dvcyaml()` at every training epoch, I still don't see the metrics in Studio. In the first project, where everything works, I get this selection in Studio:
However, in the new project I only get this in Studio:
Where are the metrics from the live experiment, how can I add them to Studio? Without them I don't have much from the "Live" part in "DVCLive" 😉
Are you at least getting the `dvclive/dvc.yaml` file generated?
If so, it may be a bug that we are currently working to fix. Does the parent commit that's already pushed to GitHub have all those `dvclive` outputs included (edit: does it have both the `dvclive/dvc.yaml` file and metrics in `dvclive/metrics.json`)?
@dberenbaum I have manually added the `dvclive/dvc.yaml` file as a copy from the other project, but that didn't help either. It seems the `dvclive/metrics.json` file is missing, but I cannot tell for sure, since it's running on the GitHub runner.
@haimat Could you please take a look at the parent commit that is used to kick off the GitHub runner, rather than anything generated by the GitHub runner? Does that parent commit have both `dvclive/dvc.yaml` and `dvclive/metrics.json`?
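To answer that for a specific commit, a small script like the sketch below can be run in a checkout of it (for example after `git checkout <commit>`). It assumes the default `dvclive` output directory and only looks at the top-level keys of `metrics.json`; the function name is hypothetical and not part of DVCLive.

```python
import json
from pathlib import Path


def check_dvclive_dir(dvclive_dir="dvclive"):
    """Report whether <dvclive_dir>/dvc.yaml and <dvclive_dir>/metrics.json
    exist, and which top-level metric names metrics.json contains."""
    root = Path(dvclive_dir)
    report = {
        "dvc_yaml": (root / "dvc.yaml").is_file(),
        "metrics_json": (root / "metrics.json").is_file(),
        "metric_names": [],
    }
    if report["metrics_json"]:
        text = (root / "metrics.json").read_text().strip()
        data = json.loads(text) if text else {}
        # "step" is bookkeeping rather than a metric column, so skip it
        report["metric_names"] = sorted(k for k in data if k != "step")
    return report
```

Calling `print(check_dvclive_dir())` from the repository root then shows at a glance whether either file is missing, or whether `metrics.json` exists but holds no metrics.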
@dberenbaum Interesting... a previous commit had a valid `dvclive/metrics.json` file; it looked like this:

```json
{
  "map95": 0.20235,
  "map50": 0.36208,
  "precision": 0.64727,
  "recall": 0.33933,
  "step": 0
}
```
Then, from another commit after a local experiment, I got an update where everything was the same, just with different metric values. However, after the last PR from your CML bot, this file has been emptied on GitHub:

```json
{}
```
Might this be related to our issue here? The CML command in our GitHub workflow looks like this:

```bash
cml pr --squash --skip-ci .
```

It is called from the root folder.
That should include the whole `/dvclive` folder, right?
Yes, it should.
> this file has been emptied on GitHub

So `dvclive/metrics.json` exists but is empty?
@dberenbaum Well, as I said, it was filled correctly, but after the CML commit it was empty. Why would it ever become empty? Is this a bug?
If the file exists but is empty in the CML commit, it is likely happening in DVCLive. CML or other components would be unlikely to modify the contents of that file. What else is in the `dvclive` folder in that commit?
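When a file like `metrics.json` changes between two commits, comparing the two versions key by key makes the change explicit. A minimal sketch, assuming each version has first been saved locally (for example with `git show <commit>:dvclive/metrics.json > before.json`); the helper names are hypothetical:

```python
import json


def load_metrics(path):
    """Load a metrics.json snapshot, treating an empty file as {}."""
    with open(path) as f:
        text = f.read().strip()
    return json.loads(text) if text else {}


def diff_metrics(before: dict, after: dict) -> dict:
    """Compare two metrics snapshots (e.g. parent commit vs. CML commit)."""
    return {
        "dropped": sorted(set(before) - set(after)),
        "added": sorted(set(after) - set(before)),
        "changed": sorted(k for k in set(before) & set(after) if before[k] != after[k]),
    }
```

With the two versions posted above, every key (including `step`) lands under `"dropped"`, which distinguishes "values were never logged in that run" from "values were logged but changed".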
@dberenbaum In the last commit from CML, which emptied the `metrics.json` file, these were all the files that were touched:
In the last commit, which triggered the GitHub training via CML, only the project's main `params.yaml` file was updated with the new training parameters.
Okay, so everything looks fine to me except that `dvclive/metrics.json` is empty, which explains why Studio isn't showing the metrics (it would be reading the values in that file).
I'm guessing that for some reason `yolo_cb_model_save` isn't being called. If you comment out or drop the DVCLive logging in that part of the callback, I think you will get results similar to what you are seeing in that CML commit.
@dberenbaum I don't understand, why do you think that `yolo_cb_model_save` isn't being called? If that were the case, some other steps in that callback would not have been performed either, but they have been, so this callback is definitely being called.
What exactly do you recommend? Is this a bug in DVCLive, or am I calling it wrong?
> @dberenbaum I don't understand, why do you think that yolo_cb_model_save isn't being called? If that would be the case, then also some other steps in that callback would not have been performed - but they have, so this callback is being called for sure.

If `log_metric` or `next_step` were called, I wouldn't expect an empty `dvclive/metrics.json` file, so I thought that might be the problem.
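One way to test that reasoning directly is to wrap the logger so it records whether `log_metric` is called before each `next_step`. This is a hypothetical debugging wrapper, not part of DVCLive; it intercepts only the two methods used in the callback above and forwards every other attribute (such as `log_params`, `log_image`, `make_dvcyaml`) unchanged.

```python
import warnings


class CountingLogger:
    """Hypothetical wrapper around any object exposing log_metric(name, value)
    and next_step(); warns when a step ends with no metric logged."""

    def __init__(self, live):
        self._live = live
        self._logged_this_step = 0
        self.empty_steps = 0

    def log_metric(self, name, value, **kwargs):
        self._logged_this_step += 1
        return self._live.log_metric(name, value, **kwargs)

    def next_step(self):
        if self._logged_this_step == 0:
            self.empty_steps += 1
            warnings.warn("next_step() called with no metrics logged in this step")
        self._logged_this_step = 0
        return self._live.next_step()

    def __getattr__(self, attr):
        # Only reached for attributes not defined above; forward them as-is.
        return getattr(self._live, attr)
```

In the posted script, opening the context as `with Live(report="md") as live:` and then assigning `DVC = CountingLogger(live)` would leave the callbacks untouched while emitting a warning for every epoch in which nothing was logged.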
> What exactly do you recommend? Is this a bug in DVCLive, or am I calling it wrong?

With only the info from your GitHub Actions workflow, it's hard for me to say what to recommend since I still don't know the cause and can't reproduce it. Are there any notable changes to your DVCLive code from what you posted above?
Also, could you make a couple of changes to your workflow for debugging and then post the logs after you run it?

- Set `DVCLIVE_LOGLEVEL=DEBUG`
- Add `-vv` to DVC commands, like `dvc exp run -vv`
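For a local reproduction, both settings can be applied to a single invocation. A small sketch that only builds the command and environment (the function name is hypothetical); the result can then be run with `subprocess.run(cmd, env=env, check=True)` in a checkout where DVC is installed:

```python
import os


def debug_run_command(base_env=None):
    """Return (cmd, env) for a verbose `dvc exp run` with DVCLive debug logging."""
    # Copy so the caller's environment (or os.environ) is not modified
    env = dict(os.environ if base_env is None else base_env)
    env["DVCLIVE_LOGLEVEL"] = "DEBUG"
    return ["dvc", "exp", "run", "-vv"], env
```

In a GitHub Actions workflow, the same effect comes from setting `DVCLIVE_LOGLEVEL` in the step's environment and adding `-vv` to the `dvc exp run` line.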
Closing this; a lot of things have changed, so hopefully it's fixed now. We also have an official YOLO callback and documentation. @haimat, if you have time, please check it out and try again.
I have a project "A" that uses DVCLive to update Studio during training, and this works fine. Now I copied over that whole project, let's call it "B", and added it to Studio as well. However, even though the `dvclive` folder was created in "B" when I first ran the experiment, none of the metrics from the DVCLive section are shown in Studio. I can see the live experiment running in Studio, including the correct parameters for it. It's just that the DVCLive columns are missing, and with them any live updates from the training. I looked into the column selection for that project in Studio, but the DVCLive metrics don't show up there either.
How can I add these metrics to Studio?