iterative / example-repos-dev

Source code and generator scripts for example DVC projects
https://dvc.org/doc

Update to yolo #232

Closed · daavoo closed this 11 months ago

daavoo commented 11 months ago

Deployed:

  • https://github.com/daavoo/example-get-started-experiments
  • https://studio.iterative.ai/team/Iterative/projects/example-get-started-experiments-bhm5eg8le6


Make the decision because:


TODO:

dberenbaum commented 11 months ago

Thoughts on yolo:

  • Am I reading it wrong, or are the predictions pretty bad?
  • Not sure if the simplicity is a plus here. It is easy to read, but it's maybe too trivial? Do I really need to modularize my training code if it's 2 lines?
  • In general, I worry there's so much magic happening in both dvc and yolo here that it's hard for the user to learn anything from it or generalize to their use case.

A couple small Studio UI issues:

  • Did you manually hide all older commits in Studio? I only see the latest commit and its experiments.
  • Similarly, it looks like lots of metrics are hidden. If we want to use a more realistic example in yolo, I think it would be better to show more metrics.

dberenbaum commented 11 months ago

Overall, I think it's great and shows off how powerful and simple the tools can be, but it feels more like a showcase of functionality than a way to teach about dvc experiments. I would like to at least take some steps on https://github.com/iterative/dvclive/issues/603 if we make this our default example project. Although it's hard to maintain them all as auto-generated repos, I think we need many examples of different frameworks, so I would urge that we try to keep both projects in some form instead of replacing one with the other.

daavoo commented 11 months ago
  • Am I reading it wrong, or are the predictions pretty bad?

Not sure how/what you are reading. I could talk for a while about metrics for these tasks 😅

The metric currently selected is an instance segmentation one (mAP 0.5-0.95), and it doesn't carry the same meaning as accuracy or similar metrics (where I would expect something around 0.9 to consider it good). It can't even be compared across datasets, but here is the SOTA on COCO (still no one has passed 0.6):

[screenshot: COCO instance segmentation SOTA leaderboard; top mask AP still below 0.6]
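
To make the scale concrete, here is a quick sketch with hypothetical per-threshold AP values (not from any real model):

```python
# mAP 0.5-0.95 averages AP over ten IoU thresholds (0.50, 0.55, ..., 0.95).
# AP at the strict thresholds drags the mean down, which is why even strong
# models sit well below the ~0.9 that "accuracy" intuition would suggest.
aps = [0.70, 0.66, 0.62, 0.57, 0.51, 0.44, 0.36, 0.27, 0.17, 0.06]  # hypothetical
thresholds = [0.50 + 0.05 * i for i in range(10)]

for t, ap in zip(thresholds, aps):
    print(f"AP@{t:.2f} = {ap:.2f}")

map_50_95 = sum(aps) / len(aps)
print(f"mAP 0.5-0.95 = {map_50_95:.3f}")  # ~0.44 even though AP@0.50 is 0.70
```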

  • Not sure if the simplicity is a plus here. It is easy to read, but it's maybe too trivial? Do I really need to modularize my training code if it's 2 lines?

The point is not (only?) to modularize but to use `dvc exp run` and its associated features, right?

DVCLive is automatically added as a callback if it is installed:

https://github.com/ultralytics/ultralytics/blob/main/ultralytics/utils/callbacks/dvc.py

This is how YOLO currently handles its integrations.
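
For illustration, a minimal sketch of what a training script looks like under this scheme (the model and dataset names are placeholders, not necessarily the ones used in the example repo):

```python
# Minimal sketch, assuming `ultralytics` and `dvclive` are both installed.
# Ultralytics registers its DVC callback on its own, so the script contains
# no explicit DVCLive code; metrics, params, and plots are logged behind
# the scenes.
from ultralytics import YOLO

model = YOLO("yolov8n-seg.pt")                # placeholder model
model.train(data="coco8-seg.yaml", epochs=5)  # placeholder dataset
```

Run under `dvc exp run`, each such invocation is then tracked as a DVC experiment.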

  • In general, I worry there's so much magic happening in both dvc and yolo here that it's hard for the user to learn anything from it or generalize to their use case.

So, I am assuming you did not have that worry with the previous fast.ai code because the callback was passed explicitly? I remember the feedback was that it was too complicated because it used the context manager, etc.
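
For contrast, a minimal sketch of the explicit style being referenced (the context-manager pattern; the parameter and metric names here are made up):

```python
# Every logging call is visible in user code instead of hidden behind a
# framework callback.
from dvclive import Live

with Live() as live:
    live.log_param("epochs", 5)      # hypothetical param
    for epoch in range(5):
        loss = 1.0 / (epoch + 1)     # placeholder metric value
        live.log_metric("loss", loss)
        live.next_step()             # advance the step counter
```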

A couple small Studio UI issues:

  • Did you manually hide all older commits in Studio? I only see the latest commit and its experiments.

Yes, feel free to add/show whatever.

  • Similarly, it looks like lots of metrics are hidden. If we want to use a more realistic example in yolo, I think it would be better to show more metrics.

Same as above: it is just the view I added and doesn't have to be what we show. It is what I prefer, but I don't know if that is the best way to present it as an example. I do worry about the fallacy of making things look "complex" for the sake of appearance.

daavoo commented 11 months ago

but feels more like a showcase of functionality than a way to teach about dvc experiments

Sorry, maybe it was already implicit in the previous comments, but could you summarize what makes that the case for this example vs. the current one?

What I got as the main differences:

Is there something else?

dberenbaum commented 11 months ago

The metric currently selected is an instance segmentation one (mAP 0.5-0.95), and it doesn't carry the same meaning as accuracy or similar metrics (where I would expect something around 0.9 to consider it good).

Sorry, I was reading it wrong 😄. I have used mAP before, but here I was looking at the image masks, and I think it is just harder to see whether the red bounding boxes overlay an actual pool. Anyway, disregard that comment.

Sorry, maybe it was already implicit in the previous comments, but could you summarize what makes that the case for this example vs. the current one?

I don't have that strong an opinion on which one should be the "default." I think we need to keep both (and in that case, I'm fine with keeping the deployment in this one) because they showcase different features and frameworks and we need more examples.

Reasons why I wouldn't push to make this one the "default":

  1. Docs need to be reworked and I don't see that it would be worth the time.
  2. If I'm not using yolo, it doesn't show me anything about how to use dvclive. By default, I would rather have an example that invokes Live() or an explicit DVCLiveCallback().
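
As an illustration of point 2, a minimal sketch of what an explicit-callback example might look like (Keras chosen arbitrarily here; dvclive ships similar callbacks for several frameworks, and all data and names below are made up):

```python
import numpy as np
from tensorflow import keras
from dvclive.keras import DVCLiveCallback

# Tiny synthetic dataset so the sketch is self-contained.
x = np.random.rand(64, 4).astype("float32")
y = np.random.randint(0, 2, size=(64,))

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# The integration is explicit: the callback is passed by hand, so a reader
# sees exactly where dvclive hooks into training.
model.fit(x, y, epochs=3, callbacks=[DVCLiveCallback()])
```
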
daavoo commented 11 months ago

Sorry, I was reading it wrong 😄. I have used mAP before, but here I was looking at the image masks, and I think it is just harder to see whether the red bounding boxes overlay an actual pool. Anyway, disregard that comment.

Yes, the default built-in visualization is ... not great.

I don't have that strong an opinion on which one should be the "default." I think we need to keep both (and in that case, I'm fine with keeping the deployment in this one) because they showcase different features and frameworks and we need more examples.

Reasons why I wouldn't push to make this one the "default":

  1. Docs need to be reworked and I don't see that it would be worth the time.
  2. If I'm not using yolo, it doesn't show me anything about how to use dvclive. By default, I would rather have an example that invokes Live() or an explicit DVCLiveCallback().

Makes sense. For the record about 2, at least HuggingFace appears to also default to including the callback if the library is installed, so assuming we start moving the callbacks to their repos, more examples might start to look like this.
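
A rough sketch of what that looks like on the HuggingFace side (assuming a transformers version that ships the DVC integration; the exact default behavior may vary by version):

```python
# With `dvclive` installed, Trainer enables the matching integration callback
# via report_to; listing it explicitly makes the assumption visible.
from transformers import TrainingArguments

args = TrainingArguments(output_dir="out", report_to=["dvclive"])
```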

daavoo commented 11 months ago

Will make it another repo; converting to draft for now.