iterative / studio-support

❓ DVC Studio Issues, Question, and Discussions
https://studio.iterative.ai
How is the model registry in Studio supposed to be used? #80

Closed haimat closed 1 year ago

haimat commented 1 year ago

In nearly all of our computer vision / deep learning projects we follow this overall workflow:

  1. Create the training dataset
  2. Train first version of the model (on local dev machine)
  3. Fine-tune hyperparams & re-train model on server until we are happy
  4. Save the best model for that project+dataset and deploy/send to customer
  5. Later we might search for "the best" model in our local storage / file system / etc.

Now we want to use DVC and your related tools to support all these steps:

  1. Manage even large datasets via DVC
  2. Run experiments and train first model(s) via DVC
  3. Full training and model fine-tuning on our server via CML
  4. Register the best model in the Studio model registry
  5. Find a certain model based on training parameters in the Studio model registry

Using DVC+CML for steps 1-3 works fine for us; now we are looking into steps 4 and 5, i.e., registering a model in Studio and finding it again later. Here are our thoughts and questions regarding the Studio model registry:

All in all, I am not sure whether I have really understood how one is supposed to use the model registry in Studio, in particular in the context of my questions above. Any clarification on these questions and hints on how I can manage steps 4&5 above would be appreciated a lot 👍

tapadipti commented 1 year ago

Thank you @haimat for creating this issue and summarizing your workflow and requirements. I've answered your questions below. In short, some of these are currently possible, and some need more work. Please reply with any follow-up questions or comments.

directly download the model file .. & provide the correct full dvc get command

We will prioritize working on displaying the full dvc get command. Then, we will start work on the direct download.
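For reference, the command under discussion has the general form `dvc get <repo-url> <path> --rev <revision>`. Below is a minimal Python sketch that assembles it. The repository URL, the model path, and the `name@v1.0.0` tag are hypothetical placeholders (the tag format in particular is an assumption); use whatever revision corresponds to your registered model version.

```python
import shlex


def build_dvc_get_command(repo_url, path, rev=None, out=None):
    """Assemble a `dvc get` invocation as an argument list."""
    cmd = ["dvc", "get", repo_url, path]
    if rev:
        # --rev selects the Git revision (commit, branch, or tag) to read from.
        cmd += ["--rev", rev]
    if out:
        # -o sets the local destination path.
        cmd += ["-o", out]
    return cmd


# Hypothetical values for illustration only.
command = build_dvc_get_command(
    "https://github.com/example-org/example-project",
    "models/model.pt",
    rev="my-model@v1.0.0",
    out="model.pt",
)
print(shlex.join(command))
```

The argument list can be passed to `subprocess.run` as-is, which avoids shell quoting issues when a path contains spaces.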

Why is this path not shown for version 1.0.0 of my model?

Can you please check if that particular commit (the one for version 1.0.0) has artifacts.yaml and that the path is present there?

The same goes for the model labels

Same as above - Can you please check if that particular commit (the one for version 1.0.0) has artifacts.yaml and that the labels are present there?
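The two checks above (path and labels present in artifacts.yaml at a given commit) can be sketched locally as follows. This assumes artifacts.yaml maps each model name to a set of fields such as `path` and `labels`; the model name and values are hypothetical, and parsing the YAML text is left to a library such as PyYAML, which is not part of the standard library.

```python
import subprocess


def missing_fields(artifacts, name, required=("path", "labels")):
    """Return the required fields that are absent or empty for one artifact.

    `artifacts` is the parsed content of artifacts.yaml, assumed to map
    artifact names to dicts of fields.
    """
    entry = artifacts.get(name)
    if entry is None:
        return list(required)
    return [field for field in required if not entry.get(field)]


def read_file_at_commit(commit, filename="artifacts.yaml"):
    """Return the file's text as of `commit`, or None if it is not there."""
    result = subprocess.run(
        ["git", "show", f"{commit}:{filename}"],
        capture_output=True,
        text=True,
    )
    return result.stdout if result.returncode == 0 else None


# Hypothetical parsed content, e.g. the result of parsing
# read_file_at_commit("<commit>") with a YAML library.
parsed = {"my-model": {"type": "model", "path": "models/model.pt"}}
print(missing_fields(parsed, "my-model"))
```

An empty result means both fields are present for that commit; a missing file (`None` from `read_file_at_commit`) would explain why neither shows up for a model version.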

How can I search for a certain model using these labels?

This is currently not possible. We will create an internal issue to work on this. Also, we currently don’t display labels in the dashboard; we’ll discuss this.

"give me the model for project X, dataset Y, with the max. training parameter Z"

This is possible from the project table. There, you can sort the table by the required parameter, and then for the desired commit, on mouse over on the model column, you will see a tooltip with a View in registry button to view the model in the model registry (see below). Clicking on this will lead you to the model details page for the required model version. Downloading the model file at this point is a separate issue.

[Screenshot: tooltip with the View in registry button on the model column of the project table]
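The sort-and-pick step described above amounts to a filtered maximum. A minimal sketch, assuming the experiment rows are available as a list of dicts; the row layout, field names, and commit hashes are hypothetical and do not represent a Studio export format.

```python
def best_experiment(rows, project, dataset, param):
    """Return the row with the largest `param` for a project/dataset pair."""
    candidates = [
        row for row in rows
        if row["project"] == project
        and row["dataset"] == dataset
        and param in row["params"]
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda row: row["params"][param])


# Hypothetical rows, standing in for the contents of the project table.
rows = [
    {"commit": "a1b2c3d", "project": "X", "dataset": "Y", "params": {"epochs": 50}},
    {"commit": "d4e5f6a", "project": "X", "dataset": "Y", "params": {"epochs": 120}},
    {"commit": "0f9e8d7", "project": "X", "dataset": "W", "params": {"epochs": 300}},
]
best = best_experiment(rows, "X", "Y", "epochs")
print(best["commit"])
```

The commit of the selected row is what you would then look up in the model registry.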

how I can manage steps 4&5 above - Register the best model in the Studio model registry

For this, you can find the desired experiment in the project table. Once you have the commit hash for this, you can switch to the model registry and register the commit as a new model version (if you haven’t registered it already). You can also switch to the project table for a specific model version from the model details page (see below). And we are working on integrating project metrics and plots into the model details page.

[Screenshot: model details page with the link to the project table for a model version]

how I can manage steps 4&5 above - Find a certain model based on training parameters in the Studio model registry

Currently, this is possible from the project table (as described above). And once we complete integrating project metrics and plots into the model details page, it will be somewhat possible from the model details page as well. But even after this integration, comparing multiple model versions in the model registry itself will still not be possible; if you have suggestions on how to best do this, please do share.

shcheklein commented 1 year ago

@haimat hey, what are your thoughts on this? Clearly we need to prioritize certain things (especially an easier way to download a model), but does the project table solve the workflow issue for you? Any other thoughts?

haimat commented 1 year ago

Thanks, I can answer next week 👍

haimat commented 1 year ago

@tapadipti @shcheklein Sorry for the late response - here is my feedback:

We will prioritize working on displaying the full dvc get command. Then, we will start work on the direct download.

Sounds good, thanks. Do you have a rough estimation for when you plan to have both features ready?

Can you please check if that particular commit (the one for version 1.0.0) has artifacts.yaml and that the path is present there?

It seems I used the wrong commit/experiment for registering version 1.0.0. After choosing the correct commit/experiment, the model version details page looks much better now 👍

Same as above - Can you please check if that particular commit (the one for version 1.0.0) has artifacts.yaml and that the labels are present there?

Seems to be solved, see above.

This is currently not possible. We will create an internal issue to work on this. Also, we currently don’t display labels in the dashboard; we’ll discuss this.

Out of interest: If there is no option to filter/search for certain model tags, what was your reason to introduce them in the first place?

This is possible from the project table. There, you can sort the table by the required parameter, and then for the desired commit, on mouse over on the model column, you will see a tooltip with a View in registry button to view the model in the model registry (see below). Clicking on this will lead you to the model details page for the required model version. Downloading the model file at this point is a separate issue.

Thanks, that should work.

For this, you can find the desired experiment in the project table. Once you have the commit hash for this, you can switch to the model registry and register the commit as a new model version (if you haven’t registered it already). You can also switch to the project table for a specific model version from the model details page (see below). And we are working on integrating project metrics and plots into the model details page.

It seems metrics and plots are already part of the models details page, right?

tapadipti commented 1 year ago

@haimat Thank you for your response. Good to know some of your issues are resolved 👍

Do you have a rough estimation for when you plan to have both features ready?

Displaying the full dvc get command is expected to complete in a week. Actually downloading the model file isn’t scheduled at the moment; we will let you know when we have details on its timeline. Can you pls indicate how urgent it is for you that Studio should support actual download (not just provide the dvc get command)?

If there is no option to filter/search for certain model tags, what was your reason to introduce them in the first place?

Labels were introduced to let users add descriptive metadata to the models. You can see the labels in the model details page, which is the page you would use to get complete details of the model including description, methods, requirements, etc. But I agree that being able to filter/search on the labels would be helpful. Can you pls indicate how important it is for your workflow that Studio allows searching/filtering by labels?

It seems metrics and plots are already part of the models details page, right?

Yes. You can also select which metrics/plots you want to keep in the model details page by using the Configure button. We are working on making it possible to save your selections; this should be live in a few days.

haimat commented 1 year ago

Displaying the full dvc get command is expected to complete in a week. Actually downloading the model file isn’t scheduled at the moment; we will let you know when we have details on its timeline. Can you pls indicate how urgent it is for you that Studio should support actual download (not just provide the dvc get command)?

Good question ... I think if the full dvc get command is correct and easy to find and copy from the website, then a direct download is not super urgent. It would be a nice-to-have for me, but I could live with the dvc get command for a while.

Labels were introduced to let users add descriptive metadata to the models. You can see the labels in the model details page, which is the page you would use to get complete details of the model including description, methods, requirements, etc. But I agree that being able to filter/search on the labels would be helpful. Can you pls indicate how important it is for your workflow that Studio allows searching/filtering by labels?

That would be a nice addition to Studio. It's probably not the most important feature to add, but it would make perfect sense to add it. I would say 5/10 on the importance scale :)

Yes. You can also select which metrics/plots you want to keep in the model details page by using the Configure button. We are working on making it possible to save your selections; this should be live in a few days.

Sounds great, thanks 👍

shcheklein commented 1 year ago

Good discussion, folks. Closing this since I think it's mostly resolved.