How is the model registry in Studio supposed to be used?

haimat commented 1 year ago

In nearly all of our computer vision / deep learning projects we follow this overall workflow:

1. Create the training dataset
2. Train first version of the model (on local dev machine)
3. Fine-tune hyperparams & re-train model on server until we are happy
4. Save the best model for that project+dataset and deploy/send to customer
5. Later we might search for "the best" model in our local storage / file system / etc.

Now we want to use DVC and your related tools to support all these steps:

1. Manage even large datasets via DVC
2. Run experiments and train first model(s) via DVC
3. Full training and model fine-tuning on our server via CML
4. Register the best model in the Studio model registry
5. Find a certain model based on training parameters in the Studio model registry

Using DVC+CML for steps 1-3 works fine for us; now we are looking into steps 4 and 5, i.e., registering a model in Studio and finding it again later. Here are our thoughts and questions regarding the Studio model registry:

1. Usually we simply want something like "give me the latest or best version of the trained model for project X, dataset Y." It seems, however, that this is not possible in Studio that way, because one cannot download a model directly from the registry. I have to find the corresponding GitHub tag and dvc get the model file for that commit. That seems a bit cumbersome to me.
   - Would it be an option for you to integrate a download link into Studio, in order to directly download the model file?
   - Alternatively, or until then, could you at least provide the correct full dvc get command for a certain model version in Studio?
2. When I registered a model in Studio, I had to provide the storage location, i.e. the path to the model file, as well. However, when I now click on the model version 1.0.0 I get only this message: "No path is set for this model version. You can add one for head-yolov8-ski-defects-1280 to this project's artifacts.yaml file." But my artifacts.yaml file already contains the path to the model from when I registered it in the first place. Why is this path not shown for version 1.0.0 of my model?
3. The same goes for the model labels - why are none of my labels (that I set when I registered the model) shown on the version 1.0.0 model page?
   - And: How can I search for a certain model using these labels?
4. Also, what if I want to do a very simple selection like "give me the model for project X, dataset Y, with the max. training parameter Z" - how can I achieve that?

All in all I am not sure whether I really got the way one is supposed to use the model registry in Studio, in particular in the context of my questions above. Any clarification on these questions and hints on how I can manage steps 4&5 above would be appreciated a lot :+1:

tapadipti commented 1 year ago

Thank you @haimat for creating this issue and summarizing your workflow and requirements. Below I've answered your questions. In short, some of these are possible currently, and some need more work. Please reply with your follow-up questions/comments, if any.

> directly download the model file .. & provide the correct full dvc get command

We will prioritize working on displaying the full dvc get command. Then, we will start work on the direct download.
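
Until the command is shown in Studio, it can be assembled by hand. The following is only a sketch, assuming the registry version corresponds to a Git tag in GTO's default `name@vX.Y.Z` format; the repository URL and the model path are hypothetical placeholders, and only the model name and version are taken from this thread.

```python
import shlex


def gto_version_tag(model_name: str, version: str) -> str:
    """Build the Git tag assumed for a registered version, e.g. 'my-model@v1.0.0'."""
    return f"{model_name}@v{version.lstrip('v')}"


def dvc_get_command(repo_url: str, model_path: str, model_name: str, version: str) -> str:
    """Return a copy-pasteable `dvc get` command for one registered model version."""
    args = [
        "dvc", "get", repo_url, model_path,
        "--rev", gto_version_tag(model_name, version),
    ]
    # Quote each argument so paths with spaces survive a copy/paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical repository URL and model path (not from this thread).
print(dvc_get_command(
    "https://github.com/example-org/example-repo",
    "models/best.pt",
    "head-yolov8-ski-defects-1280",
    "1.0.0",
))
```

The path passed to `dvc get` should be the one recorded for the model in artifacts.yaml at that tag.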

> Why is this path not shown for version 1.0.0 of my model?

Can you please check if that particular commit (the one for version 1.0.0) has artifacts.yaml and that the path is present there?

> The same goes for the model labels

Same as above - Can you please check if that particular commit (the one for version 1.0.0) has artifacts.yaml and that the labels are present there?
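
One way to run this check locally is sketched below. It assumes artifacts.yaml uses the model name as a top-level key with `path` and `labels` beneath it; the helper names and the example label are hypothetical. The file content at a given commit can be read with `git show <rev>:artifacts.yaml` and parsed with any YAML library; a literal dict stands in for the parsed result here so the example stays self-contained.

```python
import subprocess


def file_at_revision(rev: str, path: str = "artifacts.yaml") -> str:
    """Return the file's content as of a Git revision (run inside the repository)."""
    result = subprocess.run(
        ["git", "show", f"{rev}:{path}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def missing_fields(artifacts: dict, model_name: str, required=("path", "labels")) -> list:
    """List the required fields that are absent or empty for one model entry."""
    entry = artifacts.get(model_name)
    if entry is None:
        return ["<no entry for this model>"]
    return [field for field in required if not entry.get(field)]


# Stand-in for the parsed artifacts.yaml of the commit behind version 1.0.0.
# Assumed layout; the label value is a made-up example.
artifacts_at_commit = {
    "head-yolov8-ski-defects-1280": {"type": "model", "labels": ["yolov8"]},
}
print(missing_fields(artifacts_at_commit, "head-yolov8-ski-defects-1280"))
```

A non-empty result for the commit behind a version would be consistent with the "No path is set for this model version" message.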

> How can I search for a certain model using these labels?

This is currently not possible. We will create an internal issue to work on this. Also, we currently don’t display labels in the dashboard; we’ll discuss this.
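
As a possible stopgap outside Studio, labels can be searched in a local checkout. This is only a sketch, assuming the same artifacts.yaml layout (model name as top-level key, with `type`, `path`, and `labels` fields); all entries and label values below are hypothetical, and the dict stands in for the parsed file.

```python
def models_with_labels(artifacts: dict, wanted) -> list:
    """Return the names of model entries that carry every label in `wanted`."""
    wanted = set(wanted)
    return sorted(
        name
        for name, entry in artifacts.items()
        if entry.get("type") == "model"
        and wanted <= set(entry.get("labels") or [])
    )


# Hypothetical parsed artifacts.yaml content.
artifacts = {
    "head-yolov8-ski-defects-1280": {
        "type": "model",
        "path": "models/best.pt",
        "labels": ["yolov8", "ski-defects"],
    },
    "other-model": {"type": "model", "path": "models/other.pt", "labels": ["yolov8"]},
    "training-data": {"type": "dataset", "path": "data/train", "labels": ["ski-defects"]},
}
print(models_with_labels(artifacts, {"yolov8", "ski-defects"}))
```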

"give me the model for project X, dataset Y, with the max. training parameter Z"

This is possible from the project table. There, you can sort the table by the required parameter, and then for the desired commit, on mouse over on the model column, you will see a tooltip with a View in registry button to view the model in the model registry (see below). Clicking on this will lead you to the model details page for the required model version. Downloading the model file at this point is a separate issue.
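
The same selection can be expressed in a few lines once the table rows are available locally. This is a sketch with hypothetical rows and field names (`commit`, `project`, `dataset`, `params`), not a Studio API; it mirrors "filter by project and dataset, then sort by parameter Z and take the top row".

```python
def best_run(rows, project, dataset, param):
    """Return the row with the largest `param` for one project/dataset pair, or None."""
    candidates = [
        row for row in rows
        if row["project"] == project
        and row["dataset"] == dataset
        and param in row["params"]
    ]
    return max(candidates, key=lambda row: row["params"][param], default=None)


# Hypothetical rows; in Studio these values would be columns of the project table.
runs = [
    {"commit": "a1b2c3d", "project": "X", "dataset": "Y", "params": {"imgsz": 640}},
    {"commit": "d4e5f6a", "project": "X", "dataset": "Y", "params": {"imgsz": 1280}},
    {"commit": "0f9e8d7", "project": "X", "dataset": "other", "params": {"imgsz": 1920}},
]
best = best_run(runs, "X", "Y", "imgsz")
print(best["commit"])
```

The resulting commit is the one to look up in the model column of the project table.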

> how I can manage steps 4&5 above - Register the best model in the Studio model registry

For this, you can find the desired experiment in the project table. Once you have the commit hash for this, you can switch to the model registry and register the commit as a new model version (if you haven’t registered it already). You can also switch to the project table for a specific model version from the model details page (see below). And we are working on integrating project metrics and plots into the model details page.

> how I can manage steps 4&5 above - Find a certain model based on training parameters in the Studio model registry

Currently, this is possible from the project table (as described above). And once we complete integrating project metrics and plots into the model details page, it will be somewhat possible from the model details page as well. But even after this integration, comparing multiple model versions in the model registry itself will still not be possible; if you have suggestions on how to best do this, please do share.

shcheklein commented 1 year ago

@haimat hey, what are your thoughts on this? Clearly we need to prioritize certain things (especially an easier way to download a model), but does the project table solve the workflow issue for you? Any other thoughts on this?

haimat commented 1 year ago

Thanks, I can answer next week 👍

haimat commented 1 year ago

@tapadipti @shcheklein Sorry for the late response - here is my feedback:

> We will prioritize working on displaying the full dvc get command. Then, we will start work on the direct download.

Sounds good, thanks. Do you have a rough estimation for when you plan to have both features ready?

> Can you please check if that particular commit (the one for version 1.0.0) has artifacts.yaml and that the path is present there?

It seems I used the wrong commit/experiment for registering version 1.0.0. After choosing the correct commit/experiment, the model version details page looks much better now 👍

> Same as above - Can you please check if that particular commit (the one for version 1.0.0) has artifacts.yaml and that the labels are present there?

Seems to be solved, see above.

> This is currently not possible. We will create an internal issue to work on this. Also, we currently don’t display labels in the dashboard; we’ll discuss this.

Out of interest: If there is no option to filter/search for certain model tags, what was your reason to introduce them in the first place?

> This is possible from the project table. There, you can sort the table by the required parameter, and then for the desired commit, on mouse over on the model column, you will see a tooltip with a View in registry button to view the model in the model registry (see below). Clicking on this will lead you to the model details page for the required model version. Downloading the model file at this point is a separate issue.

Thanks, that should work.

> For this, you can find the desired experiment in the project table. Once you have the commit hash for this, you can switch to the model registry and register the commit as a new model version (if you haven’t registered it already). You can also switch to the project table for a specific model version from the model details page (see below). And we are working on integrating project metrics and plots into the model details page.

It seems metrics and plots are already part of the model details page, right?

tapadipti commented 1 year ago

@haimat Thank you for your response. Good to know some of your issues are resolved 👍

> Do you have a rough estimation for when you plan to have both features ready?

Displaying the full dvc get command is expected to complete in a week. Actually downloading the model file isn’t scheduled at the moment; we will let you know when we have details on its timeline. Can you pls indicate how urgent it is for you that Studio should support actual download (not just provide the dvc get command)?

> If there is no option to filter/search for certain model tags, what was your reason to introduce them in the first place?

Labels were introduced to let users add descriptive metadata to the models. You can see the labels in the model details page, which is the page you would use to get complete details of the model including description, methods, requirements, etc. But I agree that being able to filter/search on the labels would be helpful. Can you pls indicate how important it is for your workflow that Studio supports searching/filtering by labels?

> It seems metrics and plots are already part of the model details page, right?

Yes. You can also select which metrics/plots you want to keep in the model details page by using the Configure button. We are working on making it possible to save your selections; this should be live in a few days.

haimat commented 1 year ago

> Displaying the full dvc get command is expected to complete in a week. Actually downloading the model file isn’t scheduled at the moment; we will let you know when we have details on its timeline. Can you pls indicate how urgent it is for you that Studio should support actual download (not just provide the dvc get command)?

Good question ... I think if the full dvc get command is correct and easy to find and copy from the web site, then a direct download is not super urgent. It would then be a nice-to-have for me, but I could live with the dvc get command for a while.

> Labels were introduced to let users add descriptive metadata to the models. You can see the labels in the model details page, which is the page you would use to get complete details of the model including description, methods, requirements, etc. But I agree that being able to filter/search on the labels would be helpful. Can you pls indicate how important it is for your workflow that Studio supports searching/filtering by labels?

That would be a nice addition to Studio. It's probably not the most important feature to add, but it would make perfect sense to add it. I would say 5/10 on the importance scale :)

> Yes. You can also select which metrics/plots you want to keep in the model details page by using the Configure button. We are working on making it possible to save your selections; this should be live in a few days.

Sounds great, thanks :+1:

shcheklein commented 1 year ago

Good discussion, folks. Closing this since I think it is mostly resolved.

iterative / studio-support

How is the model registry in Studio supposed to be used? #80