"Error: root: Registry not supported" on Alvis when attempting to download baseline model for training new models

Bergylta commented 1 week ago

🐛 Bug

I am not able to download the baseline models when trying to create new models in the Train_models notebook on Alvis.

To Reproduce (REQUIRED)

Input:

Yaml-file directory: /mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/KSO_xhoni_frames_tjorn/

Experiment name: KSO_registry_test_1

Model download directory: /mimer/NOBACKUP/groups/snic2021-6-9/tmp_dir/emil_models_for_now/

weights = mlp.choose_baseline_model(download_folder.selected)

Output:

ERROR:root:Registry not supported.

Expected behavior

There should be a dropdown menu with the available baseline models such as "object detection" or "segmentation" (we are aiming for a segmentation model right now)

Additional context

Have not yet tested it on Cloudina

Diewertje11 commented 1 day ago

Hej! I have looked into the issue you are describing. The error message you are getting is one we wrote ourselves (https://github.com/ocean-data-factory-sweden/kso/blob/ec0c202beadfea22f438660bb8c6cc0e54389903/kso_utils/project.py#L1607), and it occurs when no registry is specified to get the models from. We support 'wandb' and 'mlflow' as registries.

When the MLProjectProcessor is initialized, the registry is set to None (https://github.com/ocean-data-factory-sweden/kso/blob/ec0c202beadfea22f438660bb8c6cc0e54389903/kso_utils/project.py#L1335) and that is why you run into this problem.

From which registry do you want to select a baseline model? To work around it without me actually fixing the issue yet, you can add to the cell above: mlp.registry = "wandb" or mlp.registry = "mlflow" depending on which one you want to use. This should fix your problem.
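As a rough sketch of why this workaround helps: the stand-in class below is hypothetical and only mimics the behaviour described in this thread (registry starts as None, and an unsupported registry logs the error). It is not the actual kso_utils code, and the download path is made up for illustration.

```python
import logging

# The two registries named in this thread as supported.
SUPPORTED_REGISTRIES = ("wandb", "mlflow")


class FakeMLProjectProcessor:
    """Hypothetical stand-in for MLProjectProcessor, not the real class."""

    def __init__(self):
        # Mirrors the behaviour described above: registry is None at init.
        self.registry = None

    def choose_baseline_model(self, download_path):
        if self.registry not in SUPPORTED_REGISTRIES:
            # Same message as reported in this issue.
            logging.error("Registry not supported.")
            return None
        # The real function would show a dropdown of baseline models here.
        return f"{self.registry} baseline models -> {download_path}"


mlp = FakeMLProjectProcessor()

# Without a registry the call only logs the error and returns nothing.
first_try = mlp.choose_baseline_model("/tmp/models")

# The workaround: set the registry in the cell above the call.
mlp.registry = "wandb"
second_try = mlp.choose_baseline_model("/tmp/models")
```

In the real notebook only the line `mlp.registry = "wandb"` (or `"mlflow"`) needs to be added before the call to `mlp.choose_baseline_model(download_folder.selected)`.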

I need to investigate a bit more to find out when this changed in the code behind the notebook and to fix it properly, because with the current code it indeed cannot work. So something went wrong in a past commit. But if you set the registry yourself, it should at least work again!

Let me know if you run into any more problems.

Diewertje11 commented 1 day ago

A note for myself: in the commit below we can see that the default used to be 'wandb', and that this commit sets it to None.

https://github.com/ocean-data-factory-sweden/kso/commit/8862c10b65a0b140ab3ae96c6adba57172cad454

In that commit it looks like code was added that can deal with other registries, but that code seems to have been deleted somewhere afterwards.

Diewertje11 commented 6 hours ago

I now see that the extra code that can deal with other registries is for the function choose_model, which is used in Evaluate_models.ipynb, Publish_models.ipynb and Publish_observations.ipynb. It was not for choose_baseline_model, which is the function we are looking at now. So no code was deleted afterwards; I just missed that we were in a different part of the project.py file.

So this commit I mentioned above (https://github.com/ocean-data-factory-sweden/kso/commit/8862c10b65a0b140ab3ae96c6adba57172cad454) breaks the choose_baseline_model function, since that one assumed the default of 'wandb' as registry.

@Bergylta, how would you like this to work? Should the user set the registry they want to use in the Jupyter notebook, since it can be either 'wandb' or 'mlflow'? Or should baseline models always be taken from 'wandb' only?
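To make the two options concrete, here is a minimal sketch. The helper name `resolve_registry` and its parameters are hypothetical and do not exist in kso_utils; it only illustrates falling back to 'wandb' when nothing is chosen while still letting the user pick 'mlflow'.

```python
# The two registries named in this thread as supported.
SUPPORTED_REGISTRIES = ("wandb", "mlflow")


def resolve_registry(user_choice=None, default="wandb"):
    """Hypothetical helper: decide which registry to use for baseline models."""
    # Fall back to the default when the user did not choose anything.
    registry = default if user_choice is None else user_choice
    if registry not in SUPPORTED_REGISTRIES:
        raise ValueError(f"Registry not supported: {registry!r}")
    return registry


# Option 1: nothing set in the notebook, so the old default applies.
always_wandb = resolve_registry()

# Option 2: the user explicitly picks a registry in the notebook.
user_picked = resolve_registry("mlflow")
```

With something like this, the notebook would keep working without any extra cell, and a user who wants 'mlflow' could still set it explicitly.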
