ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0

Enable fish id models in Spyfish #400

Open: victor-wildlife opened this issue 1 month ago

victor-wildlife commented 1 month ago

Altea and Krzysztof developed a couple of fish ID models, but there is currently no way to run them in Spyfish (Tutorial 9).

We should publish a model on Zenodo so that it can then be downloaded/retrieved.

victor-wildlife commented 1 month ago

@jannesgg do you have any suggestions to best move forward with this?

jannesgg commented 1 month ago

@victor-wildlife If the models were trained in either the Koster team's MLflow or WandB, they should be accessible via Tutorial 9. If they were trained locally, I have now linked things so that, when a custom_project path is specified, the model selection will look inside that project for any public artifacts, e.g. krzysiek-ziach/YOLOv8.

```python
model = mlp.choose_model(custom_project="adi-ohad-heb-uni/project-wildlife-ai")
```
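The custom_project value appears to follow the W&B "entity/project" convention, as in both paths mentioned in this comment. A minimal sketch of validating such a string before passing it to choose_model, assuming that convention; parse_custom_project is a hypothetical helper, not part of the KSO code:

```python
def parse_custom_project(custom_project: str) -> tuple[str, str]:
    """Split an 'entity/project' string (hypothetical helper) into its parts.

    Assumes the W&B-style convention of exactly one '/' separating a
    non-empty entity name from a non-empty project name.
    """
    parts = custom_project.strip().split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"Expected 'entity/project', got {custom_project!r}")
    entity, project = parts
    return entity, project


if __name__ == "__main__":
    # The two project paths mentioned in this thread both parse cleanly.
    for path in ("adi-ohad-heb-uni/project-wildlife-ai", "krzysiek-ziach/YOLOv8"):
        print(parse_custom_project(path))
```

Failing early on a malformed path gives a clearer message than an empty artifact search.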

Hope this helps :)

victor-wildlife commented 1 month ago

Thanks @jannesgg. I have managed to run a couple of the models, "yolov8n_e50" and "yolov8l_e50", but I couldn't get the labels for the predictions. There were some issues retrieving the MLflow artifact. I have created a new branch (labels_mlflow) to try to solve this issue. So far it just prints an error if the MLflow artifact doesn't contain labels. Any thoughts on how to get around it?
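When the data yaml is available, the class labels can usually be read from its names entry. A minimal sketch, assuming an Ultralytics-style data yaml already loaded into a dict (e.g. with PyYAML's yaml.safe_load) whose names entry is either a list or an {index: name} mapping; labels_from_data_yaml is a hypothetical helper. It raises rather than only printing, so the caller can decide on a fallback:

```python
from collections.abc import Mapping
from typing import Any


def labels_from_data_yaml(data: Mapping[str, Any]) -> list[str]:
    """Return class labels ordered by class index (hypothetical helper).

    Assumes an Ultralytics-style data yaml, already parsed into a dict,
    whose 'names' entry is either a list or an {index: name} mapping.
    """
    names = data.get("names")
    if names is None:
        raise KeyError("Data yaml has no 'names' entry; cannot map predictions to labels")
    if isinstance(names, Mapping):
        # Sort by integer class index so position i holds the label for class i.
        labels = [str(names[k]) for k in sorted(names, key=int)]
    else:
        labels = [str(n) for n in names]
    # If the yaml also declares 'nc' (number of classes), check it agrees.
    nc = data.get("nc")
    if nc is not None and int(nc) != len(labels):
        raise ValueError(f"'nc' is {nc} but {len(labels)} names were found")
    return labels
```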

jannesgg commented 1 month ago

@victor-wildlife I've had a look at the artifacts logged by those runs, and it seems that the "input_datasets" folder is missing for all of Krzysztof's runs. This is probably why the yaml file cannot be found. I'm not sure why this happened, but it could be due to how the model training was run. All the other models we have trained seem to have these artifacts, including the latest segmentation ones, so I doubt it is due to changes in our code. Any ideas on how this could have happened?
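Runs like these could be flagged automatically by checking each run's artifact listing for the input_datasets folder and a yaml file inside it. A minimal sketch operating on a plain list of artifact paths; the expectation that the yaml sits under input_datasets/ is inferred from the comment above, and check_run_artifacts is a hypothetical helper:

```python
from pathlib import PurePosixPath


def check_run_artifacts(artifact_paths: list[str], folder: str = "input_datasets") -> list[str]:
    """Return problems found in a run's artifact paths (empty list if none).

    Assumes POSIX-style paths relative to the run's artifact root, and
    that the data yaml is logged somewhere under `folder`.
    """
    paths = [PurePosixPath(p) for p in artifact_paths]
    in_folder = [p for p in paths if p.parts and p.parts[0] == folder]
    if not in_folder:
        return [f"missing '{folder}' folder"]
    yamls = [p for p in in_folder if p.suffix in (".yaml", ".yml")]
    if not yamls:
        return [f"no .yaml/.yml file under '{folder}'"]
    return []
```

With MLflow, MlflowClient().list_artifacts(run_id, path) lists one level at a time and returns FileInfo objects (with .path and .is_dir), so nested folders need to be walked recursively to build the full list of paths.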

victor-wildlife commented 1 month ago

@AlteaF or @Seqrous, do you have any idea why the yaml can't be found on MLflow? Maybe because you trained the models locally?

jannesgg commented 1 month ago

@AlteaF, @Seqrous It could be particularly useful to look at the following part of Tutorial 5:

Fix important paths

```python
mlp.setup_paths()
```

After this step, printing mlp.data_path should show the directory that contains the input information (including the yaml). Checking this could help find out why the yaml is not being added.
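Beyond printing mlp.data_path, a small check that the directory exists and actually contains a yaml file can catch the problem before training starts. A minimal sketch using only pathlib, assuming mlp.data_path is a filesystem path; find_data_yamls is a hypothetical helper:

```python
import tempfile
from pathlib import Path


def find_data_yamls(data_path) -> list[Path]:
    """Return the .yaml/.yml files directly inside data_path, sorted by name.

    Raises FileNotFoundError if data_path is not an existing directory,
    so a wrong path is reported instead of silently yielding no yaml.
    """
    folder = Path(data_path)
    if not folder.is_dir():
        raise FileNotFoundError(f"data_path is not an existing directory: {folder}")
    return sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() in (".yaml", ".yml")
    )


if __name__ == "__main__":
    # Self-contained demo on a throwaway folder; in the notebook you would
    # instead call find_data_yamls(mlp.data_path) after mlp.setup_paths().
    with tempfile.TemporaryDirectory() as tmp:
        print(find_data_yamls(tmp))  # no yaml yet, so an empty list
        (Path(tmp) / "data.yaml").write_text("")
        print(find_data_yamls(tmp))
```

An empty list means setup_paths() pointed somewhere without the yaml, which is one way it could end up not being logged.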

Seqrous commented 1 month ago

Which yaml file, if I may ask? There are two that are part of the training data. Is it one of those, or some other yaml file that gets generated after training?

jannesgg commented 1 month ago

@Seqrous It is the yaml file located in the same folder as train.txt, valid.txt, etc. when the dataset is prepared before training. It contains the data paths, among other things, which are then loaded as variables when running Tutorial 5.
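Since the yaml mainly carries paths (such as those to train.txt and valid.txt), one way to inspect it is to resolve each path entry and check that it exists. A minimal sketch, assuming the yaml has been parsed into a dict and that relative entries resolve against an optional path entry, itself relative to the yaml's folder; the keys train, val and test follow the Ultralytics data-yaml convention and are an assumption about the KSO file, and missing_dataset_paths is a hypothetical helper:

```python
import tempfile
from collections.abc import Mapping
from pathlib import Path
from typing import Any


def missing_dataset_paths(
    data: Mapping[str, Any], yaml_dir, keys=("train", "val", "test")
) -> list[tuple[str, Path]]:
    """List (key, resolved_path) pairs for dataset entries that do not exist.

    Assumptions: `data` is the parsed data yaml; relative entries are
    resolved against its optional 'path' entry, which is itself resolved
    against yaml_dir (the folder holding the yaml). Absent keys are skipped.
    """
    base = Path(yaml_dir) / str(data.get("path") or "")
    missing = []
    for key in keys:
        value = data.get(key)
        if not value:
            continue
        # Accept either a single path or a list of paths for each key.
        entries = value if isinstance(value, (list, tuple)) else [value]
        for entry in entries:
            resolved = base / str(entry)
            if not resolved.exists():
                missing.append((key, resolved))
    return missing


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "train.txt").write_text("")
        data = {"train": "train.txt", "val": "valid.txt"}
        # Only the val entry is reported, since valid.txt was never created.
        print(missing_dataset_paths(data, tmp))
```

Absolute paths in the yaml pass through unchanged, so a yaml written on another machine would show up here as missing entries.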

AlteaF commented 1 month ago

We used W&B to track our experiments, so we do not have anything on MLflow, if that helps. We ran the training from Tutorial 5 on cloudina, but we used the .yaml file that was provided when we got the pictures. If necessary, I can send it via email.

jannesgg commented 1 month ago

@AlteaF Yes, sending this yaml file via email would be good so I can inspect the paths. Thank you. :)

jannesgg commented 1 month ago

@AlteaF @Seqrous Also, could you send links to the runs on WandB? I could not find any completed runs with artifacts under WandB for your project.

victor-wildlife commented 3 weeks ago

@AlteaF @Seqrous Did you have the chance to email the yaml file and share the W&B runs with Jannes? Let us know if there are any issues or anything I can help with.

AlteaF commented 3 weeks ago

@victor-wildlife I sent the yaml file to Jannes. Regarding the runs on W&B, we can't seem to make them public so that even I can see them, since they are in Chris' account. Could you suggest a guide for making the runs public? Otherwise, should we send the report instead? Thank you.

jannesgg commented 2 weeks ago

@AlteaF It's not possible to make individual runs public, but Chris should be able to make the project public if that would be okay.
