ocean-data-factory-sweden / kso

Notebooks to upload/download marine footage, connect to a citizen science project, train machine learning models and publish marine biological observations.
GNU General Public License v3.0
4 stars 12 forks

Cloudina notebook issues #375

Closed jannesgg closed 3 months ago

jannesgg commented 4 months ago

Altea and I have been trying to run tutorial 5 in Cloudina, but the code appears to get stuck when downloading the CSV files needed to populate the local database. It happens on both of our laptops. Do you happen to know what the issue might be and how we can fix it?

Additionally, we tried training yolov5m in Google Colab, but there appears to be an issue with PosixPath. I would appreciate any help with that as well.

Lastly, during the meeting you mentioned that yolov8 is available for training in tutorial 5, but the dropdown for choosing the baseline model only lists yolov5 models. The only way I got it to download yolov8 was by hard-coding the name into the train_yolo method. Is there something we are missing?

jannesgg commented 4 months ago

@AlteaF @Seqrous It is easier to keep issues tracked here, so I recommend adding issues here under Notebook Issues in future.

The CSV issue is one we have had before, but it should be fixed in the latest version of kso. Please ensure you are working on top of the latest commit on the dev branch.

jannesgg commented 3 months ago

@AlteaF @Seqrous As for the dropdown listing only yolov5 base models, this seems to be related to using Weights and Biases, which is not strictly necessary in Cloudina. To use MLflow instead, you could set pp.registry = "mlflow", which should give you other options for base models. By default, when no other model is available, the train_yolo function uses yolov8m.
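As a rough illustration of the registry switch, here is a minimal sketch. It does not import kso: `pp` is replaced by a plain stand-in object, and the helper `use_registry` and the set `KNOWN_REGISTRIES` are hypothetical names made up for this example. The only thing taken from the thread is that `pp.registry` is set to `"mlflow"` or `"wandb"`.

```python
from types import SimpleNamespace

# Stand-in for the object the tutorial calls `pp`. In the real notebook `pp`
# comes from kso; this object only mimics the `registry` attribute.
pp = SimpleNamespace(registry="wandb")

# The two registry values named in this thread (assumption: there may be others).
KNOWN_REGISTRIES = {"wandb", "mlflow"}


def use_registry(processor, name):
    """Set processor.registry after checking the name against KNOWN_REGISTRIES."""
    if name not in KNOWN_REGISTRIES:
        raise ValueError(
            f"unknown registry {name!r}; expected one of {sorted(KNOWN_REGISTRIES)}"
        )
    processor.registry = name
    return processor


use_registry(pp, "mlflow")
print(pp.registry)
```

In the notebook itself the equivalent step would simply be the assignment `pp.registry = "mlflow"`, presumably run before the cell that builds the base-model dropdown.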

jannesgg commented 3 months ago

Additional question from Altea: We were hoping you could clarify how we are supposed to use Cloudina to run model training. The current issue is that the server stops after some time while we are waiting for it to train the model, and we have to restart it. It has already happened three times this morning, while the computer was open and I was looking at other things. I feel like we are missing something about how to have it run in the background so that it can train. Could you clarify whether there is anything special we should do? We are also unsure whether the run should appear on Weights and Biases while it's training, as currently it doesn't.

victor-wildlife commented 3 months ago

@jannesgg @Bergylta how do you train the models in Cloudina without having connection issues?

Seqrous commented 3 months ago

I have now been trying to run the notebook on Cloudina myself. The kernel does not die when I run mlp.train_yolo, but it does not make any progress either. The only output I get in 20+ minutes of running is MLflow warnings.

jannesgg commented 3 months ago

Temporary solution: switch to WandB for now.

Seqrous commented 3 months ago

Regarding the "PosixPath(".") has an empty name" error: this is due to misconfigured or missing paths in train.txt/test.txt. When uploading images/labels to Cloudina, you have to make sure that all files get uploaded (in our case, only half of them were uploaded) and that the paths in train.txt/valid.txt follow the Linux path format (forward slashes instead of backslashes).
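A minimal sketch of that check, assuming each line of train.txt/valid.txt holds one image path. The helper name `normalise_split_file` and its `root` parameter are hypothetical and not part of kso. It rewrites backslashes as forward slashes, drops blank lines, and returns the entries whose files are missing (for example, after a partial upload).

```python
from pathlib import Path


def normalise_split_file(list_file, root=None):
    """Rewrite list_file with forward-slash paths; return entries whose files are missing."""
    list_file = Path(list_file)
    # Relative entries are checked against `root` (default: the list file's folder).
    root = Path(root) if root is not None else list_file.parent

    cleaned, missing = [], []
    for raw in list_file.read_text().splitlines():
        entry = raw.strip()
        if not entry:
            # Skip blank lines, which would otherwise turn into empty paths.
            continue
        # Convert Windows separators to Linux-style forward slashes.
        entry = entry.replace("\\", "/")
        cleaned.append(entry)
        # Absolute entries are used as-is; relative ones are joined to root.
        if not (root / entry).exists():
            missing.append(entry)

    # Write the cleaned list back in place, one path per line.
    list_file.write_text("\n".join(cleaned) + "\n")
    return missing
```

Calling `normalise_split_file("train.txt")` (and the same for valid.txt) on the uploaded dataset should return an empty list if every referenced file arrived. For what it's worth, plain `pathlib` raises a `ValueError` with the "has an empty name" wording when `with_suffix` is called on an empty path, which would be consistent with blank or missing entries, though whether the training code hits that exact call is an assumption.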

Seqrous commented 3 months ago

When training a YOLOv8 model with the registry set to wandb, I get the following error. @jannesgg, any clue whether this is related to the MLflow timeout problem? [image]

victor-wildlife commented 3 months ago

@Seqrous do you think the PosixPath(".") issue might have something to do with the MLflow problem?

Seqrous commented 3 months ago

I doubt it. In case it wasn't clear, the PosixPath issue is now resolved, while the MLflow timeout still occurs.