stark-t / PAI

Pollination_Artificial_Intelligence

Reproducibility of two YOLOv5 identical train jobs #31

Closed · valentinitnelav closed this issue 1 year ago

valentinitnelav commented 2 years ago

Hi @stark-t , I ran two identical nano models on the Clara cluster and the results differ slightly. Below you can see the confusion matrices on the validation dataset. You will also find the results.csv for each run at the bottom of this comment.

I personally do not like seeing these differences between two identical nano runs (but I can learn to accept them :D ). I am not sure how to set a seed for YOLOv5 so that two runs of the same model are identical, or whether that is even possible with the current configuration. Sadly, there does not yet seem to be any argparse parameter for passing a seed. There is a discussion here https://github.com/ultralytics/yolov5/issues/1222 pointing to the PyTorch reproducibility notes https://pytorch.org/docs/stable/notes/randomness.html

The main takeaways are:

Completely reproducible results are not guaranteed across PyTorch releases, individual commits, or different platforms. Furthermore, results may not be reproducible between CPU and GPU executions, even when using identical seeds.

The only method I'm aware of that might guarantee you identical results might be to train on CPU with --workers 0, but this is impractical naturally, so you simply need to adapt your workflow to accommodate minor variations in final model results.
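For reference, what the PyTorch notes boil down to is seeding every RNG and forcing deterministic kernels. A minimal sketch of that (this is just the generic PyTorch recipe, not something YOLOv5 exposed at the time) could look like this:

```python
import os
import random

import numpy as np
import torch


def set_full_determinism(seed: int = 0) -> None:
    """Seed all RNGs and request deterministic kernels, per the PyTorch reproducibility notes."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)               # seeds CPU and the current CUDA device
    torch.cuda.manual_seed_all(seed)      # seeds all CUDA devices
    # Force deterministic cuDNN convolutions and disable autotuning.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    # Needed for deterministic cuBLAS matmuls on CUDA >= 10.2.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    # Raise an error whenever an op has no deterministic implementation.
    torch.use_deterministic_algorithms(True)
```

Even with all of this, the PyTorch docs warn that results are only reproducible on the same platform, release, and hardware, which matches what we see on the cluster.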

nano model n1: [confusion matrix, validation dataset]

nano model n2: [confusion matrix, validation dataset]

small model s: [confusion matrix, validation dataset]

Results CSV files

nano model n1: results.csv

nano model n2: results.csv

small model s: results.csv

valentinitnelav commented 1 year ago

FYI

Hi @stark-t , I just realized that YOLOv5 published some updates (release v6.2), and one interesting aspect is reproducibility via a --seed argument. I just read this:

Training Reproducibility: Single-GPU YOLOv5 training with torch>=1.12.0 is now fully reproducible, and a new --seed argument can be used (default seed=0) (https://github.com/ultralytics/yolov5/pull/8213 by @AyushExel).

https://github.com/ultralytics/yolov5/releases/tag/v6.2

I haven't checked all the details yet, but this might not work for the multi-GPU training we use on the cluster; it sounds like it only applies to a single GPU.
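If the release note holds, a single-GPU run with a fixed seed could look roughly like the sketch below, using YOLOv5's train.run() entry point (assuming the script runs from inside the yolov5 repo with torch>=1.12.0; the dataset yaml and hyperparameters here are placeholders, not our actual config):

```python
# Sketch only: single-GPU YOLOv5 (>= v6.2) training with a fixed seed.
import train  # yolov5/train.py, importable when run from the repo root

train.run(
    data="data/pollinators.yaml",  # hypothetical dataset config
    weights="yolov5n.pt",
    imgsz=640,
    epochs=300,
    device="0",  # reproducibility is only claimed for single-GPU training
    seed=0,      # default 0; added in v6.2 via ultralytics/yolov5#8213
)
```

Running the same command twice on the same machine should then give identical results.csv files, which is exactly what we could not get before.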

valentinitnelav commented 1 year ago

Hi @stark-t ,

On the project with Malika I ran into this reproducibility problem again, and I am not sure what I'm doing wrong or how to solve it :/ I opened an issue on the GitHub repo of YOLOv7, here: https://github.com/WongKinYiu/yolov7/issues/1144 In the meantime, if you know what might cause this, please let me know.

If running two identical models on multiple GPUs is inherently not reproducible and the results can differ this much, then I'm not sure how to properly compare different model architectures or parameters.

FYI, I found these posts interesting to read:

Could machine learning fuel a reproducibility crisis in science?; Nature, 26 July 2022

Artificial intelligence faces reproducibility crisis; Science, 16 Feb 2018

valentinitnelav commented 1 year ago

This seems to be an issue with detectron2 as well: https://github.com/facebookresearch/detectron2/issues/4260

valentinitnelav commented 1 year ago

Hi @stark-t ,

I ran multiple tests and discovered that with release v6.2 of YOLOv5 one can get reproducible results, but only when using a single GPU. YOLOv7 has a reproducibility problem with both a single GPU and multiple GPUs.

Since we compare different models, some with YOLOv5 and some with YOLOv7, I am not sure how to make the comparison fully reproducible.

An alternative is to run each model, say, 5 times, so that we have 5 values for each metric, and then compare the averages of these values (see the sketch below). Does that make sense?
What do you think?
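A rough sketch of how that comparison could look, reading the final-epoch metrics from each run's results.csv and reporting mean ± sd (the run directories are hypothetical, and the column names are assumptions based on YOLOv5's results.csv layout):

```python
# Sketch: average validation metrics over N repeated runs instead of trusting a single one.
from pathlib import Path

import pandas as pd

run_dirs = [Path(f"runs/train/nano_rep{i}") for i in range(1, 6)]  # 5 repeats, hypothetical paths

final_rows = []
for d in run_dirs:
    df = pd.read_csv(d / "results.csv")
    df.columns = df.columns.str.strip()  # YOLOv5 pads column names with spaces
    final_rows.append(df.iloc[-1])       # metrics of the last epoch

final = pd.DataFrame(final_rows)
cols = ["metrics/precision", "metrics/recall", "metrics/mAP_0.5", "metrics/mAP_0.5:0.95"]
print(final[cols].agg(["mean", "std"]))  # compare mean ± sd between architectures
```

The same script could then be run on the YOLOv7 repeats, so both architectures are compared on the same footing.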

valentinitnelav commented 1 year ago

For the purpose of this paper, it would be too computationally expensive to train the models several times. I'll close this issue here. Hopefully YOLOv5 and YOLOv7 will allow reproducibility in the future when trained in parallel as well.