stark-t / PAI

Pollination_Artificial_Intelligence

YOLOv5 releases during our project - how to manage reproducibility with fast changing repositories? #57

Closed valentinitnelav closed 1 year ago

valentinitnelav commented 1 year ago

Hi @stark-t ,

I just noticed that YOLOv5 has a new release, v6.2. They introduced classification models on top of the usual object detection models. I am not 100% sure, but it looks like they dropped the COCO pretrained weights and switched to the ImageNet dataset. YOLOv7 v0.1 uses the COCO dataset.

Anyway, so far I have used the v6.1 weights for YOLOv5, but when I deployed the YOLOv5 repository on the cluster I simply cloned the entire repo as it was at that time, without specifying a release tag.

I just did (~ 3 months ago):

cd ~/PAI/detectors/
git clone https://github.com/ultralytics/yolov5

same for YOLOv7

Should I use that particular branch that corresponds to the v6.1 release or git clone a specific release (v6.1 for YOLOv5 and v0.1 for YOLOv7) and re-run all the models?
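For reference, a minimal sketch of what a re-deployment pinned to release tags could look like (paths as in our current setup; the YOLOv7 URL is an assumption on my side, pointing at the WongKinYiu repo):

cd ~/PAI/detectors/
# --branch also accepts tags, so this clones the repo exactly at the v6.1 release
git clone --depth 1 --branch v6.1 https://github.com/ultralytics/yolov5
# assumed YOLOv7 source; pin to the v0.1 release tag in the same way
git clone --depth 1 --branch v0.1 https://github.com/WongKinYiu/yolov7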

I hesitate to do this because it will again take time (plus the cluster will put me in a long queue for such a big request; 2-4 days in total, I guess), but I think it is a more reproducible way. Then again, I am not sure it is the most reproducible way either: even if I check out a particular branch, that branch can still receive new commits, so the version we have locally would fall behind it. I realize I am not sure how GitHub versioning works when it comes to making something fully reproducible. Perhaps one needs to check out the repository at a particular date, like in this example? I am actually curious how this works (for this or other projects). If you know more about this, can you let me know?
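As a rough sketch of the two checkout approaches I have in mind (the date below is only a placeholder, not our actual deployment date):

cd ~/PAI/detectors/yolov5
# a release tag points to one fixed commit (unlike a branch), so this is reproducible
git checkout v6.1
# or: check out the last commit on master before a given date (placeholder date)
git checkout $(git rev-list -n 1 --before="2022-06-01" master)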

Perhaps we can simply state "v6.1" for the YOLOv5 models and "v0.1" for the YOLOv7 ones in a comparison table; that would also be ok from my side.

What do you think?

stark-t commented 1 year ago

@valentinitnelav Without looking too deeply into this topic, I think we should use the results that we already have (except yolov7tiny). It would be beneficial to compare only models that use the same pretrained weights (COCO), not weights from different datasets.

valentinitnelav commented 1 year ago

Yes, I agree, we should use the same COCO pre-trained weights across the models. My worry is more about how to ensure reproducibility, considering that the GitHub repositories of YOLOv5 & v7 change almost daily.
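One low-effort option, just as a sketch (assuming the clones on the cluster are untouched since the runs, and that the YOLOv7 clone sits next to the YOLOv5 one): record the exact commit hashes we already ran with and report those alongside the release tags.

cd ~/PAI/detectors/
git -C yolov5 rev-parse HEAD   # exact YOLOv5 commit used for our runs
git -C yolov7 rev-parse HEAD   # exact YOLOv7 commit used for our runs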

valentinitnelav commented 1 year ago

The options that I see for this (given my currently limited knowledge) are: