stark-t / PAI

Pollination_Artificial_Intelligence

YOLOv7 saves a lot of weights and is confusing :D #38

Closed valentinitnelav closed 2 years ago

valentinitnelav commented 2 years ago

Hi @stark-t

Do you have any idea how to stop train.py from saving this whole bunch of weights? It eats up storage pretty fast, as the big files are about 0.5 GB each. YOLOv5 saves only a best and a last checkpoint by default. I am confused by so many "best" options :D

$ ls ~/PAI/detectors/yolov7/runs/train/yolov7_n6_b8_e300_hyp_p54/weights

best_202.pt  best_214.pt  best_238.pt  best_243.pt  best_263.pt  best_267.pt  best_271.pt   epoch_024.pt  epoch_124.pt  epoch_224.pt  epoch_296.pt  init.pt
best_207.pt  best_217.pt  best_239.pt  best_246.pt  best_264.pt  best_268.pt  best_273.pt   epoch_049.pt  epoch_149.pt  epoch_249.pt  epoch_297.pt  last.pt
best_212.pt  best_220.pt  best_240.pt  best_249.pt  best_265.pt  best_269.pt  best.pt       epoch_074.pt  epoch_174.pt  epoch_274.pt  epoch_298.pt
best_213.pt  best_236.pt  best_242.pt  best_250.pt  best_266.pt  best_270.pt  epoch_000.pt  epoch_099.pt  epoch_199.pt  epoch_295.pt  epoch_299.pt

valentinitnelav commented 2 years ago

The --nosave option might solve this. Job 3207748 was terminated by the scheduler when it hit the set time limit of 50 hours, but the model still managed to run until epoch 300, and the weights folder contained only 3 files.

$ ls ~/PAI/detectors/yolov7/runs/train/yolov7_w6_b4_e300_hyp_custom/weights

epoch_299.pt  init.pt  last.pt
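
For runs that were already trained without --nosave, the intermediate files could be cleaned up afterwards. The following is only a minimal sketch and not part of YOLOv7 or this repository: the helper names `find_intermediate` and `prune` are hypothetical, and the file-name patterns (best_NNN.pt, epoch_NNN.pt) are assumed from the listings in this thread. It keeps best.pt, last.pt and init.pt, and it only deletes files when `dry_run=False`.

```python
import re
from pathlib import Path

# Files to keep; an assumption based on the directory listings in this thread.
KEEP = {"best.pt", "last.pt", "init.pt"}
# Matches intermediate checkpoints such as best_202.pt or epoch_024.pt.
INTERMEDIATE = re.compile(r"^(best|epoch)_\d+\.pt$")


def find_intermediate(weights_dir):
    """Return the intermediate checkpoint paths in weights_dir, sorted by name."""
    return sorted(
        p for p in Path(weights_dir).iterdir()
        if p.is_file() and p.name not in KEEP and INTERMEDIATE.match(p.name)
    )


def prune(weights_dir, dry_run=True):
    """Report (and optionally delete) intermediate checkpoints.

    Returns a tuple (paths, total_bytes) for the files that match.
    Nothing is deleted unless dry_run is False.
    """
    paths = find_intermediate(weights_dir)
    total_bytes = sum(p.stat().st_size for p in paths)
    if not dry_run:
        for p in paths:
            p.unlink()
    return paths, total_bytes
```

Calling `prune(weights_dir)` first with the default `dry_run=True` shows what would be removed and how many bytes it would free. Note that the pattern also matches the final epoch_299.pt, so it is worth checking that last.pt holds the weights you need before running with `dry_run=False`.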