Closed TaoXieSZ closed 4 years ago
@ChristopherSTAN can you point tensorboard to a google drive folder like you have? That would be really cool, then all of your work is saved and you can keep track of experiments this way.
This is a really good pro tip for Colab users! Maybe we should add a --log-dir argument to train.py to enable this?
@glenn-jocher You reminded me of that.
However, I haven't looked at TensorBoard in a long time. I just tried it, and it can only point to data inside the yolov5 folder.
As for the argument, that's up to you, LOL. I think it would be more convenient for Colab users if you added it. mmdetection has a --work-dir argument, which sparked this idea, and I found that it lets me save checkpoints to the larger Google Drive.
BTW, in my experience, using TensorBoard often slows notebooks down and causes disconnections (maybe Google is trying to prevent over-usage), so I skip it.
It does work! Wow, so this is a backdoor to permanence with Colab. You can actually log all of your experiments straight to drive, and then pick up where you left off the next day without having to move any files. This is a real game changer for colab dev work. I'll add a PR for the argparser --logdir argument.
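A minimal sketch of what such an argument could look like, assuming a plain argparse parser. The default value and the Drive path below are illustrative assumptions, not the actual train.py code:

```python
import argparse


def build_parser():
    # Hypothetical parser: only the --logdir flag discussed above is modeled here.
    parser = argparse.ArgumentParser(description="logging directory sketch")
    parser.add_argument("--logdir", type=str, default="runs/",
                        help="directory where run logs and checkpoints are written")
    return parser


# Simulate a Colab invocation that points logging at a mounted Drive folder.
opt = build_parser().parse_args(["--logdir", "/content/drive/My Drive/yolov5-checkpoints"])
print(opt.logdir)
```

Anything written under that folder persists across Colab sessions, provided Drive is mounted before training starts.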
@glenn-jocher It is really amazing!
All done. Thanks for the great idea @ChristopherSTAN!
@glenn-jocher It's just feedback from a heavy user. Looking forward to an even better yolov5 in the future.
BTW, I noticed the default bbox loss is now CIoU, so maybe you should update the logging entry. It may cause some confusion.
@ChristopherSTAN yes, you are correct, it's now CIoU. Yes I need to update the comment to a criterion-agnostic term like 'box' or 'regression'.
TODO: Update GIoU labels to criteria-agnostic terms.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@glenn-jocher Hello, I have been trying to train yolov5_v4, and it seems that the train arguments have changed. Before, I used --logdir, and when training stopped (because I work on Colab) I would run it again and it would pick up from where it left off. Now it doesn't! I even set the new weights, but training starts as if there had been no training before: the epoch number doesn't reset, but all the mAP graphs show that training has started from the beginning. What should I do?
Here are my arguments:
!python train.py --img 320 --batch 128 --epochs 200 \
    --data /content/YoloV5Data/data.yaml \
    --cfg ./models/yolov5s.yaml \
    --weights /content/drive/Yolov5S_320/exp5/weights/last.pt \
    --project /content/drive/Yolov5S_320/
@maheeetaaa yes, the local directory logging structure was unified in https://github.com/ultralytics/yolov5/pull/1377. Training results are saved to runs/train/exp.
You may resume an interrupted training run very simply:
python train.py --resume # automatically select most recent run
python train.py --resume path/to/last.pt # manually specify run to resume
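How --resume picks the most recent run is not spelled out here. One plausible approach, sketched under that assumption, is to search a directory tree for last*.pt checkpoints and take the newest by modification time. The function name find_latest_checkpoint and the directory layout in the demo are hypothetical:

```python
import glob
import os
import tempfile


def find_latest_checkpoint(search_dir="."):
    """Return the most recently modified last*.pt under search_dir, or None."""
    candidates = glob.glob(os.path.join(search_dir, "**", "last*.pt"), recursive=True)
    return max(candidates, key=os.path.getmtime) if candidates else None


# Demo with two fake runs; exp2 is given the newer timestamp.
root = tempfile.mkdtemp()
for i, name in enumerate(["exp", "exp2"]):
    weights_dir = os.path.join(root, "runs", "train", name, "weights")
    os.makedirs(weights_dir)
    ckpt = os.path.join(weights_dir, "last.pt")
    open(ckpt, "w").close()
    os.utime(ckpt, (1000 + i, 1000 + i))  # explicit mtimes keep the ordering deterministic

latest = find_latest_checkpoint(root)
print(latest)
```

On Colab, a search like this only finds earlier runs if they were written somewhere persistent, such as a mounted Drive folder. Otherwise, passing the checkpoint path explicitly is the safer option.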
🚀 Feature
It would be more convenient for Colab users to save checkpoints in Google Drive than in yolov5/runs.
My idea
Just change roughly lines 458 to 464 (in my current version):
if not opt.evolve:
    tb_writer = None
    if opt.local_rank in [-1, 0]:
        print('Start Tensorboard with "tensorboard --logdir=runs", view at http://localhost:6006/')
        # Change the path here
        tb_writer = SummaryWriter(log_dir=increment_dir('/content/drive/My Drive/yolov5-checkpoints/exp', opt.name))
    train(hyp, opt, device, tb_writer)
@TaoXieSZ I'm new to this subject, and I don't understand whether your proposed change to lines 458 to 464 goes in the model yaml file or in another file. Could you please help me?
@Leprechault runs can be logged anywhere now, so @TaoXieSZ's comment is no longer applicable. To log a run to any directory, use the --project argument along with the --name argument:
python train.py --project runs/train --name exp
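As a rough illustration of how --project and --name could combine into a run directory, here is a sketch that joins the two and appends a numeric suffix when the directory already exists. The helper next_run_dir and its suffix scheme (exp, exp2, exp3, ...) are assumptions for illustration, not the actual yolov5 implementation:

```python
import tempfile
from pathlib import Path


def next_run_dir(project, name):
    """Return project/name, or project/name2, name3, ... if earlier ones exist."""
    base = Path(project) / name
    if not base.exists():
        return base
    n = 2
    while (Path(project) / f"{name}{n}").exists():
        n += 1
    return Path(project) / f"{name}{n}"


# Demo in a temporary folder standing in for a Drive path such as /content/drive/...
project = Path(tempfile.mkdtemp()) / "runs" / "train"
first = next_run_dir(project, "exp")
first.mkdir(parents=True)
second = next_run_dir(project, "exp")
print(first.name, second.name)
```

Pointing --project at a folder on a mounted Drive gives the same persistence that the original proposal was after, without editing train.py.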
Thanks very much @glenn-jocher !!!!