ijkguo / mx-rcnn

Parallel Faster R-CNN implementation with MXNet.

How to resume training with a specific epoch when running train_end2end.py? #57

Closed: breeze5428 closed this issue 7 years ago

breeze5428 commented 7 years ago

Hi guys!

I am a rookie. I trained the resnet model with train_end2end.py for 10 epochs successfully. Now I want to resume training from the checkpoints, for example continuing from the 10th epoch. How do I do that?

Sorry to trouble you. Thank you in advance!

ijkguo commented 7 years ago

Use --resume --begin_epoch 10 --end_epoch 15 to train 5 more epochs.
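What resuming expects on disk, as a hedged sketch: MXNet's mx.model.save_checkpoint names parameter files "<prefix>-<epoch zero-padded to 4 digits>.params", so resuming at epoch 10 assumes a file like the one built below already exists. The helper names and the prefix model/e2e are hypothetical placeholders; check which checkpoint prefix your own training run actually used.

```python
def checkpoint_params_path(prefix: str, epoch: int) -> str:
    # Assumes MXNet's save_checkpoint naming: "<prefix>-<epoch zero-padded to 4>.params".
    return "%s-%04d.params" % (prefix, epoch)


def extra_epochs(begin_epoch: int, end_epoch: int) -> int:
    # Training covers epochs begin_epoch .. end_epoch - 1,
    # i.e. end_epoch - begin_epoch additional epochs.
    return end_epoch - begin_epoch


# "model/e2e" is a hypothetical prefix used only for illustration.
print(checkpoint_params_path("model/e2e", 10))  # model/e2e-0010.params
print(extra_epochs(10, 15))  # 5
```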

breeze5428 commented 7 years ago

Thank you for the reply. I ran

python train_end2end.py --network resnet --gpu 0 --resume --begin_epoch 10 --end_epoch 15

and got the following error:

  File "train_end2end.py", line 182, in main
    lr=args.lr, lr_step=args.lr_step)
  File "train_end2end.py", line 131, in train_net
    lr_scheduler = mx.lr_scheduler.MultiFactorScheduler(lr_iters, lr_factor)
  File "/home/weiliu/mxnet/python/mxnet/lr_scheduler.py", line 102, in __init__
    assert isinstance(step, list) and len(step) >= 1
AssertionError

However, the original training command runs normally:

python train_end2end.py --network resnet --gpu 0 --end_epoch 10
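A plausible explanation for the error, sketched under assumptions: train_end2end.py appears to keep only the learning-rate step epochs that come after --begin_epoch and pass them (converted to iterations) to MultiFactorScheduler, whose constructor asserts that the step list is non-empty, as the traceback shows. The helpers below are hypothetical plain-Python re-implementations of that filtering, and the step value "7" stands in for whatever default the script really uses.

```python
from typing import List


def remaining_lr_steps(lr_step: str, begin_epoch: int) -> List[int]:
    """Hypothetical re-implementation of the step filtering assumed in train_end2end.py.

    lr_step is the comma-separated --lr_step string, e.g. "7" or "7,12".
    Only step epochs strictly after begin_epoch survive, shifted so they
    are relative to the resumed run.
    """
    lr_epoch = [int(e) for e in lr_step.split(",")]
    return [e - begin_epoch for e in lr_epoch if e > begin_epoch]


def check_scheduler_steps(steps: List[int]) -> None:
    # Mirrors the assertion from mxnet.lr_scheduler.MultiFactorScheduler.__init__
    # seen in the traceback: the step list must contain at least one entry.
    assert isinstance(steps, list) and len(steps) >= 1


# Fresh run (begin_epoch 0): the assumed step at epoch 7 is kept.
fresh = remaining_lr_steps("7", begin_epoch=0)
# Resumed run (begin_epoch 10): epoch 7 is already in the past, so nothing is left.
resumed = remaining_lr_steps("7", begin_epoch=10)
print(fresh, resumed)  # [7] []
```

Passing the empty list from the resumed run to check_scheduler_steps raises AssertionError, matching the reported failure, while the fresh run passes.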

ijkguo commented 7 years ago

Add --lr_step 15, even though epoch 15 will be the end. :)
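Why --lr_step 15 helps, as a hedged sketch: with begin_epoch 10, a step at epoch 15 survives the assumed filtering, so the scheduler receives a one-element list and the assertion passes. Because training stops at end_epoch 15, that decay point coincides with the end of training and has little or no practical effect on the 5 resumed epochs. The function name, the epoch-to-iteration conversion, and the numbers (images_per_epoch, batch_size) are hypothetical illustrations.

```python
from typing import List


def lr_step_iterations(lr_step: str, begin_epoch: int,
                       images_per_epoch: int, batch_size: int) -> List[int]:
    """Hypothetical sketch: turn --lr_step epochs into scheduler iterations for a resumed run."""
    lr_epoch = [int(e) for e in lr_step.split(",")]
    # Keep only steps after the resume point, measured relative to begin_epoch.
    epoch_diff = [e - begin_epoch for e in lr_epoch if e > begin_epoch]
    # Convert epochs to iteration counts (assumed conversion).
    return [int(d * images_per_epoch / batch_size) for d in epoch_diff]


# Illustrative numbers only: 1000 images per epoch, batch size 1.
steps = lr_step_iterations("15", begin_epoch=10, images_per_epoch=1000, batch_size=1)
print(steps)  # [5000]
```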

breeze5428 commented 7 years ago

Ok, now it works well. Thank you sincerely!