msracver / Deformable-ConvNets

Deformable Convolutional Networks
MIT License

Exception while resuming training: assert isinstance(step, list) and len(step) >= 1 AssertionError #248

Open pervaizniazi opened 5 years ago

pervaizniazi commented 5 years ago

Hello, I need to resume training, but I am getting the following exception:

    File "experiments/fpn/../../fpn/../lib/utils/lr_scheduler.py", line 29, in __init__
        assert isinstance(step, list) and len(step) >= 1
    AssertionError

I made the following changes in the .yaml file:

    begin_epoch: 76
    end_epoch: 100

Any help will be much appreciated.

Thanks

bfialkoff commented 5 years ago

The issue begins in the lr_step field in the config file.

  lr: 0.0005
  lr_step: '4.83'
  warmup: true
  warmup_lr: 0.00005
  # typically we will use 4000 warmup step for single GPU on VOC
  warmup_step: 1000

In the call to get the learning rate scheduler:

    # decide learning rate
    base_lr = lr
    lr_factor = config.TRAIN.lr_factor
    lr_epoch = [float(epoch) for epoch in lr_step.split(',')]
    lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch]
    lr = base_lr * (lr_factor ** (len(lr_epoch) - len(lr_epoch_diff)))
    lr_iters = [int(epoch * len(roidb) / batch_size) for epoch in lr_epoch_diff]
    print('lr', lr, 'lr_epoch_diff', lr_epoch_diff, 'lr_iters', lr_iters)
    lr_scheduler = WarmupMultiFactorScheduler(lr_iters, lr_factor, config.TRAIN.warmup,
                                              config.TRAIN.warmup_lr, config.TRAIN.warmup_step)

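Note that step in your failing assertion is lr_iters. If you follow the logic here, you will see that lr_epoch = [4.83], so lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch] is an empty list: the if condition is never satisfied when begin_epoch is larger than every value in lr_step (here 76 > 4.83). lr_iters is built from lr_epoch_diff, so it is empty as well, and the len(step) >= 1 check in the scheduler fails.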
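The logic above can be reproduced without MXNet. The following is a minimal sketch that copies the four list and learning-rate computations into a standalone function. The function name lr_schedule, the lr_factor of 0.1, the dataset size of 10000 images and the batch size of 2 are assumptions made for illustration; only lr_step, lr and the begin_epoch value come from this thread.

```python
def lr_schedule(lr_step, begin_epoch, base_lr, lr_factor, num_images, batch_size):
    """Standalone copy of the schedule computation quoted above (hypothetical helper)."""
    lr_epoch = [float(epoch) for epoch in lr_step.split(',')]
    # Only decay epochs that are still ahead of begin_epoch survive this filter.
    lr_epoch_diff = [epoch - begin_epoch for epoch in lr_epoch if epoch > begin_epoch]
    # Decay steps that already passed are applied to the starting learning rate.
    lr = base_lr * (lr_factor ** (len(lr_epoch) - len(lr_epoch_diff)))
    lr_iters = [int(epoch * num_images / batch_size) for epoch in lr_epoch_diff]
    return lr, lr_iters


# Assumed values: lr_factor, num_images and batch_size are not given in the thread.
fresh_lr, fresh_iters = lr_schedule('4.83', 0, 0.0005, 0.1, 10000, 2)
resumed_lr, resumed_iters = lr_schedule('4.83', 76, 0.0005, 0.1, 10000, 2)

print('fresh run:  ', fresh_lr, fresh_iters)      # one decay step remains
print('resumed run:', resumed_lr, resumed_iters)  # no decay step remains, list is empty
```

With begin_epoch set to 76, the resumed run ends up with an empty lr_iters, which is exactly the list the assertion in lr_scheduler.py rejects.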

I don't have a fix for this. I'd be happy to see more action here; it's a pretty serious flaw.
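As a stopgap rather than a real fix, the scheduler construction could be guarded so that an empty lr_iters never reaches the assertion. The sketch below is an untested assumption, not something the maintainers have confirmed: scheduler_or_none and the stand-in factory are hypothetical names, and it assumes the training script can run with no scheduler (a constant learning rate) once every decay step has already passed.

```python
def scheduler_or_none(lr_iters, make_scheduler):
    """Return a scheduler only when at least one decay step remains (hypothetical helper)."""
    if not lr_iters:
        # Every entry in lr_step lies before begin_epoch: nothing left to schedule.
        return None
    return make_scheduler(lr_iters)


# Stand-in factory for illustration only. In the training script this would wrap
# the WarmupMultiFactorScheduler call quoted earlier in this thread.
def fake_factory(steps):
    return ('scheduler', steps)


no_scheduler = scheduler_or_none([], fake_factory)
some_scheduler = scheduler_or_none([24000], fake_factory)
print(no_scheduler, some_scheduler)
```

Another option that needs no code change is to add a decay epoch beyond begin_epoch to lr_step in the .yaml file, so that the filtered list is not empty when resuming.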