sanghoon / pva-faster-rcnn

Demo code for PVANet
https://arxiv.org/abs/1611.08588

Loss plateau detection #7

Closed dereyly closed 7 years ago

dereyly commented 7 years ago

Hello! Sorry if my question is off-topic. I thought about many interesting things while reading your paper; one of them is how to train my own architecture from scratch. The idea of detecting a plateau is good, and you write that it gives a significant improvement. I have analyzed my accuracy graphs and want to try to build a "plateau detector". I know that you use an in-house library, but maybe you can give some advice on how to build one in Caffe:

  1. What interval (period, number of iterations) is good to analyze?
  2. Should the training loss be analyzed, or a fixed validation subset?
  3. What step policy is preferable -- 0.1 step, 0.2 step, or 0.5 step?
zimenglan-sysu-512 commented 7 years ago

Hi @dereyly, can you share some experience about how to set the plateau configuration? Thanks.

xiaoxiongli commented 7 years ago

@dereyly How can I train in plateau mode?

sanghoon commented 7 years ago

Hi, the Caffe submodule in this repository contains the source of a plateau detector. For usage, please refer to https://github.com/BVLC/caffe/pull/4606

Let me answer @dereyly's questions one by one:

  1. Detection windows start from 20,000 iterations and are doubled for every x1/10 drop of the LR.
  2. For simplicity, we monitor the training loss only.
  3. If you're talking about 'gamma' (= the LR decay ratio), we've used 0.1 and 0.3165 (which is the square root of 1/10).
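For reference, a rough Python sketch of that rule, as I read it, might look like the following. This is only an illustration, not the actual solver code from the pull request above; the class name and the default window sizes are made up for the example.

class PlateauLR:
    """Illustrative plateau-based LR schedule (hypothetical names, not Caffe code)."""

    def __init__(self, base_lr, gamma=0.1, winsizes=(20000, 40000, 80000, 160000)):
        self.lr = base_lr
        self.gamma = gamma                 # LR decay ratio (answer 3)
        self.winsizes = list(winsizes)     # detection windows, doubled per x1/10 LR drop (answer 1)
        self.stage = 0
        self.best_loss = float('inf')
        self.iters_since_best = 0

    def step(self, train_loss):
        # Call once per iteration with the (smoothed) training loss (answer 2).
        if train_loss < self.best_loss:
            self.best_loss = train_loss
            self.iters_since_best = 0
        else:
            self.iters_since_best += 1
        # No new minimum within the current window -> decay the LR.
        if self.stage < len(self.winsizes) and self.iters_since_best >= self.winsizes[self.stage]:
            self.lr *= self.gamma
            self.stage += 1
            self.best_loss = float('inf')  # reset, so a full window runs at the new LR
            self.iters_since_best = 0
        return self.lr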
xiaoxiongli commented 7 years ago

@sanghoon Dear sanghoon, I wonder why you double the "plateau_winsize" for every x1/10 drop of the LR?

plateau_winsize: 10000    # suppose that here lr = 0.01
plateau_winsize: 20000    # 0.001
plateau_winsize: 40000    # 0.0001
plateau_winsize: 80000    # 0.00001

Another confusion: if plateau_winsize is 40000, does it mean that we do at least 40,000 training iterations (with the same lr 0.0001) regardless of whether the training loss decreases or not?

dereyly commented 7 years ago

@sanghoon Thank you! The plateau detector from caffe-fast-rcnn seems good enough. @zimenglan-sysu-512, @xiaoxiongli: lr_policy: "plateau" from caffe-fast-rcnn is better and simpler than my Python layer, which scales the gradient (it scales the bottom blob after the loss function). I am trying:

lr_policy: "plateau"
gamma: 0.33
plateau_winsize: 10000
plateau_winsize: 20000
plateau_winsize: 20000
plateau_winsize: 20000
plateau_winsize: 20000

but the training has not finished yet.

sanghoon commented 7 years ago

Hi @xiaoxiongli, as the loss converges, it seemed to me that fluctuations mask the slight improvements in the training loss. That's why I wanted to increase the window size as training continues.

Answering your second question: that's true. For example, at least 40k iterations of training will be done with lr=0.0001.
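To make that lower bound concrete, here is a tiny standalone simulation (plain Python, not the Caffe solver) using the example window sizes from the earlier comment and assuming the worst case where the loss never improves; each LR stage then lasts exactly its plateau_winsize:

winsizes = [10000, 20000, 40000, 80000]    # example values from the comment above
lr, gamma = 0.01, 0.1
stage, iters_since_best = 0, 0
for it in range(1, 200001):
    iters_since_best += 1                  # worst case: the loss never sets a new minimum
    if stage < len(winsizes) and iters_since_best >= winsizes[stage]:
        print("iter %d: lr %g -> %g" % (it, lr, lr * gamma))
        lr *= gamma
        stage += 1
        iters_since_best = 0

The drops happen at iterations 10000, 30000, 70000 and 150000, i.e. lr=0.0001 is used for iterations 30001-70000, which is the 40k-iteration minimum mentioned above.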

zimenglan-sysu-512 commented 7 years ago

Hi @dereyly, have you finished your training? What about the performance? Hi @sanghoon, I found that training with the plateau LR policy needs a very large number of iterations (e.g. > 300k), and that increasing the plateau_winsize values, with the first one set to 40k, works better.

dereyly commented 7 years ago

@zimenglan-sysu-512 Yes, one of my experiments is finished; I lost about 0.5% accuracy in plateau mode vs. scheduled mode. Maybe I will do some more experiments with a smaller window size, or continue with validation loss. Right now my logs are broken and it is hard to compare models across iterations :(

xiaoxiongli commented 7 years ago

@sanghoon thank you very much! I got it~^_^

@zimenglan-sysu-512 @dereyly When I use the plateau training mode (VOC2007), the mAP is 0.6983; when I do not use this mode, the mAP is about 0.714.

Total iterations = 200k.

train_net: "models/pvanet/example_train_384/train.prototxt"

base_lr: 0.001
lr_policy: "plateau"
gamma: 0.1
plateau_winsize: 10000
plateau_winsize: 20000
plateau_winsize: 40000
plateau_winsize: 80000

display: 20
average_loss: 100
momentum: 0.9
weight_decay: 0.0002

# We disable standard caffe solver snapshotting and implement our own snapshot
# function
snapshot: 0
# We still use the snapshot prefix, though
snapshot_prefix: "pvanet_frcnn_384"
iter_size: 2

sanghoon commented 7 years ago

Hi all, I'd like to share how the network is trained. I hope this helps you.

One more thing... I've found there is a bug in the existing py-faster-rcnn code related to 'average_loss' (the feature doesn't work with the current code).

If you want to train a network with 'plateau', please check out the 'develop' branch, which contains a hotfix for the issue.

ImageNet pre-training (1000 classes)

COCO + VOC2007 + VOC2012 (80 classes)

iter_size: 3
base_lr: 0.003
gamma: 0.3165
plateau_winsize:  50000   # 0.003165
plateau_winsize:  70700   # 0.001
plateau_winsize: 100000   # 0.0003165
# No significant improvement after this
plateau_winsize: 141400   # 0.0001
plateau_winsize: 200000   # 0.0000317

# Expected number of iterations: 1.2~2M

VOC2007 + VOC2012 (20 classes)

iter_size: 3
base_lr: 0.001
gamma: 0.1
plateau_winsize: 50000    # 0.001
# No significant improvement after this
plateau_winsize: 100000   # 0.0001

# Expected number of iterations: 0.5~1M
Po-Hsuan-Huang commented 7 years ago

@sanghoon
Dear sanghoon, thanks for the good work. I am trying to use PVANet to detect other classes. Do you also suggest using 'plateau' for fine-tuning, or is Adam preferable in your experience?

Thank you.