yjxiong / temporal-segment-networks

Code & Models for Temporal Segment Networks (TSN) in ECCV 2016
BSD 2-Clause "Simplified" License
1.53k stars, 477 forks

Finetuning kinetics pretrained on my custom dataset. #model not learning anything #201

Closed fmthoker closed 6 years ago

fmthoker commented 6 years ago

@yjxiong How do we fine-tune the model on our own dataset starting from the pre-trained Kinetics weights? I followed the process presented in the wiki; however, the model is not learning anything. Dataset configuration: 37900 training videos and 18000 testing videos, each around 80 frames. I am using a single GPU for training.

Here are the training settings I use:

Batch size: 16

net: "models/ntu/tsn_bn_inception_rgb_train_val.prototxt"

# testing parameter
test_iter: 950
test_interval: 500
test_initialization: true

# output
display: 20
average_loss: 20
snapshot: 500
snapshot_prefix: "models/ntu_rgb_bn_inception"
debug_info: false

# learning rate
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 23000
max_iter: 70000
iter_size: 1

# parameter of SGD
momentum: 0.9
weight_decay: 0.0005
clip_gradients: 40

# GPU setting
solver_mode: GPU
device_id: [0,1,2,3]
richness: 200

Here are the training logs:

I0531 22:54:42.976332 14962 solver.cpp:640] Iteration 32800, lr = 0.0001
I0531 22:54:53.158599 14962 solver.cpp:240] Iteration 32820, loss = 3.71449
I0531 22:54:53.158618 14962 solver.cpp:255] Train net output #0: loss = 3.93352 (* 1 = 3.93352 loss)
I0531 22:54:53.158623 14962 solver.cpp:640] Iteration 32820, lr = 0.0001
I0531 22:55:02.475733 14962 solver.cpp:240] Iteration 32840, loss = 3.73273
I0531 22:55:02.475766 14962 solver.cpp:255] Train net output #0: loss = 3.33865 (* 1 = 3.33865 loss)
I0531 22:55:02.475772 14962 solver.cpp:640] Iteration 32840, lr = 0.0001
I0531 22:55:16.432497 14962 solver.cpp:240] Iteration 32860, loss = 4.26432
I0531 22:55:16.432529 14962 solver.cpp:255] Train net output #0: loss = 3.89254 (* 1 = 3.89254 loss)
I0531 22:55:16.432534 14962 solver.cpp:640] Iteration 32860, lr = 0.0001
I0531 22:55:27.215116 14962 solver.cpp:240] Iteration 32880, loss = 3.65911
I0531 22:55:27.215148 14962 solver.cpp:255] Train net output #0: loss = 3.59853 (* 1 = 3.59853 loss)
I0531 22:55:27.215153 14962 solver.cpp:640] Iteration 32880, lr = 0.0001
I0531 22:55:37.358016 14962 solver.cpp:240] Iteration 32900, loss = 3.92511
I0531 22:55:37.358033 14962 solver.cpp:255] Train net output #0: loss = 3.6643 (* 1 = 3.6643 loss)
I0531 22:55:37.358053 14962 solver.cpp:640] Iteration 32900, lr = 0.0001
I0531 22:55:48.257910 14962 solver.cpp:240] Iteration 32920, loss = 3.87675
I0531 22:55:48.258025 14962 solver.cpp:255] Train net output #0: loss = 3.76818 (* 1 = 3.76818 loss)
I0531 22:55:48.258044 14962 solver.cpp:640] Iteration 32920, lr = 0.0001
I0531 22:56:01.281738 14962 solver.cpp:240] Iteration 32940, loss = 3.68521
I0531 22:56:01.281755 14962 solver.cpp:255] Train net output #0: loss = 3.88777 (* 1 = 3.88777 loss)
I0531 22:56:01.281760 14962 solver.cpp:640] Iteration 32940, lr = 0.0001
I0531 22:56:11.044287 14962 solver.cpp:240] Iteration 32960, loss = 3.66288
I0531 22:56:11.044319 14962 solver.cpp:255] Train net output #0: loss = 3.50338 (* 1 = 3.50338 loss)
I0531 22:56:11.044324 14962 solver.cpp:640] Iteration 32960, lr = 0.0001
I0531 22:56:20.603466 14962 solver.cpp:240] Iteration 32980, loss = 3.63414
I0531 22:56:20.603592 14962 solver.cpp:255] Train net output #0: loss = 3.61791 (* 1 = 3.61791 loss)
I0531 22:56:20.603598 14962 solver.cpp:640] Iteration 32980, lr = 0.0001
I0531 22:56:34.572155 14962 solver.cpp:433] Iteration 33000, Testing net (#0)
I0531 22:56:47.896167 14962 solver.cpp:490] Test net output #0: accuracy = 0.0663158
I0531 22:56:47.896200 14962 solver.cpp:490] Test net output #1: loss = 4.61347 (* 1 = 4.61347 loss)
I0531 22:56:48.189831 14962 solver.cpp:240] Iteration 33000, loss = 4.04755
I0531 22:56:48.189862 14962 solver.cpp:255] Train net output #0: loss = 3.54686 (* 1 = 3.54686 loss)
I0531 22:56:48.189867 14962 solver.cpp:640] Iteration 33000, lr = 0.0001
I0531 22:56:58.206900 14962 solver.cpp:240] Iteration 33020, loss = 3.67343
I0531 22:56:58.207010 14962 solver.cpp:255] Train net output #0: loss = 3.7331 (* 1 = 3.7331 loss)
I0531 22:56:58.207031 14962 solver.cpp:640] Iteration 33020, lr = 0.0001
I0531 22:57:08.124691 14962 solver.cpp:240] Iteration 33040, loss = 3.67509
I0531 22:57:08.124722 14962 solver.cpp:255] Train net output #0: loss = 3.5837 (* 1 = 3.5837 loss)
I0531 22:57:08.124727 14962 solver.cpp:640] Iteration 33040, lr = 0.0001
I0531 22:57:21.653374 14962 solver.cpp:240] Iteration 33060, loss = 3.71491
I0531 22:57:21.653407 14962 solver.cpp:255] Train net output #0: loss = 3.42221 (* 1 = 3.42221 loss)
I0531 22:57:21.653411 14962 solver.cpp:640] Iteration 33060, lr = 0.0001
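
For readers checking the numbers: the lr = 0.0001 reported at iteration 32800 is what Caffe's "step" policy gives for the quoted solver values, since the rate is base_lr * gamma ^ floor(iter / stepsize). The sketch below reproduces that arithmetic and estimates iterations per epoch. The helper names are made up for illustration, and the epoch estimate assumes that each iteration consumes exactly one batch of 16 videos (iter_size 1, one GPU), which the thread implies but does not confirm.

```python
import math


def step_lr(base_lr: float, gamma: float, stepsize: int, iteration: int) -> float:
    """Learning rate under Caffe's "step" policy at a given iteration."""
    return base_lr * gamma ** (iteration // stepsize)


def iterations_per_epoch(num_videos: int, batch_size: int) -> int:
    """Iterations needed to see every training video once.

    Assumption: one iteration processes one batch of `batch_size` videos.
    """
    return math.ceil(num_videos / batch_size)


if __name__ == "__main__":
    # Values copied from the solver settings and dataset description above.
    base_lr, gamma, stepsize = 0.001, 0.1, 23000
    train_videos, batch_size = 37900, 16

    for it in (0, 23000, 32800, 46000):
        print(f"iteration {it}: lr = {step_lr(base_lr, gamma, stepsize, it):g}")

    per_epoch = iterations_per_epoch(train_videos, batch_size)
    print(f"iterations per epoch: {per_epoch}")
    print(f"epochs completed by iteration 32800: {32800 / per_epoch:.1f}")
```

Under these assumptions the first learning-rate drop happens after roughly ten passes over the training set, so the flat loss in the log is not explained by the schedule alone.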

yjxiong commented 6 years ago

> I am using a single GPU for training

Possibly the batch size is too small. I am not sure it will work in this situation.

fmthoker commented 6 years ago

The problem was related to the batch size and the net definition: "models/ntu/tsn_bn_inception_rgb_train_val.prototxt"
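
As a rough aid for anyone hitting the same batch-size limit on one GPU: Caffe's iter_size solver setting accumulates gradients over several forward/backward passes before each weight update, so the number of samples per update is the batch size times iter_size. The sketch below only does that arithmetic. The function names and the target of 128 are hypothetical, and treating batch_size as a per-GPU value is an assumption that may not hold for every Caffe build. Note also that accumulation does not change the statistics each batch normalization layer sees, which are still computed over one small batch, so this may not be enough for a BN-Inception model.

```python
import math


def effective_batch_size(batch_size: int, iter_size: int = 1, num_gpus: int = 1) -> int:
    """Samples contributing to one weight update.

    Assumption: `batch_size` is the per-GPU batch from the net prototxt.
    """
    return batch_size * iter_size * num_gpus


def iter_size_for_target(target: int, batch_size: int, num_gpus: int = 1) -> int:
    """Smallest iter_size whose effective batch size reaches `target`."""
    return math.ceil(target / (batch_size * num_gpus))


if __name__ == "__main__":
    # Settings from the issue: batch size 16, iter_size 1, a single GPU.
    print(effective_batch_size(16, iter_size=1, num_gpus=1))

    # Hypothetical target of 128 samples per update on one GPU.
    needed = iter_size_for_target(128, batch_size=16)
    print(needed, effective_batch_size(16, iter_size=needed))
```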