tusen-ai / simpledet

A Simple and Versatile Framework for Object Detection and Instance Recognition
Apache License 2.0
3.08k stars, 488 forks

KeyError for custom data training from scratch #235

Closed: tairen99 closed this issue 5 years ago

tairen99 commented 5 years ago

Hi, thank you for your good work. I am trying to use your TridentNet framework on my own dataset. I followed your setup script to install simpledet and modified the configuration file `tridentnet_r101v2c4_c5_2x.py`: I changed the number of classes, the number of GPUs, and the dataset. Then I started training by running `python3 detection_train.py --config config/tridentnet_r101v2c4_c5_2x.py`, but I get the following error:

File "detection_train.py", line 139, in train_net
  sym, arg_params, aux_params = merge_bn(sym, arg_params, aux_params)
File "/home/simpledet/utils/graph_optimize.py", line 77, in merge_bn
  gamma = mx.sym.var(node_name + "_gamma", shape=args[node_name + "_gamma"].shape)
KeyError: 'bn0_gamma'

However, after reading https://github.com/TuSimple/simpledet/issues/186 and commenting out the same two lines as in issue 186, I can start training.

But I do not understand why I need to do that. I followed the setup script, so I should have the newest version of simpledet installed. Also, are there any side effects if I comment out these two lines during training? Thank you again!

RogerChern commented 5 years ago

The checkpoint was trained with an early version of simpledet, which did not employ any computation graph optimization.


tairen99 commented 5 years ago

Thank you for your reply. In `tridentnet_r101v2c4_c5_2x.py` I set `from_scratch = True`, so I do not load the trained checkpoint (i.e., I am not fine-tuning from an existing checkpoint). Is there any other reason for this KeyError? Thanks.
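For readers following along, here is a minimal sketch of how this kind of KeyError can arise. It assumes that when no checkpoint is loaded the parameter dictionary passed to the BN-merging step is empty. `gamma_shape` is a hypothetical stand-in for the lookup shown in the traceback, not simpledet's actual `merge_bn`, and shapes are stored as plain tuples for simplicity.

```python
# Hypothetical stand-in for the failing lookup; not simpledet's merge_bn.
def gamma_shape(node_name, args):
    # Mirrors args[node_name + "_gamma"].shape from the traceback,
    # except that this sketch stores the shape tuple directly.
    return args[node_name + "_gamma"]


# With parameters loaded from a checkpoint, the key is present.
loaded = {"bn0_gamma": (64,), "bn0_beta": (64,)}
shape_ok = gamma_shape("bn0", loaded)

# Assumption: with from_scratch = True nothing was loaded, so the dict is empty
# and the same lookup raises KeyError with the missing parameter name.
empty = {}
try:
    gamma_shape("bn0", empty)
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]
```

If that assumption holds, the error comes from the missing parameters rather than from the installed version.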


RogerChern commented 5 years ago

I see. `merge_bn` is currently designed for the fixbn setting only; BN folding does not make sense if you are training from scratch. We will update the training script later.
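To illustrate what BN folding means and why it relies on fixed statistics, here is a small sketch using a single scalar channel in place of a real convolution. All names (`fold_bn`, `conv_then_bn`, `maybe_fold`) are hypothetical and are not taken from simpledet. The guard in `maybe_fold` is only one possible way a training script could skip folding when training from scratch.

```python
import math


def conv_then_bn(x, weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Scalar stand-in for a convolution followed by batch normalization
    # that uses fixed (frozen) mean and variance.
    z = weight * x + bias
    return gamma * (z - mean) / math.sqrt(var + eps) + beta


def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    # Fold the BN affine transform into the preceding layer:
    # w' = w * scale, b' = (b - mean) * scale + beta.
    scale = gamma / math.sqrt(var + eps)
    return weight * scale, (bias - mean) * scale + beta


def maybe_fold(params, from_scratch):
    # Hypothetical guard: folding is only valid when mean and var are fixed,
    # so skip it when training from scratch or when BN parameters are missing.
    needed = ("weight", "bias", "gamma", "beta", "mean", "var")
    if from_scratch or any(key not in params for key in needed):
        return None
    return fold_bn(*(params[key] for key in needed))


params = dict(weight=2.0, bias=0.5, gamma=1.5, beta=-0.3, mean=0.2, var=4.0)
folded = maybe_fold(params, from_scratch=False)
skipped = maybe_fold(params, from_scratch=True)
```

The folded weight and bias reproduce the output of the original pair only while `mean` and `var` stay constant. When BN statistics are updated during training, no fixed values exist to fold into the weights.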


tairen99 commented 5 years ago

Thank you, Yuntao!
