weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Focal Loss with ssd #720

Open abrams90 opened 7 years ago

abrams90 commented 7 years ago

Has anyone ever used focal loss with the SSD framework to solve the hard-example problem? Can you please share how it works? Thanks!

zxDeepDiver commented 7 years ago

I have not seen any improvement from using focal loss with SSD.

weiliu89 commented 7 years ago

Yes, I tried. It can indeed make an SSD-type detector on par with two-stage methods. You could implement focal loss by simply scaling the confidence logits (e.g. x4) before computing the sigmoid cross-entropy loss.
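For reference, a minimal NumPy sketch (not code from this repo) of the focal loss from the RetinaNet paper next to a plain sigmoid cross-entropy; the `alpha`/`gamma` defaults follow the paper and the helper names are my own:

```python
import numpy as np

def sigmoid_ce(z, y):
    # numerically stable sigmoid cross-entropy on logits z with labels y:
    # max(z, 0) - z*y + log(1 + exp(-|z|))
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

def focal_loss(z, y, alpha=0.25, gamma=2.0):
    # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)
    p = 1.0 / (1.0 + np.exp(-z))
    pt = np.where(y == 1, p, 1 - p)          # probability of the true class
    at = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -at * (1 - pt) ** gamma * np.log(pt)
```

The x4 trick plays the same role as the `(1 - p_t)^gamma` factor: both shrink the contribution of examples the network already classifies confidently.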

abrams90 commented 7 years ago

@weiliu89 @zxDeepDiver Thanks a lot !

lcwyylcwyy commented 7 years ago

@weiliu89 I used the sigmoid cross-entropy loss with focal loss, but after the first iterations (mbox_loss = 1992.2, then mbox_loss = 192.0) the mbox_loss became NaN. How should I train with the sigmoid loss? How should I set base_lr? My base_lr is 0.002 with batch_size 32 on 1 GPU. PS: I train SSD on my own dataset, but it is very small.

weiliu89 commented 7 years ago

You could lower the learning rate initially and gradually increase it (warm-up).
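Caffe's built-in `lr_policy` options have no warm-up phase, so in practice people either restart training with an increasing `base_lr` or drive the solver step by step from Python. A sketch of a linear warm-up schedule (all numbers are illustrative, not from this thread):

```python
def warmup_lr(iteration, base_lr=0.002, warmup_iters=500, warmup_start=1e-4):
    """Linearly ramp the learning rate from warmup_start up to base_lr
    over the first warmup_iters iterations, then hold it at base_lr."""
    if iteration >= warmup_iters:
        return base_lr
    frac = iteration / warmup_iters
    return warmup_start + frac * (base_lr - warmup_start)
```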

weiliu89 commented 7 years ago

I have only tried SigmoidCrossEntropyLoss. I think you have to scale z (the logits) instead of p (the probabilities).
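To see why the scaling belongs on z rather than p: multiplying the logit pushes confident predictions' probabilities toward 0 or 1, so easy examples' loss shrinks roughly exponentially, while multiplying p would merely rescale the loss (and can push p outside [0, 1]). A small numeric illustration, assuming a positive label (y = 1):

```python
import numpy as np

def sigmoid_ce(z, y):
    # numerically stable sigmoid cross-entropy on the logit z
    return np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))

easy, hard = 3.0, 0.2  # logits of an easy and a hard positive example
ratio_plain = sigmoid_ce(easy, 1) / sigmoid_ce(hard, 1)
ratio_scaled = sigmoid_ce(4 * easy, 1) / sigmoid_ce(4 * hard, 1)
# after the x4 scaling, the easy example contributes a far smaller
# fraction of the total loss, which is the focusing effect focal loss aims for
```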

ryusaeba commented 7 years ago

@weiliu89 You mean you changed conf_loss_type from Softmax to SigmoidCrossEntropyLoss and then scaled z, right?

weiliu89 commented 7 years ago

That is the basic setup. You should also change mining_type to NONE.
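In the SSD branch both settings live in `multibox_loss_param`; a sketch of the relevant prototxt fragment (field and enum names taken from `caffe.proto` in weiliu89/caffe; double-check them in your checkout):

```protobuf
layer {
  name: "mbox_loss"
  type: "MultiBoxLoss"
  multibox_loss_param {
    conf_loss_type: LOGISTIC  # sigmoid cross-entropy instead of SOFTMAX
    mining_type: NONE         # disable hard-example mining
  }
}
```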

CrazySssst commented 7 years ago

@weiliu89 Could you share your results for SSD with focal loss?

ryusaeba commented 7 years ago

@weiliu89 Thanks! I will try it later.

jinxuan777 commented 7 years ago

```
I0907 19:47:47.503582 40167 solver.cpp:243] Iteration 0, loss = 538.726
I0907 19:47:47.503639 40167 solver.cpp:259]     Train net output #0: mbox_loss = 538.726 (* 1 = 538.726 loss)
I0907 19:47:47.503693 40167 sgd_solver.cpp:138] Iteration 0, lr = 0.001
I0907 19:47:47.523170 40167 blocking_queue.cpp:50] Data layer prefetch queue empty
I0907 19:47:59.226004 40167 solver.cpp:243] Iteration 10, loss = 488.884
I0907 19:47:59.226058 40167 solver.cpp:259]     Train net output #0: mbox_loss = 418.708 (* 1 = 418.708 loss)
I0907 19:47:59.226068 40167 sgd_solver.cpp:138] Iteration 10, lr = 0.001
I0907 19:48:11.308161 40167 solver.cpp:243] Iteration 20, loss = 412.334
I0907 19:48:11.308215 40167 solver.cpp:259]     Train net output #0: mbox_loss = 393.423 (* 1 = 393.423 loss)
I0907 19:48:11.308225 40167 sgd_solver.cpp:138] Iteration 20, lr = 0.001
I0907 19:48:24.216085 40167 solver.cpp:243] Iteration 30, loss = 426.297
I0907 19:48:24.216294 40167 solver.cpp:259]     Train net output #0: mbox_loss = 269.242 (* 1 = 269.242 loss)
I0907 19:48:24.216308 40167 sgd_solver.cpp:138] Iteration 30, lr = 0.001
I0907 19:48:36.642977 40167 solver.cpp:243] Iteration 40, loss = 449.73
I0907 19:48:36.643034 40167 solver.cpp:259]     Train net output #0: mbox_loss = 424.498 (* 1 = 424.498 loss)
I0907 19:48:36.643045 40167 sgd_solver.cpp:138] Iteration 40, lr = 0.001
I0907 19:48:49.470823 40167 solver.cpp:243] Iteration 50, loss = 520.721
I0907 19:48:49.470880 40167 solver.cpp:259]     Train net output #0: mbox_loss = 450.236 (* 1 = 450.236 loss)
I0907 19:48:49.470890 40167 sgd_solver.cpp:138] Iteration 50, lr = 0.001
I0907 19:49:01.526100 40167 solver.cpp:243] Iteration 60, loss = 470.837
I0907 19:49:01.526652 40167 solver.cpp:259]     Train net output #0: mbox_loss = 504.9 (* 1 = 504.9 loss)
I0907 19:49:01.526669 40167 sgd_solver.cpp:138] Iteration 60, lr = 0.001
I0907 19:49:15.080325 40167 solver.cpp:243] Iteration 70, loss = 441.191
I0907 19:49:15.080377 40167 solver.cpp:259]     Train net output #0: mbox_loss = 343.061 (* 1 = 343.061 loss)
I0907 19:49:15.080387 40167 sgd_solver.cpp:138] Iteration 70, lr = 0.001
I0907 19:49:27.861601 40167 solver.cpp:243] Iteration 80, loss = 416.44
I0907 19:49:27.861662 40167 solver.cpp:259]     Train net output #0: mbox_loss = 524.938 (* 1 = 524.938 loss)
I0907 19:49:27.861677 40167 sgd_solver.cpp:138] Iteration 80, lr = 0.001
I0907 19:49:40.567715 40167 solver.cpp:243] Iteration 90, loss = 419.763
I0907 19:49:40.568455 40167 solver.cpp:259]     Train net output #0: mbox_loss = 485.486 (* 1 = 485.486 loss)
I0907 19:49:40.568467 40167 sgd_solver.cpp:138] Iteration 90, lr = 0.001
I0907 19:49:52.489009 40167 solver.cpp:243] Iteration 100, loss = 496.385
I0907 19:49:52.489078 40167 solver.cpp:259]     Train net output #0: mbox_loss = 598.885 (* 1 = 598.885 loss)
I0907 19:49:52.489092 40167 sgd_solver.cpp:138] Iteration 100, lr = 0.001
I0907 19:50:04.454450 40167 solver.cpp:243] Iteration 110, loss = 440.035
I0907 19:50:04.454507 40167 solver.cpp:259]     Train net output #0: mbox_loss = 552.493 (* 1 = 552.493 loss)
I0907 19:50:04.454519 40167 sgd_solver.cpp:138] Iteration 110, lr = 0.001
```

This is training with focal loss. Is it normal? @weiliu89

xuanyuyt commented 7 years ago

@jinxuan777 Why is the loss so big? You can set alpha = 0.5 and gamma = 0 and compare with the softmax loss.
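With gamma = 0 the modulating factor `(1 - p_t)^gamma` becomes 1, so focal loss with alpha = 0.5 reduces to plain cross-entropy scaled by 0.5, which makes this suggestion a clean baseline. A quick sanity check (standard formulation; the helper names are my own):

```python
import numpy as np

def focal_loss(z, y, alpha, gamma):
    # FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)
    p = 1.0 / (1.0 + np.exp(-z))
    pt = np.where(y == 1, p, 1 - p)
    at = np.where(y == 1, alpha, 1 - alpha)
    return -at * (1 - pt) ** gamma * np.log(pt)

z = np.array([-2.0, 0.5, 3.0])
y = np.array([0.0, 1.0, 1.0])
p = 1.0 / (1.0 + np.exp(-z))
ce = -np.where(y == 1, np.log(p), np.log(1 - p))  # plain sigmoid cross-entropy
fl = focal_loss(z, y, alpha=0.5, gamma=0.0)       # should equal 0.5 * ce
```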

jinxuan777 commented 7 years ago

Thanks, I will try.

peyer commented 6 years ago

@weiliu89 I have added focal_loss_layer.cpp/.cu/.hpp to SSD to train a face detector, and changed caffe.proto, multibox_loss.cpp, bbox_util.cpp, and bbox_util.cu. But when I use detection_output_layer.cpp, the check CHECK_EQ(num_priors * num_classes, bottom[1]->channels()) fails, because bottom[1]->channels() == num_priors. How should I change it? That is, how can I attach focal loss to SSD as a binary detector?
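One way around that CHECK failure, assuming a single sigmoid "face" score per prior: expand each score into a two-class [background, face] pair before DetectionOutput, so the confidence blob again has num_priors * num_classes channels with num_classes = 2. A sketch with a hypothetical helper (not code from this repo):

```python
import numpy as np

def expand_binary_scores(p_face):
    """p_face: shape (num_priors,) sigmoid outputs for the single face class.
    Returns shape (num_priors * 2,) laid out as [background, face] per prior,
    matching the num_priors * num_classes channels DetectionOutput expects."""
    return np.stack([1.0 - p_face, p_face], axis=1).reshape(-1)

scores = expand_binary_scores(np.array([0.9, 0.2]))
```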

OxInsky commented 6 years ago

Did you solve this problem, @peyer? I ran into it too!

BOBrown commented 6 years ago

It seems convergence becomes slower without OHEM. Does it need more iterations to optimize the model?

BOBrown commented 6 years ago

@weiliu89 Will it degrade performance if we employ focal loss together with OHEM?