yuantailing / ctw-baseline

Baseline methods for [CTW dataset](https://ctwdataset.github.io/)
MIT License

The NaN loss value in SSD #10

Closed: dodgaga closed this issue 6 years ago

dodgaga commented 6 years ago

Hi, I ran the SSD code in the baseline to train on the CTW dataset with a batch size of 12 (instead of 14, because of limited GPU memory), but the loss is NaN. I just followed the "CTW dataset tutorial (Part 3: detection baseline)" and didn't change anything except the batch size. Can you give me some advice?

```
I0403 09:59:07.896572 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 09:59:08.678768 38087 sgd_solver.cpp:138] Iteration 860, lr = 0.001
I0403 09:59:25.406322 38087 solver.cpp:243] Iteration 870, loss = nan
I0403 09:59:25.406674 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 09:59:25.406772 38087 sgd_solver.cpp:138] Iteration 870, lr = 0.001
I0403 09:59:40.899689 38087 solver.cpp:243] Iteration 880, loss = nan
I0403 09:59:40.899760 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 09:59:41.602229 38087 sgd_solver.cpp:138] Iteration 880, lr = 0.001
I0403 09:59:57.435994 38087 solver.cpp:243] Iteration 890, loss = nan
I0403 09:59:57.436153 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 09:59:57.436187 38087 sgd_solver.cpp:138] Iteration 890, lr = 0.001
I0403 10:00:14.717105 38087 solver.cpp:243] Iteration 900, loss = nan
I0403 10:00:14.717172 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:00:14.717288 38087 sgd_solver.cpp:138] Iteration 900, lr = 0.001
I0403 10:00:31.561822 38087 solver.cpp:243] Iteration 910, loss = nan
I0403 10:00:31.562093 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:00:32.274315 38087 sgd_solver.cpp:138] Iteration 910, lr = 0.001
I0403 10:00:48.392671 38087 solver.cpp:243] Iteration 920, loss = nan
I0403 10:00:48.392729 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:00:48.392833 38087 sgd_solver.cpp:138] Iteration 920, lr = 0.001
I0403 10:01:04.803617 38087 solver.cpp:243] Iteration 930, loss = nan
I0403 10:01:04.804121 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:01:05.511602 38087 sgd_solver.cpp:138] Iteration 930, lr = 0.001
I0403 10:01:21.101698 38087 solver.cpp:243] Iteration 940, loss = nan
I0403 10:01:21.101753 38087 solver.cpp:259]     Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:01:21.807464 38087 sgd_solver.cpp:138] Iteration 940, lr = 0.001
```
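For context, a common heuristic when changing the batch size is the linear scaling rule: scale the learning rate in proportion to the batch size. This is not ctw-baseline code, just a minimal sketch assuming the lr = 0.001 seen in the log was tuned for the tutorial's batch size of 14; the function name and values are illustrative.

```python
# Minimal sketch of the linear scaling rule (illustrative, not ctw-baseline code):
# scale the learning rate in proportion to the batch size.

def scaled_lr(base_lr, reference_batch_size, actual_batch_size):
    """Return a learning rate scaled linearly with the batch size."""
    return base_lr * actual_batch_size / reference_batch_size

# Assumed reference: lr = 0.001 at batch size 14, as in the log above.
print(scaled_lr(0.001, 14, 12))  # ~0.000857 for batch size 12
```

In practice a NaN loss often calls for a larger reduction than this rule alone suggests, which is what the fix below amounts to.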

dodgaga commented 6 years ago

I lowered the initial learning rate and that solved the problem, as described in SSD issue 543.
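For reference, a hedged sketch of what such a change could look like: lowering base_lr in an SSD-style Caffe solver.prototxt before restarting training. The file path and the new value here are assumptions, not the exact settings used in ctw-baseline.

```python
# Hedged sketch: lower base_lr in a Caffe solver.prototxt before retraining.
# The path and the new value are assumptions, not ctw-baseline settings.
from caffe.proto import caffe_pb2
from google.protobuf import text_format

solver_path = 'models/VGGNet/SSD_512x512/solver.prototxt'  # hypothetical path

# Parse the existing solver definition.
solver = caffe_pb2.SolverParameter()
with open(solver_path) as f:
    text_format.Merge(f.read(), solver)

# Lower the initial learning rate from the 0.001 seen in the log above.
solver.base_lr = 0.0001

# Write the modified solver back out.
with open(solver_path, 'w') as f:
    f.write(text_format.MessageToString(solver))
```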