Hi,
I ran the SSD code from the baseline to train on the CTW dataset with a batch size of 12 (instead of 14, because of limited GPU memory), but the loss is NaN. I followed the "CTW dataset tutorial (Part 3: detection baseline)" and changed nothing except the batch size. Can you give me some advice?
I0403 09:59:07.896572 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 09:59:08.678768 38087 sgd_solver.cpp:138] Iteration 860, lr = 0.001
I0403 09:59:25.406322 38087 solver.cpp:243] Iteration 870, loss = nan
I0403 09:59:25.406674 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 09:59:25.406772 38087 sgd_solver.cpp:138] Iteration 870, lr = 0.001
I0403 09:59:40.899689 38087 solver.cpp:243] Iteration 880, loss = nan
I0403 09:59:40.899760 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 09:59:41.602229 38087 sgd_solver.cpp:138] Iteration 880, lr = 0.001
I0403 09:59:57.435994 38087 solver.cpp:243] Iteration 890, loss = nan
I0403 09:59:57.436153 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 09:59:57.436187 38087 sgd_solver.cpp:138] Iteration 890, lr = 0.001
I0403 10:00:14.717105 38087 solver.cpp:243] Iteration 900, loss = nan
I0403 10:00:14.717172 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:00:14.717288 38087 sgd_solver.cpp:138] Iteration 900, lr = 0.001
I0403 10:00:31.561822 38087 solver.cpp:243] Iteration 910, loss = nan
I0403 10:00:31.562093 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:00:32.274315 38087 sgd_solver.cpp:138] Iteration 910, lr = 0.001
I0403 10:00:48.392671 38087 solver.cpp:243] Iteration 920, loss = nan
I0403 10:00:48.392729 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:00:48.392833 38087 sgd_solver.cpp:138] Iteration 920, lr = 0.001
I0403 10:01:04.803617 38087 solver.cpp:243] Iteration 930, loss = nan
I0403 10:01:04.804121 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:01:05.511602 38087 sgd_solver.cpp:138] Iteration 930, lr = 0.001
I0403 10:01:21.101698 38087 solver.cpp:243] Iteration 940, loss = nan
I0403 10:01:21.101753 38087 solver.cpp:259] Train net output #0: mbox_loss = nan (* 1 = nan loss)
I0403 10:01:21.807464 38087 sgd_solver.cpp:138] Iteration 940, lr = 0.001
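The excerpt above starts after the loss has already become NaN, so it does not show where training first diverged. A minimal sketch for finding that point in the full log, assuming the log uses the same `Iteration N, loss = X` line format shown above; the function name `first_nan_iteration` and the file name `train.log` are hypothetical, not part of the baseline code:

```python
import math
import re

# Matches solver lines such as "... Iteration 870, loss = nan".
# Lines like "Iteration 870, lr = 0.001" are skipped because they lack "loss =".
LOSS_RE = re.compile(r"Iteration (\d+), loss = (\S+)")


def first_nan_iteration(log_lines):
    """Return the first iteration whose reported loss is NaN or inf, else None."""
    for line in log_lines:
        match = LOSS_RE.search(line)
        if match is None:
            continue
        try:
            loss = float(match.group(2))
        except ValueError:
            # Unparseable loss value; ignore this line.
            continue
        if math.isnan(loss) or math.isinf(loss):
            return int(match.group(1))
    return None


if __name__ == "__main__":
    import os

    log_path = "train.log"  # hypothetical path to the saved training log
    if os.path.exists(log_path):
        with open(log_path) as f:
            print(first_nan_iteration(f))
```

The loss values in the iterations just before the reported one would show whether the loss grew steadily or jumped to NaN at once.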