zhujingsong closed this issue 6 years ago.
Hi, does this happen when training on the DAVIS dataset?
Yes. I downloaded the DAVIS 2016 dataset and augmented it to match 'trainpairs.txt'. Then I added 'DATA_DIR_ROOT' and started training, following the steps in the 'readme'. I am still trying to figure out what goes wrong...
The training process is really tricky. I followed the steps, but now the loss keeps fluctuating between 20000 and 30000, as shown below. Do you have any idea why?
What does the loss look like when you are training?
I0322 22:55:30.170424 19131 solver.cpp:219] Iteration 4800 (0.390295 iter/s, 51.2433s/20 iters), loss = 31645.1
I0322 22:55:30.170526 19131 solver.cpp:238] Train net output #0: dsn2_loss = 6840.05 (* 1 = 6840.05 loss)
I0322 22:55:30.170538 19131 solver.cpp:238] Train net output #1: dsn3_loss = 3939.92 (* 1 = 3939.92 loss)
I0322 22:55:30.170545 19131 solver.cpp:238] Train net output #2: dsn4_loss = 2079.14 (* 1 = 2079.14 loss)
I0322 22:55:30.170553 19131 solver.cpp:238] Train net output #3: dsn5_loss = 1395.85 (* 1 = 1395.85 loss)
I0322 22:55:30.170562 19131 solver.cpp:238] Train net output #4: fuse_loss = 1333.71 (* 1 = 1333.71 loss)
I0322 22:55:30.170588 19131 sgd_solver.cpp:105] Iteration 4800, lr = 1e-08
I0322 22:56:15.607455 19131 solver.cpp:219] Iteration 4820 (0.440192 iter/s, 45.4347s/20 iters), loss = 28704.2
I0322 22:56:15.607555 19131 solver.cpp:238] Train net output #0: dsn2_loss = 4781.82 (* 1 = 4781.82 loss)
I0322 22:56:15.607583 19131 solver.cpp:238] Train net output #1: dsn3_loss = 2303.21 (* 1 = 2303.21 loss)
I0322 22:56:15.607591 19131 solver.cpp:238] Train net output #2: dsn4_loss = 1503.31 (* 1 = 1503.31 loss)
I0322 22:56:15.607599 19131 solver.cpp:238] Train net output #3: dsn5_loss = 1911.06 (* 1 = 1911.06 loss)
I0322 22:56:15.607606 19131 solver.cpp:238] Train net output #4: fuse_loss = 1549.09 (* 1 = 1549.09 loss)
I0322 22:56:15.607615 19131 sgd_solver.cpp:105] Iteration 4820, lr = 1e-08
I0322 22:57:04.708586 19131 solver.cpp:219] Iteration 4840 (0.407343 iter/s, 49.0987s/20 iters), loss = 17773.7
I0322 22:57:04.708680 19131 solver.cpp:238] Train net output #0: dsn2_loss = 8524.32 (* 1 = 8524.32 loss)
I0322 22:57:04.708704 19131 solver.cpp:238] Train net output #1: dsn3_loss = 4778.96 (* 1 = 4778.96 loss)
I0322 22:57:04.708711 19131 solver.cpp:238] Train net output #2: dsn4_loss = 3759.5 (* 1 = 3759.5 loss)
I0322 22:57:04.708719 19131 solver.cpp:238] Train net output #3: dsn5_loss = 2714.62 (* 1 = 2714.62 loss)
I0322 22:57:04.708726 19131 solver.cpp:238] Train net output #4: fuse_loss = 2164.28 (* 1 = 2164.28 loss)
I0322 22:57:04.708735 19131 sgd_solver.cpp:105] Iteration 4840, lr = 1e-08
I0322 22:57:55.019158 19131 solver.cpp:219] Iteration 4860 (0.39755 iter/s, 50.3081s/20 iters), loss = 17791.3
I0322 22:57:55.019253 19131 solver.cpp:238] Train net output #0: dsn2_loss = 6271.85 (* 1 = 6271.85 loss)
I0322 22:57:55.019265 19131 solver.cpp:238] Train net output #1: dsn3_loss = 4078.07 (* 1 = 4078.07 loss)
I0322 22:57:55.019287 19131 solver.cpp:238] Train net output #2: dsn4_loss = 1559.87 (* 1 = 1559.87 loss)
I0322 22:57:55.019295 19131 solver.cpp:238] Train net output #3: dsn5_loss = 1797.01 (* 1 = 1797.01 loss)
I0322 22:57:55.019304 19131 solver.cpp:238] Train net output #4: fuse_loss = 1200.79 (* 1 = 1200.79 loss)
I0322 22:57:55.019312 19131 sgd_solver.cpp:105] Iteration 4860, lr = 1e-08
I0322 22:58:42.910308 19131 solver.cpp:219] Iteration 4880 (0.417634 iter/s, 47.8888s/20 iters), loss = 26082.1
I0322 22:58:42.910398 19131 solver.cpp:238] Train net output #0: dsn2_loss = 310.295 (* 1 = 310.295 loss)
I0322 22:58:42.910411 19131 solver.cpp:238] Train net output #1: dsn3_loss = 161.835 (* 1 = 161.835 loss)
I0322 22:58:42.910434 19131 solver.cpp:238] Train net output #2: dsn4_loss = 101.419 (* 1 = 101.419 loss)
I0322 22:58:42.910454 19131 solver.cpp:238] Train net output #3: dsn5_loss = 145.774 (* 1 = 145.774 loss)
I0322 22:58:42.910462 19131 solver.cpp:238] Train net output #4: fuse_loss = 162.083 (* 1 = 162.083 loss)
I0322 22:58:42.910472 19131 sgd_solver.cpp:105] Iteration 4880, lr = 1e-08
I0322 22:59:31.013831 19131 solver.cpp:219] Iteration 4900 (0.41579 iter/s, 48.1013s/20 iters), loss = 18593.2
I0322 22:59:31.013931 19131 solver.cpp:238] Train net output #0: dsn2_loss = 12270.6 (* 1 = 12270.6 loss)
I0322 22:59:31.013960 19131 solver.cpp:238] Train net output #1: dsn3_loss = 5063.43 (* 1 = 5063.43 loss)
I0322 22:59:31.013989 19131 solver.cpp:238] Train net output #2: dsn4_loss = 1855.32 (* 1 = 1855.32 loss)
I0322 22:59:31.014012 19131 solver.cpp:238] Train net output #3: dsn5_loss = 1710.37 (* 1 = 1710.37 loss)
I0322 22:59:31.014035 19131 solver.cpp:238] Train net output #4: fuse_loss = 1557.83 (* 1 = 1557.83 loss)
I0322 22:59:31.014050 19131 sgd_solver.cpp:105] Iteration 4900, lr = 1e-08
I0322 23:00:19.233353 19131 solver.cpp:219] Iteration 4920 (0.414789 iter/s, 48.2173s/20 iters), loss = 19357.4
I0322 23:00:19.233439 19131 solver.cpp:238] Train net output #0: dsn2_loss = 3033.52 (* 1 = 3033.52 loss)
I0322 23:00:19.233469 19131 solver.cpp:238] Train net output #1: dsn3_loss = 2056.26 (* 1 = 2056.26 loss)
I0322 23:00:19.233476 19131 solver.cpp:238] Train net output #2: dsn4_loss = 1136.81 (* 1 = 1136.81 loss)
I0322 23:00:19.233484 19131 solver.cpp:238] Train net output #3: dsn5_loss = 1568.95 (* 1 = 1568.95 loss)
I0322 23:00:19.233491 19131 solver.cpp:238] Train net output #4: fuse_loss = 1041.03 (* 1 = 1041.03 loss)
I0322 23:00:19.233499 19131 sgd_solver.cpp:105] Iteration 4920, lr = 1e-08
The loss fluctuates because we do not normalize it by the size of the augmented image. So if a minibatch contains many small-scale images, the loss will be smaller. If you change this, keep in mind that the learning rate needs to be adjusted accordingly.
Having said that, I cannot reproduce your nan error. Was there a bug that has since been fixed?
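To illustrate the normalization point, here is a minimal Python sketch. It assumes a class-balanced sigmoid cross-entropy summed over pixels, with the weighting described for HED and OSVOS (beta = fraction of background pixels); the actual Caffe loss layer in this repository may differ in details, and all function names here are hypothetical. Two images with identical per-pixel predictions but different sizes give summed losses that differ by their pixel ratio, while the per-pixel mean stays the same.

```python
import math

# Hypothetical helpers illustrating sum vs. mean reduction (not OSVOS-caffe code).

def _balanced_terms(logits, labels):
    """Per-pixel class-balanced sigmoid cross-entropy terms.

    Assumed HED-style weighting: beta is the fraction of background pixels,
    foreground terms are weighted by beta and background terms by (1 - beta).
    """
    if not labels or len(logits) != len(labels):
        raise ValueError("need equally sized, non-empty logits and labels")
    n = len(labels)
    beta = (n - sum(labels)) / n
    terms = []
    for x, y in zip(logits, labels):
        # Numerically stable log(sigmoid(x)); log(1 - sigmoid(x)) = log(sigmoid(x)) - x.
        log_p = -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))
        log_q = log_p - x
        terms.append(-beta * log_p if y == 1 else -(1.0 - beta) * log_q)
    return terms


def balanced_bce_sum(logits, labels):
    """Loss summed over pixels: grows with the number of pixels in the image."""
    return sum(_balanced_terms(logits, labels))


def balanced_bce_mean(logits, labels):
    """Loss normalized by the pixel count: independent of image size."""
    terms = _balanced_terms(logits, labels)
    return sum(terms) / len(terms)


# Same per-pixel predictions, but the "large" image has 4x as many pixels
# (e.g. a different augmentation scale).
small_logits, small_labels = [2.0, -2.0, -2.0, -2.0], [1, 0, 0, 0]
large_logits, large_labels = small_logits * 4, small_labels * 4

print("sum :", balanced_bce_sum(small_logits, small_labels),
      balanced_bce_sum(large_logits, large_labels))
print("mean:", balanced_bce_mean(small_logits, small_labels),
      balanced_bce_mean(large_logits, large_labels))
```

Dividing the loss by the pixel count N also divides every gradient by N, so taking steps of a similar size would need a learning rate roughly N times larger; this is the learning-rate adjustment mentioned above. Because N varies with the augmentation scale, that match is only approximate.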
Hi, you should focus on "Train net output #4: fuse_loss". This loss is computed on the final mask map of the input frame; you can find the network definition in ~/OSVOS-caffe-master/src/parent/solvers/train_val_step1.prototxt. Looking at your training log, the fuse_loss is generally declining, and it fluctuates between a few hundred and maybe one or two thousand.
At 2018-03-22 23:02:49, "zhujingsong" <notifications@github.com> wrote:
The training process is really tricky. I followed the steps, but now the loss keeps fluctuating between 20000 and 30000. Do you have any idea why?
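Following the advice to watch `fuse_loss` rather than the headline `loss =` value, a small script can pull it out of the log and smooth it, which makes a slow downward trend easier to see through the batch-to-batch noise. This is a minimal sketch, assuming the Caffe log format shown in this thread; `fuse_loss_by_iteration` and `moving_average` are hypothetical helpers, not part of Caffe or OSVOS-caffe.

```python
import re

# Hypothetical helpers (not part of Caffe or OSVOS-caffe).
# Matches the solver display header, e.g.
#   "Iteration 4800 (0.390295 iter/s, 51.2433s/20 iters), loss = 31645.1"
# and the fused training output, e.g.
#   "Train net output #4: fuse_loss = 1333.71 (* 1 = 1333.71 loss)"
_TOKENS = re.compile(
    r"Iteration (?P<iter>\d+) \("
    r"|Train net output #\d+: fuse_loss = (?P<fuse>\S+)"
)


def fuse_loss_by_iteration(log_text):
    """Return [(iteration, fuse_loss), ...] in log order.

    Scans for tokens instead of splitting on newlines, so it also works on
    logs whose lines were joined together. "nan" parses to float("nan").
    """
    pairs, current = [], None
    for m in _TOKENS.finditer(log_text):
        if m.group("iter") is not None:
            current = int(m.group("iter"))
        elif current is not None:
            pairs.append((current, float(m.group("fuse"))))
    return pairs


def moving_average(values, window):
    """Trailing mean over the last `window` values (shorter at the start)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

For example, with the log saved to a file (the name `train.log` is hypothetical), `moving_average([v for _, v in fuse_loss_by_iteration(open("train.log").read())], 10)` gives a smoothed `fuse_loss` curve. The `Iteration N, lr = ...` lines from `sgd_solver.cpp` are deliberately not matched, because they lack the `(` that follows the iteration number in the display header.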
I am using a Titan XP to train the parent model. I tried base_lr = 1e-8 (the default) and 1e-9, but the loss keeps fluctuating up and down, and at some iteration it becomes 'nan'. Could you tell me how to tackle this problem? I would really appreciate a reply.
I0321 21:52:42.315754 11354 solver.cpp:331] Iteration 0, Testing net (#0)
I0321 21:52:44.937382 11354 solver.cpp:219] Iteration 0 (0 iter/s, 2.62483s/20 iters), loss = 116012
I0321 21:52:44.937474 11354 solver.cpp:238] Train net output #0: dsn2_loss = 45369.6 (* 1 = 45369.6 loss)
I0321 21:52:44.937501 11354 solver.cpp:238] Train net output #1: dsn3_loss = 45369.6 (* 1 = 45369.6 loss)
I0321 21:52:44.937510 11354 solver.cpp:238] Train net output #2: dsn4_loss = 45369.6 (* 1 = 45369.6 loss)
I0321 21:52:44.937517 11354 solver.cpp:238] Train net output #3: dsn5_loss = 45369.6 (* 1 = 45369.6 loss)
I0321 21:52:44.937525 11354 solver.cpp:238] Train net output #4: fuse_loss = 45369.6 (* 1 = 45369.6 loss)
I0321 21:52:44.937536 11354 sgd_solver.cpp:105] Iteration 0, lr = 1e-09
I0321 21:53:38.136545 11354 solver.cpp:219] Iteration 20 (0.375968 iter/s, 53.196s/20 iters), loss = 132717
I0321 21:53:38.136634 11354 solver.cpp:238] Train net output #0: dsn2_loss = 30310.2 (* 1 = 30310.2 loss)
I0321 21:53:38.136662 11354 solver.cpp:238] Train net output #1: dsn3_loss = 29715.9 (* 1 = 29715.9 loss)
I0321 21:53:38.136678 11354 solver.cpp:238] Train net output #2: dsn4_loss = 30441.8 (* 1 = 30441.8 loss)
I0321 21:53:38.136693 11354 solver.cpp:238] Train net output #3: dsn5_loss = 30483.2 (* 1 = 30483.2 loss)
I0321 21:53:38.136708 11354 solver.cpp:238] Train net output #4: fuse_loss = 30387.5 (* 1 = 30387.5 loss)
I0321 21:53:38.136724 11354 sgd_solver.cpp:105] Iteration 20, lr = 1e-09
I0321 21:54:30.748098 11354 solver.cpp:219] Iteration 40 (0.380165 iter/s, 52.6087s/20 iters), loss = 107836
I0321 21:54:30.748172 11354 solver.cpp:238] Train net output #0: dsn2_loss = 19628.8 (* 1 = 19628.8 loss)
I0321 21:54:30.748181 11354 solver.cpp:238] Train net output #1: dsn3_loss = 19075.7 (* 1 = 19075.7 loss)
I0321 21:54:30.748188 11354 solver.cpp:238] Train net output #2: dsn4_loss = 19764.4 (* 1 = 19764.4 loss)
I0321 21:54:30.748196 11354 solver.cpp:238] Train net output #3: dsn5_loss = 19837 (* 1 = 19837 loss)
I0321 21:54:30.748219 11354 solver.cpp:238] Train net output #4: fuse_loss = 19558.3 (* 1 = 19558.3 loss)
I0321 21:54:30.748229 11354 sgd_solver.cpp:105] Iteration 40, lr = 1e-09
I0321 21:55:20.153046 11354 solver.cpp:219] Iteration 60 (0.404839 iter/s, 49.4024s/20 iters), loss = 99894
I0321 21:55:20.153127 11354 solver.cpp:238] Train net output #0: dsn2_loss = 10907.5 (* 1 = 10907.5 loss)
I0321 21:55:20.153141 11354 solver.cpp:238] Train net output #1: dsn3_loss = 9222.18 (* 1 = 9222.18 loss)
I0321 21:55:20.153151 11354 solver.cpp:238] Train net output #2: dsn4_loss = 11842.5 (* 1 = 11842.5 loss)
I0321 21:55:20.153169 11354 solver.cpp:238] Train net output #3: dsn5_loss = 11927.8 (* 1 = 11927.8 loss)
I0321 21:55:20.153182 11354 solver.cpp:238] Train net output #4: fuse_loss = 10795.1 (* 1 = 10795.1 loss)
I0321 21:55:20.153194 11354 sgd_solver.cpp:105] Iteration 60, lr = 1e-09
I0321 21:56:07.384881 11354 solver.cpp:219] Iteration 80 (0.423465 iter/s, 47.2294s/20 iters), loss = 44943.5
I0321 21:56:07.384968 11354 solver.cpp:238] Train net output #0: dsn2_loss = 17211.5 (* 1 = 17211.5 loss)
I0321 21:56:07.384996 11354 solver.cpp:238] Train net output #1: dsn3_loss = 11019.4 (* 1 = 11019.4 loss)
I0321 21:56:07.385004 11354 solver.cpp:238] Train net output #2: dsn4_loss = 18592.5 (* 1 = 18592.5 loss)
I0321 21:56:07.385011 11354 solver.cpp:238] Train net output #3: dsn5_loss = 18892.4 (* 1 = 18892.4 loss)
I0321 21:56:07.385018 11354 solver.cpp:238] Train net output #4: fuse_loss = 14495.2 (* 1 = 14495.2 loss)
I0321 21:56:07.385027 11354 sgd_solver.cpp:105] Iteration 80, lr = 1e-09
I0321 21:56:54.658689 11354 solver.cpp:219] Iteration 100 (0.423089 iter/s, 47.2714s/20 iters), loss = 67899.7
I0321 21:56:54.658788 11354 solver.cpp:238] Train net output #0: dsn2_loss = 14982.7 (* 1 = 14982.7 loss)
I0321 21:56:54.658818 11354 solver.cpp:238] Train net output #1: dsn3_loss = 10885.6 (* 1 = 10885.6 loss)
I0321 21:56:54.658833 11354 solver.cpp:238] Train net output #2: dsn4_loss = 15006.9 (* 1 = 15006.9 loss)
I0321 21:56:54.658844 11354 solver.cpp:238] Train net output #3: dsn5_loss = 15270.3 (* 1 = 15270.3 loss)
I0321 21:56:54.658861 11354 solver.cpp:238] Train net output #4: fuse_loss = 12647.6 (* 1 = 12647.6 loss)
I0321 21:56:54.658876 11354 sgd_solver.cpp:105] Iteration 100, lr = 1e-09
I0321 21:57:43.779109 11354 solver.cpp:219] Iteration 120 (0.407183 iter/s, 49.118s/20 iters), loss = 63917.8
I0321 21:57:43.779191 11354 solver.cpp:238] Train net output #0: dsn2_loss = 9679.94 (* 1 = 9679.94 loss)
I0321 21:57:43.779199 11354 solver.cpp:238] Train net output #1: dsn3_loss = 4138.92 (* 1 = 4138.92 loss)
I0321 21:57:43.779206 11354 solver.cpp:238] Train net output #2: dsn4_loss = 9165.52 (* 1 = 9165.52 loss)
I0321 21:57:43.779215 11354 solver.cpp:238] Train net output #3: dsn5_loss = 11711.8 (* 1 = 11711.8 loss)
I0321 21:57:43.779238 11354 solver.cpp:238] Train net output #4: fuse_loss = 5715.27 (* 1 = 5715.27 loss)
I0321 21:57:43.779247 11354 sgd_solver.cpp:105] Iteration 120, lr = 1e-09
I0321 21:58:35.024227 11354 solver.cpp:219] Iteration 140 (0.3903 iter/s, 51.2426s/20 iters), loss = 87141.5
I0321 21:58:35.024309 11354 solver.cpp:238] Train net output #0: dsn2_loss = 4843.74 (* 1 = 4843.74 loss)
I0321 21:58:35.024336 11354 solver.cpp:238] Train net output #1: dsn3_loss = 2672.27 (* 1 = 2672.27 loss)
I0321 21:58:35.024345 11354 solver.cpp:238] Train net output #2: dsn4_loss = 4293.57 (* 1 = 4293.57 loss)
I0321 21:58:35.024353 11354 solver.cpp:238] Train net output #3: dsn5_loss = 5528.3 (* 1 = 5528.3 loss)
I0321 21:58:35.024363 11354 solver.cpp:238] Train net output #4: fuse_loss = 3142 (* 1 = 3142 loss)
I0321 21:58:35.024382 11354 sgd_solver.cpp:105] Iteration 140, lr = 1e-09
I0321 21:59:14.739044 11354 solver.cpp:219] Iteration 160 (0.503615 iter/s, 39.7129s/20 iters), loss = nan
I0321 21:59:14.739117 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 21:59:14.739128 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 21:59:14.739141 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 21:59:14.739157 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 21:59:14.739171 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 21:59:14.739183 11354 sgd_solver.cpp:105] Iteration 160, lr = 1e-09
I0321 21:59:55.145210 11354 solver.cpp:219] Iteration 180 (0.494997 iter/s, 40.4043s/20 iters), loss = nan
I0321 21:59:55.145300 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 21:59:55.145318 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 21:59:55.145329 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 21:59:55.145340 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 21:59:55.145361 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 21:59:55.145376 11354 sgd_solver.cpp:105] Iteration 180, lr = 1e-09
I0321 22:00:36.563983 11354 solver.cpp:219] Iteration 200 (0.482895 iter/s, 41.4168s/20 iters), loss = nan
I0321 22:00:36.564069 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:00:36.564098 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:00:36.564111 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:00:36.564124 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:00:36.564138 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 22:00:36.564154 11354 sgd_solver.cpp:105] Iteration 200, lr = 1e-09
I0321 22:01:18.799461 11354 solver.cpp:219] Iteration 220 (0.473557 iter/s, 42.2335s/20 iters), loss = nan
I0321 22:01:18.799545 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:01:18.799561 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:01:18.799572 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:01:18.799583 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:01:18.799597 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 22:01:18.799616 11354 sgd_solver.cpp:105] Iteration 220, lr = 1e-09
I0321 22:01:59.521841 11354 solver.cpp:219] Iteration 240 (0.491153 iter/s, 40.7205s/20 iters), loss = nan
I0321 22:01:59.521919 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:01:59.521931 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:01:59.521968 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:01:59.522012 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:01:59.522033 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 22:01:59.522073 11354 sgd_solver.cpp:105] Iteration 240, lr = 1e-09
I0321 22:02:42.450592 11354 solver.cpp:219] Iteration 260 (0.465909 iter/s, 42.9268s/20 iters), loss = nan
I0321 22:02:42.450680 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:02:42.450693 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:02:42.450713 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:02:42.450726 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:02:42.450744 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 22:02:42.450754 11354 sgd_solver.cpp:105] Iteration 260, lr = 1e-09
I0321 22:03:24.126384 11354 solver.cpp:219] Iteration 280 (0.479916 iter/s, 41.6739s/20 iters), loss = nan
I0321 22:03:24.126468 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:03:24.126497 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:03:24.126510 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:03:24.126528 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:03:24.126543 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 22:03:24.126565 11354 sgd_solver.cpp:105] Iteration 280, lr = 1e-09
I0321 22:04:02.832152 11354 solver.cpp:219] Iteration 300 (0.516742 iter/s, 38.7041s/20 iters), loss = nan
I0321 22:04:02.832226 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:04:02.832234 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:04:02.832247 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:04:02.832253 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:04:02.832259 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 22:04:02.832267 11354 sgd_solver.cpp:105] Iteration 300, lr = 1e-09
I0321 22:04:43.423032 11354 solver.cpp:219] Iteration 320 (0.492743 iter/s, 40.5891s/20 iters), loss = nan
I0321 22:04:43.423207 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:04:43.423264 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:04:43.423312 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:04:43.423359 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:04:43.423405 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 22:04:43.423454 11354 sgd_solver.cpp:105] Iteration 320, lr = 1e-09
I0321 22:05:22.978026 11354 solver.cpp:219] Iteration 340 (0.505648 iter/s, 39.5532s/20 iters), loss = nan
I0321 22:05:22.978121 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:05:22.978135 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:05:22.978149 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:05:22.978162 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:05:22.978175 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
I0321 22:05:22.978186 11354 sgd_solver.cpp:105] Iteration 340, lr = 1e-09
I0321 22:06:02.740044 11354 solver.cpp:219] Iteration 360 (0.503014 iter/s, 39.7603s/20 iters), loss = nan
I0321 22:06:02.740128 11354 solver.cpp:238] Train net output #0: dsn2_loss = nan (* 1 = nan loss)
I0321 22:06:02.740154 11354 solver.cpp:238] Train net output #1: dsn3_loss = nan (* 1 = nan loss)
I0321 22:06:02.740160 11354 solver.cpp:238] Train net output #2: dsn4_loss = nan (* 1 = nan loss)
I0321 22:06:02.740169 11354 solver.cpp:238] Train net output #3: dsn5_loss = nan (* 1 = nan loss)
I0321 22:06:02.740175 11354 solver.cpp:238] Train net output #4: fuse_loss = nan (* 1 = nan loss)
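To narrow down where a run like this diverges, a small helper can scan the log for the first display at which the loss is no longer finite. This is a minimal sketch, assuming the Caffe log format shown in this thread; `first_nonfinite_iteration` is a hypothetical helper, not part of Caffe or OSVOS-caffe. Because the solver only prints every 20 iterations in this run (`s/20 iters`), the result brackets the divergence rather than pinpointing it: for the log in this post it should report iteration 160 as the first nan display and iteration 140 (loss 87141.5) as the last finite one.

```python
import math
import re

# Hypothetical helper (not part of Caffe or OSVOS-caffe).
# Matches the solver display header, e.g.
#   "Iteration 160 (0.503615 iter/s, 39.7129s/20 iters), loss = nan"
_HEADER = re.compile(r"Iteration (\d+) \([^)]*\), loss = (\S+)")


def first_nonfinite_iteration(log_text):
    """Find where the displayed loss first becomes nan or inf.

    Returns (bad_iter, last_good_iter, last_good_loss), or None if every
    displayed loss is finite. last_good_* are None if the very first
    display is already non-finite.
    """
    last_iter, last_loss = None, None
    for m in _HEADER.finditer(log_text):
        it, loss = int(m.group(1)), float(m.group(2))
        if not math.isfinite(loss):
            return it, last_iter, last_loss
        last_iter, last_loss = it, loss
    return None
```

The `Iteration 0, Testing net (#0)` and `Iteration N, lr = ...` lines are skipped because they do not have the `(... iter/s ...)` group followed by `loss =`.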