tensorflow / models

Models and examples built with TensorFlow

When I train DeepLab on my dataset, why does the loss drop to 0.3 and then slowly increase? #8185

Open Wahaha1314 opened 4 years ago

Wahaha1314 commented 4 years ago

Linux Ubuntu 18.

Wahaha1314 commented 4 years ago

I0225 00:28:37.298827 139678134472832 learning.py:507] global step 9350: loss = 0.3473 (0.546 sec/step)
I0225 00:28:42.516359 139678134472832 learning.py:507] global step 9360: loss = 0.3541 (0.460 sec/step)
I0225 00:28:48.227591 139678134472832 learning.py:507] global step 9370: loss = 0.3571 (0.693 sec/step)
I0225 00:28:53.192583 139678134472832 learning.py:507] global step 9380: loss = 0.3720 (0.488 sec/step)
I0225 00:28:58.983253 139678134472832 learning.py:507] global step 9390: loss = 0.3578 (0.595 sec/step)
I0225 00:29:04.929780 139678134472832 learning.py:507] global step 9400: loss = 0.3607 (0.638 sec/step)
I0225 00:29:10.304232 139678134472832 learning.py:507] global step 9410: loss = 0.3563 (0.473 sec/step)
I0225 00:29:15.999921 139678134472832 learning.py:507] global step 9420: loss = 0.3570 (0.759 sec/step)
I0225 00:29:21.594344 139678134472832 learning.py:507] global step 9430: loss = 0.3619 (0.543 sec/step)
I0225 00:29:27.436398 139678134472832 learning.py:507] global step 9440: loss = 0.3595 (0.578 sec/step)
I0225 00:29:33.191127 139678134472832 learning.py:507] global step 9450: loss = 0.3662 (0.474 sec/step)
I0225 00:29:38.640579 139678134472832 learning.py:507] global step 9460: loss = 0.3544 (0.627 sec/step)
I0225 00:29:44.139233 139678134472832 learning.py:507] global step 9470: loss = 0.3534 (0.524 sec/step)
I0225 00:29:49.347350 139678134472832 learning.py:507] global step 9480: loss = 0.3647 (0.463 sec/step)
I0225 00:29:54.579720 139678134472832 learning.py:507] global step 9490: loss = 0.3676 (0.429 sec/step)
I0225 00:30:00.306306 139678134472832 learning.py:507] global step 9500: loss = 0.3806 (0.652 sec/step)
I0225 00:30:06.049466 139678134472832 learning.py:507] global step 9510: loss = 0.3576 (0.518 sec/step)
I0225 00:30:11.617736 139678134472832 learning.py:507] global step 9520: loss = 0.3602 (0.625 sec/step)
I0225 00:30:16.928484 139678134472832 learning.py:507] global step 9530: loss = 0.3583 (0.608 sec/step)
I0225 00:30:22.411167 139678134472832 learning.py:507] global step 9540: loss = 0.3657 (0.433 sec/step)
I0225 00:30:28.002476 139678134472832 learning.py:507] global step 9550: loss = 0.3619 (0.566 sec/step)
I0225 00:30:33.478770 139678134472832 learning.py:507] global step 9560: loss = 0.3714 (0.560 sec/step)
I0225 00:30:39.196351 139678134472832 learning.py:507] global step 9570: loss = 0.3633 (0.554 sec/step)
I0225 00:30:42.872439 139657530550016 supervisor.py:1050] Recording summary at step 9576.
I0225 00:30:43.460937 139657522157312 supervisor.py:1099] global_step/sec: 1.78333
I0225 00:30:45.346922 139678134472832 learning.py:507] global step 9580: loss = 0.3659 (0.517 sec/step)
I0225 00:30:51.419602 139678134472832 learning.py:507] global step 9590: loss = 0.3702 (0.505 sec/step)
I0225 00:30:56.965019 139678134472832 learning.py:507] global step 9600: loss = 0.3700 (0.563 sec/step)
I0225 00:31:02.442407 139678134472832 learning.py:507] global step 9610: loss = 0.3612 (0.485 sec/step)
I0225 00:31:08.114673 139678134472832 learning.py:507] global step 9620: loss = 0.3666 (0.503 sec/step)
I0225 00:31:13.751699 139678134472832 learning.py:507] global step 9630: loss = 0.3634 (0.548 sec/step)
I0225 00:31:19.354572 139678134472832 learning.py:507] global step 9640: loss = 0.3752 (0.459 sec/step)
I0225 00:31:24.641446 139678134472832 learning.py:507] global step 9650: loss = 0.3669 (0.498 sec/step)
I0225 00:31:29.692257 139678134472832 learning.py:507] global step 9660: loss = 0.3649 (0.599 sec/step)
I0225 00:31:35.437648 139678134472832 learning.py:507] global step 9670: loss = 0.3723 (0.476 sec/step)
I0225 00:31:41.270802 139678134472832 learning.py:507] global step 9680: loss = 0.3652 (0.595 sec/step)
I0225 00:31:46.541732 139678134472832 learning.py:507] global step 9690: loss = 0.3703 (0.477 sec/step)
I0225 00:31:51.974944 139678134472832 learning.py:507] global step 9700: loss = 0.3724 (0.452 sec/step)
I0225 00:31:57.275008 139678134472832 learning.py:507] global step 9710: loss = 0.3726 (0.491 sec/step)
I0225 00:32:03.120554 139678134472832 learning.py:507] global step 9720: loss = 0.3715 (0.457 sec/step)
I0225 00:32:09.058451 139678134472832 learning.py:507] global step 9730: loss = 0.3794 (0.587 sec/step)
I0225 00:32:14.881457 139678134472832 learning.py:507] global step 9740: loss = 0.3865 (0.524 sec/step)
I0225 00:32:20.310225 139678134472832 learning.py:507] global step 9750: loss = 0.3708 (0.528 sec/step)
I0225 00:32:25.635300 139678134472832 learning.py:507] global step 9760: loss = 0.3684 (0.459 sec/step)
I0225 00:32:31.210130 139678134472832 learning.py:507] global step 9770: loss = 0.3661 (0.548 sec/step)
I0225 00:32:36.758284 139678134472832 learning.py:507] global step 9780: loss = 0.3860 (0.598 sec/step)
I0225 00:32:42.698163 139678134472832 learning.py:507] global step 9790: loss = 0.3765 (0.554 sec/step)
I0225 00:32:48.358773 139678134472832 learning.py:507] global step 9800: loss = 0.3707 (0.618 sec/step)
I0225 00:32:53.659318 139678134472832 learning.py:507] global step 9810: loss = 0.3738 (0.604 sec/step)
I0225 00:32:59.506811 139678134472832 learning.py:507] global step 9820: loss = 0.3711 (0.626 sec/step)
I0225 00:33:04.974010 139678134472832 learning.py:507] global step 9830: loss = 0.3789 (0.620 sec/step)
I0225 00:33:10.766940 139678134472832 learning.py:507] global step 9840: loss = 0.3782 (0.496 sec/step)
I0225 00:33:16.375857 139678134472832 learning.py:507] global step 9850: loss = 0.3757 (0.622 sec/step)
I0225 00:33:22.007895 139678134472832 learning.py:507] global step 9860: loss = 0.3742 (0.523 sec/step)
I0225 00:33:27.588703 139678134472832 learning.py:507] global step 9870: loss = 0.3775 (0.584 sec/step)
I0225 00:33:33.040072 139678134472832 learning.py:507] global step 9880: loss = 0.3759 (0.525 sec/step)
I0225 00:33:38.070516 139678134472832 learning.py:507] global step 9890: loss = 0.3826 (0.559 sec/step)
I0225 00:33:43.958830 139678134472832 learning.py:507] global step 9900: loss = 0.3796 (0.578 sec/step)
I0225 00:33:49.135828 139678134472832 learning.py:507] global step 9910: loss = 0.3736 (0.558 sec/step)
I0225 00:33:54.867533 139678134472832 learning.py:507] global step 9920: loss = 0.3821 (0.518 sec/step)
I0225 00:34:00.631675 139678134472832 learning.py:507] global step 9930: loss = 0.3845 (0.532 sec/step)
I0225 00:34:06.499222 139678134472832 learning.py:507] global step 9940: loss = 0.3810 (0.457 sec/step)
I0225 00:34:11.851280 139678134472832 learning.py:507] global step 9950: loss = 0.3739 (0.575 sec/step)
I0225 00:34:17.668738 139678134472832 learning.py:507] global step 9960: loss = 0.3800 (0.679 sec/step)
I0225 00:34:23.449271 139678134472832 learning.py:507] global step 9970: loss = 0.3964 (0.460 sec/step)
I0225 00:34:29.103214 139678134472832 learning.py:507] global step 9980: loss = 0.3809 (0.524 sec/step)
I0225 00:34:34.704662 139678134472832 learning.py:507] global step 9990: loss = 0.3755 (0.680 sec/step)
I0225 00:34:40.670568 139678134472832 learning.py:507] global step 10000: loss = 0.3854 (0.581 sec/step)
I0225 00:34:46.253268 139678134472832 learning.py:507] global step 10010: loss = 0.3965 (0.590 sec/step)
I0225 00:34:51.972010 139678134472832 learning.py:507] global step 10020: loss = 0.3850 (0.473 sec/step)
I0225 00:34:57.387913 139678134472832 learning.py:507] global step 10030: loss = 0.3828 (0.492 sec/step)
I0225 00:35:02.878106 139678134472832 learning.py:507] global step 10040: loss = 0.3886 (0.519 sec/step)
I0225 00:35:08.303399 139678134472832 learning.py:507] global step 10050: loss = 0.3971 (0.552 sec/step)
I0225 00:35:14.418859 139678134472832 learning.py:507] global step 10060: loss = 0.3886 (0.687 sec/step)
I0225 00:35:20.070899 139678134472832 learning.py:507] global step 10070: loss = 0.3866 (0.683 sec/step)
I0225 00:35:25.840482 139678134472832 learning.py:507] global step 10080: loss = 0.3930 (0.597 sec/step)
I0225 00:35:31.448308 139678134472832 learning.py:507] global step 10090: loss = 0.3919 (0.493 sec/step)
I0225 00:35:37.105967 139678134472832 learning.py:507] global step 10100: loss = 0.3969 (0.572 sec/step)
I0225 00:35:42.791347 139678134472832 learning.py:507] global step 10110: loss = 0.3935 (0.650 sec/step)
I0225 00:35:48.230713 139678134472832 learning.py:507] global step 10120: loss = 0.3943 (0.532 sec/step)
I0225 00:35:53.764667 139678134472832 learning.py:507] global step 10130: loss = 0.3856 (0.602 sec/step)
I0225 00:35:59.472116 139678134472832 learning.py:507] global step 10140: loss = 0.3951 (0.476 sec/step)
I0225 00:36:04.806911 139678134472832 learning.py:507] global step 10150: loss = 0.3910 (0.560 sec/step)
I0225 00:36:10.391203 139678134472832 learning.py:507] global step 10160: loss = 0.3889 (0.506 sec/step)
I0225 00:36:15.967667 139678134472832 learning.py:507] global step 10170: loss = 0.3985 (0.537 sec/step)
I0225 00:36:21.401251 139678134472832 learning.py:507] global step 10180: loss = 0.3948 (0.568 sec/step)
I0225 00:36:27.015630 139678134472832 learning.py:507] global step 10190: loss = 0.4117 (0.505 sec/step)
I0225 00:36:32.646471 139678134472832 learning.py:507] global step 10200: loss = 0.3910 (0.615 sec/step)
INFO:tensorflow:global step 10210: loss = 0.3838 (0.608 sec/step)
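One way to check whether the rise in a log like this is a real trend rather than batch-to-batch noise is to pull out one loss value per step, smooth it, and fit a slope. The sketch below is a minimal illustration, assuming the log is available as plain text in the format shown in this thread; `parse_losses`, `moving_average`, and `slope` are hypothetical helpers, not part of the DeepLab code.

```python
import re
from statistics import mean

# Matches records such as "global step 9360: loss = 0.3541 (0.460 sec/step)".
# "global_step/sec" and "Recording summary" lines do not match.
STEP_RE = re.compile(r"global step (\d+): loss = ([0-9.]+)")


def parse_losses(log_text):
    """Return sorted (step, loss) pairs, keeping one value per step.

    TF1 prints each record twice (an INFO:tensorflow line and an
    I-prefixed absl line), so repeated steps are dropped.
    """
    seen = {}
    for step, loss in STEP_RE.findall(log_text):
        seen.setdefault(int(step), float(loss))
    return sorted(seen.items())


def moving_average(values, window):
    """Trailing simple moving average; the first window - 1 points are omitted."""
    return [mean(values[i - window + 1:i + 1]) for i in range(window - 1, len(values))]


def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs (loss change per step)."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# A few records copied from the log in this issue.
SAMPLE = """INFO:tensorflow:global step 9360: loss = 0.3541 (0.460 sec/step)
I0225 00:28:42.516359 139678134472832 learning.py:507] global step 9360: loss = 0.3541 (0.460 sec/step)
INFO:tensorflow:global step 9370: loss = 0.3571 (0.693 sec/step)
INFO:tensorflow:global step 9380: loss = 0.3720 (0.488 sec/step)
INFO:tensorflow:global step 9390: loss = 0.3578 (0.595 sec/step)
INFO:tensorflow:global step 9400: loss = 0.3607 (0.638 sec/step)"""

pairs = parse_losses(SAMPLE)
steps, losses = zip(*pairs)
window = 3
smoothed = moving_average(list(losses), window)
# Align each smoothed value with the last step of its window.
print("loss change per 1000 steps:", 1000 * slope(steps[window - 1:], smoothed))
```

On the full log, a larger window (tens of records) would be more informative than the 3 used on this short sample; a positive slope that persists across windows suggests a genuine upward drift rather than noise.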

I've only pasted a small part of the log; the loss keeps climbing, up to 1.4. I initialized the model from the official pretrained weights. The batch size is 16, and training runs for 30,000 iterations.
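For context, the batch size and iteration count above determine how far into the schedule the pasted log is and how many images the model has seen. A small sketch of that arithmetic; the dataset size is not stated in the thread, so `num_images` is a hypothetical input, and `training_progress` is an illustrative helper.

```python
def training_progress(step, total_steps, batch_size, num_images=None):
    """Return (fraction of schedule done, images seen, epochs or None).

    num_images (dataset size) is a hypothetical input: if it is unknown,
    the epoch count is returned as None.
    """
    images_seen = step * batch_size
    epochs = images_seen / num_images if num_images else None
    return step / total_steps, images_seen, epochs


# Last step in the pasted log, with batch size 16 and 30,000 iterations.
fraction, seen, epochs = training_progress(10210, 30000, 16)
print(f"{fraction:.1%} of the schedule, {seen} images seen")
```

If the dataset is small, many epochs fit into 30,000 iterations, which is worth knowing when judging whether a late rise in training loss matters.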

YknZhu commented 4 years ago

Is the loss reported here the training loss on each batch? If so, since each batch contains different images, this metric can be noisy. A better measure would probably be mIoU computed on a held-out set.
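DeepLab ships an `eval.py` script for evaluating a checkpoint on a separate split; the sketch below only illustrates what mean IoU measures, using NumPy. It assumes integer label maps and predictions with class ids in `[0, num_classes)`, and that pixels labeled 255 are ignored (an assumption matching the common PASCAL VOC convention; adjust `ignore_label` for your dataset). The function names are hypothetical.

```python
import numpy as np


def confusion_matrix(labels, preds, num_classes, ignore_label=255):
    """Accumulate a num_classes x num_classes matrix (rows = ground truth, cols = prediction).

    Pixels whose ground-truth label equals ignore_label (assumed 255 here) are skipped.
    """
    labels = np.asarray(labels).ravel()
    preds = np.asarray(preds).ravel()
    valid = labels != ignore_label
    idx = num_classes * labels[valid].astype(np.int64) + preds[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def mean_iou(conf):
    """Mean of per-class IoU = TP / (TP + FP + FN).

    Classes that appear in neither the ground truth nor the predictions
    (union of zero) are left out of the mean.
    """
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0
    return float(np.mean(tp[present] / union[present]))


# Tiny 2x2 example: one background pixel is misclassified as class 1.
gt = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
conf = confusion_matrix(gt, pred, num_classes=2)
print(conf)
print(mean_iou(conf))
```

In practice the confusion matrix would be summed over every image in the held-out set before computing mIoU once, so the score reflects the whole split rather than a single noisy batch.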