tensorflow / models

Models and examples built with TensorFlow

Exploding loss after few iterations of training of Faster RCNN ResNet50 #8423

Closed Boltuzamaki closed 4 years ago

Boltuzamaki commented 4 years ago

System information

- What is the top-level directory of the model you are using: object_detection
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): used the train.py script on my own dataset
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Google Colab
- TensorFlow installed from (source or binary): switched to %tensorflow_version 1.x (in Colab)
- TensorFlow version (use command below): 1.15.2
- CUDA/cuDNN version: as provided by Google Colab
- GPU model and memory: 12 GB NVIDIA Tesla K80 (I guess, as it is Google Colab)

I am using faster_rcnn_resnet50_coco and facing the problem of exploding loss after a few iterations of training.

I am training on only one class, but the loss explodes exponentially after a few iterations. Please help. I am using the Penn-Fudan Database for Pedestrian Detection as my dataset.

Following is my config file:

model {
  faster_rcnn {
    num_classes: 1
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 400
        max_dimension: 600
      }
    }
    feature_extractor {
      type: 'faster_rcnn_resnet50'
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        scales: [0.25, 0.5, 1.0, 2.0]
        aspect_ratios: [0.5, 1.0, 2.0]
        height_stride: 16
        width_stride: 16
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.01
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.7
    first_stage_max_proposals: 300
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    initial_crop_size: 14
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        use_dropout: false
        dropout_keep_probability: 1.0
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 300
      }
      score_converter: SIGMOID
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}

train_config: {
  batch_size: 1
  optimizer {
    momentum_optimizer: {
      learning_rate: {
        manual_step_learning_rate {
          initial_learning_rate: 0.0001
          schedule {
            step: 900000
            learning_rate: .000001
          }
          schedule {
            step: 1200000
            learning_rate: .000001
          }
        }
      }
      momentum_optimizer_value: 0.9
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 5.0
  fine_tune_checkpoint: "/content/drive/My Drive/Tensorflow/models/faster_rcnn_resnet50_coco_2018_01_28/model.ckpt"
  from_detection_checkpoint: true
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
}
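The train_config above already sets `gradient_clipping_by_norm: 5.0`, which corresponds to clipping by global norm. A minimal NumPy sketch of what that operation does (illustrative only, not the TF 1.x implementation, which lives in `tf.clip_by_global_norm`):

```python
import numpy as np

def clip_by_global_norm(grads, clip_norm):
    # Global L2 norm across all gradient arrays, then a single shared
    # rescale factor so the clipped global norm never exceeds clip_norm.
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    scale = clip_norm / max(global_norm, clip_norm)
    return [g * scale for g in grads], global_norm

# An "exploding" gradient with global norm 13000 gets rescaled to norm 5.0.
grads = [np.array([3000.0, 4000.0]), np.array([12000.0])]
clipped, norm = clip_by_global_norm(grads, 5.0)
```

With clipping active, a single bad batch cannot blow up the weights in one step, but repeated huge gradients of the same sign can still walk the weights into a divergent region.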

train_input_reader: {
  tf_record_input_reader {
    input_path: "/content/drive/My Drive/Tensorflow/models/train.record"
  }
  label_map_path: "/content/drive/My Drive/Tensorflow/models/training/labelmap.pbtxt"
}

eval_config: {
  num_examples: 36
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "/content/drive/My Drive/Tensorflow/models/test.record"
  }
  label_map_path: "/content/drive/My Drive/Tensorflow/models/training/labelmap.pbtxt"
  shuffle: false
  num_readers: 1
}

My training log (each step was logged twice, once by INFO:tensorflow and once by the I0422 logger; duplicates removed here):

INFO:tensorflow:global step 374: loss = 0.2636 (0.294 sec/step)
INFO:tensorflow:global step 375: loss = 0.6618 (0.306 sec/step)
INFO:tensorflow:global step 376: loss = 0.3663 (0.289 sec/step)
INFO:tensorflow:global step 377: loss = 0.2678 (0.292 sec/step)
INFO:tensorflow:global step 378: loss = 0.3992 (0.293 sec/step)
INFO:tensorflow:global step 379: loss = 0.4918 (0.264 sec/step)
INFO:tensorflow:global step 380: loss = 0.2143 (0.269 sec/step)
INFO:tensorflow:global step 381: loss = 1.0149 (0.273 sec/step)
INFO:tensorflow:global step 382: loss = 0.2884 (0.262 sec/step)
INFO:tensorflow:global step 383: loss = 0.3797 (0.289 sec/step)
INFO:tensorflow:global step 384: loss = 0.6466 (0.306 sec/step)
INFO:tensorflow:global step 385: loss = 0.0887 (0.271 sec/step)
INFO:tensorflow:global step 386: loss = 0.7238 (0.297 sec/step)
INFO:tensorflow:global step 387: loss = 0.8665 (0.314 sec/step)
INFO:tensorflow:global step 388: loss = 1.0504 (0.292 sec/step)
INFO:tensorflow:global step 389: loss = 1.0150 (0.290 sec/step)
INFO:tensorflow:global step 390: loss = 0.6820 (0.284 sec/step)
INFO:tensorflow:global step 391: loss = 1.7842 (0.266 sec/step)
INFO:tensorflow:global step 392: loss = 5.3590 (0.257 sec/step)
INFO:tensorflow:global step 393: loss = 5.0548 (0.304 sec/step)
INFO:tensorflow:global step 394: loss = 7.1372 (0.302 sec/step)
INFO:tensorflow:global step 395: loss = 0.3071 (0.259 sec/step)
INFO:tensorflow:global step 396: loss = 33.3896 (0.267 sec/step)
INFO:tensorflow:global step 397: loss = 0.1341 (0.285 sec/step)
INFO:tensorflow:global step 398: loss = 20.5861 (0.252 sec/step)
INFO:tensorflow:global step 399: loss = 0.2996 (0.260 sec/step)
INFO:tensorflow:global step 400: loss = 53.6560 (0.276 sec/step)
INFO:tensorflow:global step 401: loss = 354.2955 (0.303 sec/step)
INFO:tensorflow:global step 402: loss = 138.3654 (0.270 sec/step)
INFO:tensorflow:global step 403: loss = 81.3835 (0.292 sec/step)
INFO:tensorflow:global step 404: loss = 2043.4795 (0.282 sec/step)
INFO:tensorflow:global step 405: loss = 0.0435 (0.297 sec/step)
INFO:tensorflow:global step 406: loss = 7290.5103 (0.267 sec/step)
INFO:tensorflow:global step 407: loss = 7957.7422 (0.261 sec/step)
INFO:tensorflow:global step 408: loss = 0.1442 (0.302 sec/step)
INFO:tensorflow:global step 409: loss = 25237.8984 (0.273 sec/step)
INFO:tensorflow:global step 410: loss = 75835.2812 (0.319 sec/step)
INFO:tensorflow:global step 411: loss = 28575.1914 (0.250 sec/step)
INFO:tensorflow:global step 412: loss = 134869.8906 (0.293 sec/step)
INFO:tensorflow:global step 413: loss = 437442.4062 (0.296 sec/step)
INFO:tensorflow:global step 414: loss = 212268.4531 (0.255 sec/step)
INFO:tensorflow:global step 415: loss = 1216893.1250 (0.276 sec/step)
INFO:tensorflow:global step 416: loss = 0.1749 (0.262 sec/step)
INFO:tensorflow:global step 417: loss = 2736256.2500 (0.312 sec/step)
INFO:tensorflow:global step 418: loss = 4241052.0000 (0.263 sec/step)
INFO:tensorflow:global step 419: loss = 4462876.0000 (0.266 sec/step)
INFO:tensorflow:global step 420: loss = 18808836.0000 (0.295 sec/step)
INFO:tensorflow:global step 421: loss = 96460304.0000 (0.288 sec/step)
INFO:tensorflow:global step 422: loss = 85134320.0000 (0.320 sec/step)
INFO:tensorflow:global step 423: loss = 364593632.0000 (0.257 sec/step)
INFO:tensorflow:global step 424: loss = 159115248.0000 (0.267 sec/step)
INFO:tensorflow:global step 425: loss = 854715264.0000 (0.312 sec/step)
INFO:tensorflow:global step 426: loss = 3067453952.0000 (0.296 sec/step)
INFO:tensorflow:global step 427: loss = 3518234624.0000 (0.291 sec/step)
INFO:tensorflow:global step 428: loss = 17210691584.0000 (0.327 sec/step)
INFO:tensorflow:global step 429: loss = 22827235328.0000 (0.298 sec/step)
INFO:tensorflow:global step 430: loss = 99799859200.0000 (0.263 sec/step)
INFO:tensorflow:global step 431: loss = 0.7569 (0.287 sec/step)
INFO:tensorflow:global step 432: loss = 164616962048.0000 (0.323 sec/step)
INFO:tensorflow:global step 433: loss = 598838804480.0000 (0.267 sec/step)
INFO:tensorflow:global step 434: loss = 171039686656.0000 (0.285 sec/step)
INFO:tensorflow:global step 435: loss = 0.1586 (0.294 sec/step)
INFO:tensorflow:global step 436: loss = 11961404227584.0000 (0.264 sec/step)
INFO:tensorflow:global step 437: loss = 10615577903104.0000 (0.297 sec/step)
INFO:tensorflow:global step 438: loss = 6634327769088.0000 (0.262 sec/step)
INFO:tensorflow:global step 439: loss = 0.0360 (0.265 sec/step)
INFO:tensorflow:global step 440: loss = 15168851410944.0000 (0.312 sec/step)
INFO:tensorflow:global step 441: loss = 0.3786 (0.265 sec/step)
INFO:tensorflow:global step 442: loss = 204700758573056.0000 (0.258 sec/step)
INFO:tensorflow:global step 443: loss = 0.0319 (0.262 sec/step)
INFO:tensorflow:global step 444: loss = 1549614536720384.0000 (0.273 sec/step)
INFO:tensorflow:global step 445: loss = 706502994165760.0000 (0.280 sec/step)
INFO:tensorflow:global step 446: loss = 1583030992896000.0000 (0.292 sec/step)
INFO:tensorflow:global step 447: loss = 11534830458109952.0000 (0.286 sec/step)
INFO:tensorflow:global step 448: loss = 28171772826222592.0000 (0.310 sec/step)
INFO:tensorflow:global step 449: loss = 33334265533956096.0000 (0.271 sec/step)
INFO:tensorflow:global step 450: loss = 0.0328 (0.276 sec/step)
INFO:tensorflow:global step 451: loss = 0.0162 (0.274 sec/step)
INFO:tensorflow:global step 452: loss = 67449892993236992.0000 (0.313 sec/step)
INFO:tensorflow:global step 453: loss = 95736882612142080.0000 (0.263 sec/step)
INFO:tensorflow:global step 454: loss = 266017148894183424.0000 (0.280 sec/step)
INFO:tensorflow:global step 455: loss = 903144486052298752.0000 (0.298 sec/step)
INFO:tensorflow:global step 456: loss = 2083570548905869312.0000 (0.286 sec/step)
INFO:tensorflow:global step 457: loss = 845515095910907904.0000 (0.269 sec/step)
INFO:tensorflow:global step 458: loss = 1061061568713719808.0000 (0.272 sec/step)
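The alternation between tiny and astronomically large losses in the log above is easy to detect programmatically. A small sketch (hypothetical helper, assuming the `global step N: loss = X` log format shown) that flags steps where the loss jumps past ten times the recent median:

```python
import re

def find_loss_spikes(log_text, ratio=10.0):
    """Return (step, loss) pairs where loss exceeds `ratio` times the
    median of the last few recorded losses."""
    losses = [(int(s), float(l)) for s, l in
              re.findall(r"global step (\d+): loss = ([\d.]+)", log_text)]
    spikes, recent = [], []
    for step, loss in losses:
        if recent and loss > ratio * sorted(recent)[len(recent) // 2]:
            spikes.append((step, loss))
        recent = (recent + [loss])[-20:]  # sliding window of last 20 losses
    return spikes

log = """global step 390: loss = 0.6820
global step 391: loss = 1.7842
global step 392: loss = 5.3590
global step 396: loss = 33.3896"""
spikes = find_loss_spikes(log)  # flags step 396, the first 10x jump
```

A check like this can stop a run early, before the loss reaches the e+18 range seen here.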

I am using google collab having following libraries absl-py==0.9.0 alabaster==0.7.12 albumentations==0.1.12 altair==4.1.0 asgiref==3.2.7 astor==0.8.1 astropy==4.0.1.post1 astunparse==1.6.3 atari-py==0.2.6 atomicwrites==1.3.0 attrs==19.3.0 audioread==2.1.8 autograd==1.3 Babel==2.8.0 backcall==0.1.0 backports.tempfile==1.0 backports.weakref==1.0.post1 beautifulsoup4==4.6.3 bleach==3.1.4 blis==0.4.1 bokeh==1.4.0 boto==2.49.0 boto3==1.12.40 botocore==1.15.40 Bottleneck==1.3.2 branca==0.4.0 bs4==0.0.1 bz2file==0.98 CacheControl==0.12.6 cachetools==3.1.1 catalogue==1.0.0 certifi==2020.4.5.1 cffi==1.14.0 chainer==6.5.0 chardet==3.0.4 click==7.1.1 cloudpickle==1.3.0 cmake==3.12.0 cmdstanpy==0.4.0 colorlover==0.3.0 community==1.0.0b1 contextlib2==0.5.5 convertdate==2.2.0 coverage==3.7.1 coveralls==0.5 crcmod==1.7 cufflinks==0.17.3 cupy-cuda101==6.5.0 cvxopt==1.2.5 cvxpy==1.0.31 cycler==0.10.0 cymem==2.0.3 Cython==0.29.16 daft==0.0.4 dask==2.12.0 dataclasses==0.7 datascience==0.10.6 decorator==4.4.2 defusedxml==0.6.0 descartes==1.1.0 dill==0.3.1.1 distributed==1.25.3 Django==3.0.5 dlib==19.18.0 dm-sonnet==1.35 docopt==0.6.2 docutils==0.15.2 dopamine-rl==1.0.5 earthengine-api==0.1.218 easydict==1.9 ecos==2.0.7.post1 editdistance==0.5.3 en-core-web-sm==2.2.5 entrypoints==0.3 ephem==3.7.7.1 et-xmlfile==1.0.1 fa2==0.3.5 fancyimpute==0.4.3 fastai==1.0.60 fastdtw==0.3.4 fastprogress==0.2.3 fastrlock==0.4 fbprophet==0.6 feather-format==0.4.0 featuretools==0.4.1 filelock==3.0.12 firebase-admin==4.0.1 fix-yahoo-finance==0.0.22 Flask==1.1.2 folium==0.8.3 fsspec==0.7.2 future==0.16.0 gast==0.3.3 GDAL==2.2.2 gdown==3.6.4 gensim==3.6.0 geographiclib==1.50 geopy==1.17.0 gevent==1.4.0 gin-config==0.3.0 glob2==0.7 google==2.0.3 google-api-core==1.16.0 google-api-python-client==1.7.12 google-auth==1.7.2 google-auth-httplib2==0.0.3 google-auth-oauthlib==0.4.1 google-cloud-bigquery==1.21.0 google-cloud-core==1.0.3 google-cloud-datastore==1.8.0 google-cloud-firestore==1.6.2 
google-cloud-language==1.2.0 google-cloud-storage==1.18.1 google-cloud-translate==1.5.0 google-colab==1.0.0 google-pasta==0.2.0 google-resumable-media==0.4.1 googleapis-common-protos==1.51.0 googledrivedownloader==0.4 graph-nets==1.0.5 graphviz==0.10.1 greenlet==0.4.15 grpcio==1.28.1 gspread==3.0.1 gspread-dataframe==3.0.5 gunicorn==20.0.4 gym==0.17.1 h5py==2.10.0 HeapDict==1.0.1 holidays==0.9.12 html5lib==1.0.1 httpimport==0.5.18 httplib2==0.17.2 httplib2shim==0.0.3 humanize==0.5.1 hyperopt==0.1.2 ideep4py==2.0.0.post3 idna==2.8 image==1.5.30 imageio==2.4.1 imagesize==1.2.0 imbalanced-learn==0.4.3 imblearn==0.0 imgaug==0.2.9 importlib-metadata==1.6.0 imutils==0.5.3 inflect==2.1.0 intel-openmp==2020.0.133 intervaltree==2.1.0 ipykernel==4.10.1 ipython==5.5.0 ipython-genutils==0.2.0 ipython-sql==0.3.9 ipywidgets==7.5.1 itsdangerous==1.1.0 jax==0.1.62 jaxlib==0.1.42 jdcal==1.4.1 jedi==0.17.0 jieba==0.42.1 Jinja2==2.11.2 jmespath==0.9.5 joblib==0.14.1 jpeg4py==0.1.4 jsonschema==2.6.0 jupyter==1.0.0 jupyter-client==5.3.4 jupyter-console==5.2.0 jupyter-core==4.6.3 kaggle==1.5.6 kapre==0.1.3.1 Keras==2.3.1 Keras-Applications==1.0.8 Keras-Preprocessing==1.1.0 keras-vis==0.4.1 kfac==0.2.0 kiwisolver==1.2.0 knnimpute==0.1.0 librosa==0.6.3 lightgbm==2.2.3 llvmlite==0.31.0 lmdb==0.98 lucid==0.3.8 LunarCalendar==0.0.9 lxml==4.2.6 magenta==0.3.19 Markdown==3.2.1 MarkupSafe==1.1.1 matplotlib==3.2.1 matplotlib-venn==0.11.5 mesh-tensorflow==0.1.12 mido==1.2.6 mir-eval==0.5 missingno==0.4.2 mistune==0.8.4 mizani==0.6.0 mkl==2019.0 mlxtend==0.14.0 more-itertools==8.2.0 moviepy==0.2.3.5 mpi4py==3.0.3 mpmath==1.1.0 msgpack==1.0.0 multiprocess==0.70.9 multitasking==0.0.9 murmurhash==1.0.2 music21==5.5.0 natsort==5.5.0 nbconvert==5.6.1 nbformat==5.0.5 networkx==2.4 nibabel==3.0.2 nltk==3.2.5 notebook==5.2.2 np-utils==0.5.12.1 numba==0.48.0 numexpr==2.7.1 numpy==1.18.2 nvidia-ml-py3==7.352.0 oauth2client==4.1.3 oauthlib==3.1.0 object-detection==0.1 okgrade==0.4.3 
opencv-contrib-python==4.1.2.30 opencv-python==4.1.2.30 openpyxl==2.5.9 opt-einsum==3.2.1 osqp==0.6.1 packaging==20.3 palettable==3.3.0 pandas==1.0.3 pandas-datareader==0.8.1 pandas-gbq==0.11.0 pandas-profiling==1.4.1 pandocfilters==1.4.2 parso==0.7.0 pathlib==1.0.1 patsy==0.5.1 pexpect==4.8.0 pickleshare==0.7.5 Pillow==7.0.0 pip-tools==4.5.1 plac==1.1.3 plotly==4.4.1 plotnine==0.6.0 pluggy==0.7.1 portpicker==1.3.1 prefetch-generator==1.0.1 preshed==3.0.2 pretty-midi==0.2.8 prettytable==0.7.2 progressbar2==3.38.0 prometheus-client==0.7.1 promise==2.3 prompt-toolkit==1.0.18 protobuf==3.10.0 psutil==5.4.8 psycopg2==2.7.6.1 ptvsd==5.0.0a12 ptyprocess==0.6.0 py==1.8.1 pyarrow==0.14.1 pyasn1==0.4.8 pyasn1-modules==0.2.8 pycocotools==2.0.0 pycparser==2.20 pydata-google-auth==0.3.0 pydot==1.3.0 pydot-ng==2.0.0 pydotplus==2.0.2 PyDrive==1.3.1 pyemd==0.5.1 pyglet==1.5.0 Pygments==2.1.3 pygobject==3.26.1 pymc3==3.7 PyMeeus==0.3.7 pymongo==3.10.1 pymystem3==0.2.0 PyOpenGL==3.1.5 pyparsing==2.4.7 pypng==0.0.20 pyrsistent==0.16.0 pysndfile==1.3.8 PySocks==1.7.1 pystan==2.19.1.1 pytest==3.6.4 python-apt==1.6.5+ubuntu0.2 python-chess==0.23.11 python-dateutil==2.8.1 python-louvain==0.14 python-rtmidi==1.4.0 python-slugify==4.0.0 python-utils==2.4.0 pytz==2018.9 PyWavelets==1.1.1 PyYAML==3.13 pyzmq==19.0.0 qtconsole==4.7.3 QtPy==1.9.0 regex==2019.12.20 requests==2.21.0 requests-oauthlib==1.3.0 resampy==0.2.2 retrying==1.3.3 rpy2==3.2.7 rsa==4.0 s3fs==0.4.2 s3transfer==0.3.3 scikit-image==0.16.2 scikit-learn==0.22.2.post1 scipy==1.4.1 screen-resolution-extra==0.0.0 scs==2.1.2 seaborn==0.10.0 semantic-version==2.8.4 Send2Trash==1.5.0 setuptools-git==1.2 Shapely==1.7.0 simplegeneric==0.8.1 six==1.12.0 sklearn==0.0 sklearn-pandas==1.8.0 smart-open==1.11.1 snowballstemmer==2.0.0 sortedcontainers==2.1.0 spacy==2.2.4 Sphinx==1.8.5 sphinxcontrib-websupport==1.2.1 SQLAlchemy==1.3.16 sqlparse==0.3.1 srsly==1.0.2 stable-baselines==2.2.1 statsmodels==0.10.2 sympy==1.1.1 tables==3.4.4 
tabulate==0.8.7 tbb==2020.0.133 tblib==1.6.0 tensor2tensor==1.14.1 tensorboard==1.15.0 tensorboard-plugin-wit==1.6.0.post3 tensorboardcolab==0.0.22 tensorflow==1.15.2 tensorflow-addons==0.8.3 tensorflow-datasets==2.1.0 tensorflow-estimator==1.15.1 tensorflow-gan==2.0.0 tensorflow-gcs-config==2.1.8 tensorflow-hub==0.8.0 tensorflow-metadata==0.21.2 tensorflow-privacy==0.2.2 tensorflow-probability==0.7.0 termcolor==1.1.0 terminado==0.8.3 testpath==0.4.4 text-unidecode==1.3 textblob==0.15.3 textgenrnn==1.4.1 tflearn==0.3.2 Theano==1.0.4 thinc==7.4.0 toolz==0.10.0 torch==1.4.0 torchsummary==1.5.1 torchtext==0.3.1 torchvision==0.5.0 tornado==4.5.3 tqdm==4.38.0 traitlets==4.3.3 tweepy==3.6.0 typeguard==2.7.1 typing==3.6.6 typing-extensions==3.6.6 tzlocal==1.5.1 umap-learn==0.4.1 uritemplate==3.0.1 urllib3==1.24.3 vega-datasets==0.8.0 wasabi==0.6.0 wcwidth==0.1.9 webencodings==0.5.1 Werkzeug==1.0.1 widgetsnbextension==3.5.1 wordcloud==1.5.0 wrapt==1.12.1 xarray==0.15.1 xgboost==0.90 xkit==0.0.0 xlrd==1.1.0 xlwt==1.3.0 yellowbrick==0.9.1 zict==2.0.0 zipp==3.1.0 zmq==0.0.0

rggs commented 4 years ago

I am having a similar issue: one class, and around step 350 the loss explodes, despite my label map etc. looking fine.

rggs commented 4 years ago

Ok I feel like an idiot but I'm putting this here: there WAS an issue with my label map. In my label map, the class name was capitalized, whereas in the .record files it was all lower case. Changing the name in the label map file to all lower case (so that it was EXACTLY as it appeared in the .record and .csv files) seems to have fixed the issue.
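The case mismatch described above can be caught programmatically. A minimal sketch (hypothetical helper names, crude regex parsing of labelmap.pbtxt rather than the `object_detection` protobuf utilities):

```python
import re

def labelmap_names(pbtxt_text):
    # Pull the quoted name fields out of a labelmap.pbtxt, assuming the
    # usual item { id: ... name: '...' } layout (crude regex parse).
    return set(re.findall(r"name:\s*['\"]([^'\"]+)['\"]", pbtxt_text))

def case_mismatches(labelmap_text, record_classes):
    """Classes that match a label map entry only when case is ignored --
    exactly the silent mismatch described above."""
    names = labelmap_names(labelmap_text)
    lowered = {n.lower(): n for n in names}
    return [(c, lowered[c.lower()]) for c in record_classes
            if c not in names and c.lower() in lowered]

labelmap = "item {\n  id: 1\n  name: 'Pedestrian'\n}"
mismatches = case_mismatches(labelmap, ["pedestrian"])  # one mismatch found
```

Running a comparison like this against the class strings in the .csv (or decoded .record) files before training would surface the capitalization problem immediately.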

Boltuzamaki commented 4 years ago

@rsbball11 Thanks for answering. I don't know whether my class name was capitalized or not, but I made a new environment on my PC, redid everything from scratch, and now it is training fine.

One thing I noticed: even though I have only one class, I had written num_classes: 2 in the config file (I was just experimenting). The training still went fine and the loss was decreasing; I waited around 1000 steps and it was still decreasing. I don't know why.
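A num_classes value that disagrees with the label map often trains without an immediate error, as observed here. A quick consistency check (hypothetical helpers using plain regexes rather than the object_detection protobuf parsers):

```python
import re

def config_num_classes(config_text):
    # Crude regex pull of num_classes from a pipeline config.
    m = re.search(r"num_classes:\s*(\d+)", config_text)
    return int(m.group(1)) if m else None

def labelmap_item_count(pbtxt_text):
    # Count item { ... } blocks in a labelmap.pbtxt.
    return len(re.findall(r"\bitem\s*\{", pbtxt_text))

config = "model { faster_rcnn { num_classes: 2 ... } }"
labelmap = "item {\n  id: 1\n  name: 'pedestrian'\n}"
mismatch = config_num_classes(config) != labelmap_item_count(labelmap)
```

An extra class slot just means one output logit never gets positive examples, which is why training can still converge; it wastes capacity rather than crashing.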

My problem is solved, so I am closing my issue :)