matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow
Other
24.63k stars 11.7k forks source link

Error in PredictCost for the op: op "CropAndResize" #2837

Open marul12q opened 2 years ago

marul12q commented 2 years ago

I wanted to train Mask RCNN on google Colab on GPU. And this Error/Warning comes in. The same code runs on CPU without any issues. Tensorflow 2.2.0. Keras 2.8.0. Does it have influence on training performance? Any suggestions?

Configurations: BACKBONE resnet50 BACKBONE_STRIDES [4, 8, 16, 32, 64] BATCH_SIZE 1 BBOX_STD_DEV [0.1 0.1 0.2 0.2] COMPUTE_BACKBONE_SHAPE None DETECTION_MAX_INSTANCES 100 DETECTION_MIN_CONFIDENCE 0.9 DETECTION_NMS_THRESHOLD 0.3 FPN_CLASSIF_FC_LAYERS_SIZE 1024 GPU_COUNT 1 GRADIENT_CLIP_NORM 5.0 IMAGES_PER_GPU 1 IMAGE_CHANNEL_COUNT 3 IMAGE_MAX_DIM 1024 IMAGE_META_SIZE 14 IMAGE_MIN_DIM 800 IMAGE_MIN_SCALE 0 IMAGE_RESIZE_MODE square IMAGE_SHAPE [1024 1024 3] LEARNING_MOMENTUM 0.9 LEARNING_RATE 0.001 LOSS_WEIGHTS {'rpn_class_loss': 1.0, 'rpn_bbox_loss': 1.0, 'mrcnn_class_loss': 1.0, 'mrcnn_bbox_loss': 1.0, 'mrcnn_mask_loss': 1.0} MASK_POOL_SIZE 14 MASK_SHAPE [28, 28] MAX_GT_INSTANCES 100 MEAN_PIXEL [123.7 116.8 103.9] MINI_MASK_SHAPE (84, 84) NAME gland NUM_CLASSES 2 POOL_SIZE 7 POST_NMS_ROIS_INFERENCE 1000 POST_NMS_ROIS_TRAINING 2000 PRE_NMS_LIMIT 6000 ROI_POSITIVE_RATIO 0.33 RPN_ANCHOR_RATIOS [0.5, 1, 2] RPN_ANCHOR_SCALES (32, 64, 128, 256, 512) RPN_ANCHOR_STRIDE 1 RPN_BBOX_STD_DEV [0.1 0.1 0.2 0.2] RPN_NMS_THRESHOLD 0.7 RPN_TRAIN_ANCHORS_PER_IMAGE 256 STEPS_PER_EPOCH 100 TOP_DOWN_PYRAMID_SIZE 256 TRAIN_BN False TRAIN_ROIS_PER_IMAGE 200 USE_MINI_MASK True USE_RPN_ROIS True VALIDATION_STEPS 50 WEIGHT_DECAY 0.0001

Downloading pretrained model to /content/projektBadawczy/mask_rcnn_coco.h5 ... ... done downloading pretrained model! Loading weights /content/projektBadawczy/mask_rcnn_coco.h5 2022-05-29 06:40:47.457746: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0. Training network heads

Starting at epoch 0. LR=0.001

Checkpoint Path: /content/projektBadawczy/logs/gland20220529T0640/mask_rcnngland{epoch:04d}.h5 Selecting layers to train fpn_c5p5 (Conv2D) fpn_c4p4 (Conv2D) fpn_c3p3 (Conv2D) fpn_c2p2 (Conv2D) fpn_p5 (Conv2D) fpn_p2 (Conv2D) fpn_p3 (Conv2D) fpn_p4 (Conv2D) rpn_model (Functional) mrcnn_mask_conv1 (TimeDistributed) mrcnn_mask_bn1 (TimeDistributed) mrcnn_mask_conv2 (TimeDistributed) mrcnn_mask_bn2 (TimeDistributed) mrcnn_class_conv1 (TimeDistributed) mrcnn_class_bn1 (TimeDistributed) mrcnn_mask_conv3 (TimeDistributed) mrcnn_mask_bn3 (TimeDistributed) mrcnn_class_conv2 (TimeDistributed) mrcnn_class_bn2 (TimeDistributed) mrcnn_mask_conv4 (TimeDistributed) mrcnn_mask_bn4 (TimeDistributed) mrcnn_bbox_fc (TimeDistributed) mrcnn_mask_deconv (TimeDistributed) mrcnn_class_logits (TimeDistributed) mrcnn_mask (TimeDistributed) /usr/local/lib/python3.7/dist-packages/keras/optimizer_v2/gradient_descent.py:102: UserWarning: The lr argument is deprecated, use learning_rate instead. super(SGD, self).init(name, **kwargs) Epoch 1/30 /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/indexed_slices.py:446: UserWarning: Converting sparse IndexedSlices(IndexedSlices(indices=Tensor("training/SGD/gradients/gradients/roi_align_classifier/concat_grad/sub:0", shape=(None,), dtype=int32), values=Tensor("training/SGD/gradients/gradients/roi_align_classifier/concat_grad/GatherV2_2:0", shape=(None, 7, 7, 256), dtype=float32), dense_shape=Tensor("training/SGD/gradients/gradients/roi_align_classifier/concat_grad/Shape:0", shape=(4,), dtype=int32))) to a dense Tensor of unknown shape. This may consume a large amount of memory. "shape. This may consume a large amount of memory." % value 2022-05-29 06:44:01.503850: W tensorflow/core/grappler/costs/op_level_cost_estimator.cc:690] Error in PredictCost() for the op: op: "CropAndResize" attr { key: "T" value { type: DT_FLOAT } } attr { key: "extrapolation_value" value { f: 0 } } attr { key: "method" value { s: "bilinear" } } inputs { dtype: DT_FLOAT shape { dim { size: -438 } dim { size: 84 } dim { size: 84 } dim { size: 1 } } } inputs { dtype: DT_FLOAT shape { dim { size: -25 } dim { size: 4 } } } inputs { dtype: DT_INT32 shape { dim { size: -25 } } } inputs { dtype: DT_INT32 shape { dim { size: 2 } } value { dtype: DT_INT32 tensor_shape { dim { size: 2 } } int_val: 28 } } device { type: "GPU" vendor: "NVIDIA" model: "Tesla T4" frequency: 1590 num_cores: 40 environment { key: "architecture" value: "7.5" } environment { key: "cuda" value: "11010" } environment { key: "cudnn" value: "8005" } num_registers: 65536 l1_cache_size: 24576 l2_cache_size: 4194304 shared_memory_size_per_multiprocessor: 65536 memory_size: 14465892352 bandwidth: 320064000 } outputs { dtype: DT_FLOAT shape { dim { size: -25 } dim { size: 28 } dim { size: 28 } dim { size: 1 } } } 100/100 [==============================] - 168s 1s/step - batch: 49.5000 - size: 1.0000 - loss: 5.7086 - rpn_class_loss: 0.4854 - rpn_bbox_loss: 2.6237 - mrcnn_class_loss: 0.4647 - mrcnn_bbox_loss: 1.4705 - mrcnn_mask_loss: 0.6643 - val_loss: 5.5708 - val_rpn_class_loss: 0.3285 - val_rpn_bbox_loss: 2.2625 - val_mrcnn_class_loss: 0.8341 - val_mrcnn_bbox_loss: 1.4832 - val_mrcnn_mask_loss: 0.6625

Davidsonson commented 2 years ago

Did you ever find a solution to this?

dluks commented 2 years ago

I also get this error, but I have no idea if it's something that is likely to be affecting the training or if it's just another one of those "oh it's just tensorflow being tensorflow, don't worry about it" errors.