Invalid Argument error for train_shapes.ipynb #265

yasiemir opened 6 years ago

yasiemir commented 6 years ago

Running train_shapes with the following config on a CPU-only machine:

class ShapesConfig(Config):

    NAME = "shapes"  
    GPU_COUNT = 1
    NUM_CLASSES = 1 + 3  # background + 3 shapes   
    IMAGE_MIN_DIM = 128
    IMAGE_MAX_DIM = 128    
    RPN_ANCHOR_SCALES = (8, 16, 32, 64, 128)  # anchor side in pixels
    BACKBONE_STRIDES = [4,8,16]
    RPN_ANCHOR_SCALES = (32, 64)

and training with layers='heads':

model.train(dataset_train, dataset_val, 

generates the following InvalidArgumentError:

InvalidArgumentError: indices[1] = 3949 is not in [0, 3840)
     [[Node: ROI/Gather_22 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/Gather_16/params, ROI/strided_slice_39)]]

InvalidArgumentError (see above for traceback): indices[1] = 3949 is not in [0, 3840)
     [[Node: ROI/Gather_22 = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, validate_indices=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](ROI/Gather_16/params, ROI/strided_slice_39)]]

woodm1979 commented 6 years ago

I'm also running into this issue. I'll try and debug it tonight/tomorrow.

yasiemir commented 6 years ago

I think it's related to #211

woodm1979 commented 6 years ago

Somehow we're getting more scores than anchors, which should be impossible, so there may be a mismatch somewhere in how the pieces are combined.

tonyzhao6 commented 6 years ago

Are you using the default ResNet-101 with FPN structure? If so, you have a mismatch in the number of elements between RPN_ANCHOR_SCALES and BACKBONE_STRIDES. Also, you have RPN_ANCHOR_SCALES defined twice.

In general, if you are using ResNet-101 with FPN and taking the C4 backbone, there are 5 levels in the FPN. Thus, you should have 5 elements in RPN_ANCHOR_SCALES and 5 elements in BACKBONE_STRIDES.
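
To make the mismatch concrete, here is a rough sketch of the arithmetic. It is not the repo's code: `count_anchors` and `check_pyramid_config` are hypothetical helpers. It assumes anchors are generated one pyramid level per (stride, scale) pair, with 3 ratios per feature map position and an anchor stride of 1, and that `[4, 8, 16, 32, 64]` are the default strides for the 5 levels.

```python
import math


def count_anchors(image_dim, strides, scales, ratios_per_position=3):
    """Count anchors, assuming one pyramid level per (stride, scale) pair."""
    total = 0
    # zip() stops at the shorter sequence, so extra strides or scales are ignored.
    for stride, _scale in zip(strides, scales):
        side = math.ceil(image_dim / stride)  # feature map side at this level
        total += side * side * ratios_per_position
    return total


def check_pyramid_config(strides, scales, num_levels=5):
    """Raise ValueError unless both settings have one entry per pyramid level."""
    if len(strides) != num_levels or len(scales) != num_levels:
        raise ValueError(
            "expected %d strides and %d scales, got %d and %d"
            % (num_levels, num_levels, len(strides), len(scales))
        )


# Config from the report: 3 strides, 2 scales (the second definition wins).
generated = count_anchors(128, [4, 8, 16], (32, 64))
# Five levels, one stride and one scale per level.
scored = count_anchors(128, [4, 8, 16, 32, 64], (8, 16, 32, 64, 128))
print(generated, scored)  # 3840 4092
```

Under those assumptions the reported config yields 3840 anchors, which matches the upper bound in the error message, while a 5-level pyramid would produce 4092 scores, so an index like 3949 is valid for the scores but not for the anchors. Calling `check_pyramid_config([4, 8, 16], (32, 64))` would raise before training starts.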

