tryolabs / luminoth

Deep Learning toolkit for Computer Vision.
https://tryolabs.com
BSD 3-Clause "New" or "Revised" License

InvalidArgumentError (see above for traceback): Incompatible shapes: [0,4] vs. [16,4] #218

lwdhw1987 closed this issue 5 years ago

lwdhw1987 commented 5 years ago

When I run Luminoth training with fasterrcnn + vgg16 and the COCO dataset, I get this error:

InvalidArgumentError (see above for traceback): Incompatible shapes: [0,4] vs. [16,4]
         [[{{node losses/RCNNLoss/sub_1}} = Sub[_device="/job:localhost/replica:0/task:0/device:GPU:0"](losses/RCNNLoss/bbox_offset_cleaned/GatherV2, losses/RCNNLoss/bbox_offsets_target_labeled/GatherV2)]]
         [[{{node fasterrcnn/rcnn/rcnn_proposal_1/BoundingBoxTransform/change_order_19/stack/_1589}} = _Recv[_device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Below is my config file. Is there something wrong with it?

train:
  # Run on debug mode (which enables more logging).
  debug: True
  # Seed for random operations.
  seed:
  # Training batch size for images. FasterRCNN currently only supports 1.
  batch_size: 1
  # Base directory in which model checkpoints & summaries (for Tensorboard) will
  # be saved.
  job_dir: jobs/
  # Ignore scope when loading from checkpoint (useful when training RPN first
  # and then RPN + RCNN).
  ignore_scope:
  # Enables TensorFlow debug mode, which stops and lets you analyze Tensors
  # after each `Session.run`.
  tf_debug: False
  # Name used to identify the run. Data inside `job_dir` will be stored under
  # `run_name`.
  run_name:
  # Disables logging and saving checkpoints.
  no_log: False
  # Displays debugging images with results every N steps. Debug mode must be
  # enabled.
  display_every_steps:
  # Display debugging images every N seconds.
  display_every_secs: 300
  # Shuffle the dataset. It should only be disabled when trying to reproduce
  # some problem on some sample.
  random_shuffle: True
  # Save Tensorboard timeline.
  save_timeline: False
  # The frequency, in seconds, that a checkpoint is saved.
  save_checkpoint_secs: 600
  # The frequency, in number of global steps, that the summaries are written to
  # disk.
  save_summaries_steps:
  # The frequency, in seconds, that the summaries are written to disk.  If both
  # save_summaries_steps and save_summaries_secs are set to empty, then the
  # default summary saver isn't used.
  save_summaries_secs: 30
  # Run TensorFlow using full_trace mode for memory and running time logging
  # Debug mode must be enabled.
  full_trace: False
  # Clip gradients by norm, making sure the maximum value is 10.
  clip_by_norm: False
  # Learning rate config.
  learning_rate:
    # Because we're using kwargs, we want the learning_rate dict to be replaced
    # as a whole.
    _replace: True
    # Learning rate decay method; can be: ((empty), 'none', piecewise_constant,
    # exponential_decay, polynomial_decay) You can define different decay
    # methods using `decay_method` and defining all the necessary arguments.
    decay_method:
    learning_rate: 0.0003

  # Optimizer configuration.
  optimizer:
    # Because we're using kwargs, we want the optimizer dict to be replaced as a
    # whole.
    _replace: True
    # Type of optimizer to use (momentum, adam, gradient_descent, rmsprop).
    type: momentum
    # Any options are passed directly to the optimizer as kwarg.
    momentum: 0.9

  # Number of epochs (complete dataset batches) to run.
  num_epochs: 1000

  # Image visualization mode, options = train, eval, debug, (empty).
  # Default=(empty).
  image_vis: train
  # Variable summary visualization mode, options = full, reduced, (empty).
  var_vis:

eval:
  # Image visualization mode, options = train, eval, debug,
  # (empty). Default=(empty).
  image_vis: eval

dataset:
  type: object_detection
  # From which directory to read the dataset.
  dir: '/data/Datasets/coco/tf'
  # Which split of tfrecords to look for.
  split: train
  # Resize image according to min_size and max_size.
  image_preprocessing:
    min_size: 600
    max_size: 1024
  # Data augmentation techniques.
  data_augmentation:
    - flip:
        left_right: True
        up_down: False
        prob: 0.5
    # Also available:
    # # If you resize images to be too small, you may end up with no anchors
    # # that are fully inside the image.
    # - resize:
    #     min_size: 600
    #     max_size: 1024
    #     prob: 0.2
    # - patch:
    #     min_height: 600
    #     min_width: 600
    #     prob: 0.2
    # - distortion:
    #     brightness:
    #       max_delta: 0.2
    #     hue:
    #       max_delta: 0.2
    #     saturation:
    #       lower: 0.5
    #       upper: 1.5
    #     prob: 0.3

model:
  type: fasterrcnn
  network:
    # Total number of classes to predict.
    num_classes: 20
    # Use RCNN or just RPN.
    with_rcnn: True

  # Whether to use batch normalization in the model.
  batch_norm: False

  base_network:
    # Which type of pretrained network to use.
    architecture: vgg_16
    # Should we train the pretrained network.
    trainable: True
    # From which file to load the weights.
    weights:
    # Should we download weights if not available.
    download: False
    # Which endpoint layer to use as feature map for network.
    endpoint: conv5/conv5_1
    # Starting point after which all the variables in the base network will be
    # trainable. If not specified, then all the variables in the network will be
    # trainable.
    fine_tune_from: conv4/conv4_2
    # Whether to train the ResNet's batch norm layers.
    train_batch_norm: False
    # Whether to use the base network's tail in the RCNN.
    use_tail: True
    # Whether to freeze the base network's tail.
    freeze_tail: False
    # Output stride for ResNet.
    output_stride: 16
    arg_scope:
      # Regularization.
      weight_decay: 0.0005

  loss:
    # Loss weights for calculating the total loss.
    rpn_cls_loss_weight: 1.0
    rpn_reg_loss_weights: 2.0
    rcnn_cls_loss_weight: 1.0
    rcnn_reg_loss_weights: 2.0

  anchors:
    # Base size to use for anchors.
    base_size: 256
    # Scale used for generating anchor sizes.
    scales: [0.5, 1, 2]
    # Aspect ratios used for generating anchors.
    ratios: [0.5, 1, 2]
    # Stride depending on feature map size (of pretrained).
    stride: 16

  rpn:
    activation_function: relu6
    l2_regularization_scale: 0.0005  # Disable using 0.
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 3.0
    # Number of filters for the RPN conv layer.
    num_channels: 512
    # Kernel shape for the RPN conv layer.
    kernel_shape: [3, 3]
    # Initializers for RPN weights.
    rpn_initializer:
      _replace: True
      type: variance_scaling_initializer
      #mean: 0.0
      #stddev: 0.01
      factor: 1.0
      uniform: True
      mode: FAN_AVG

    cls_initializer:
      _replace: True
      type: truncated_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: truncated_normal_initializer
      mean: 0.0
      stddev: 0.001

    proposals:
      # Total proposals to use before running NMS (sorted by score).
      pre_nms_top_n: 12000
      # Total proposals to use after NMS (sorted by score).
      post_nms_top_n: 2000
      # Option to apply NMS.
      apply_nms: True
      # NMS threshold used when removing "almost duplicates".
      nms_threshold: 0.7
      min_size: 0  # Disable using 0.
      # Run clipping of proposals after running NMS.
      clip_after_nms: False
      # Filter proposals from anchors partially outside the image.
      filter_outside_anchors: False
      # Minimum probability to be used as proposed object.
      min_prob_threshold: 0.0

    target:
      # Margin to crop proposals too close to the border.
      allowed_border: 0
      # Overwrite positives with negative if threshold is too low.
      clobber_positives: False
      # How much IoU with GT proposals must have to be marked as positive.
      foreground_threshold: 0.6
      # High and low thresholds with GT to be considered background.
      background_threshold_high: 0.3
      background_threshold_low: 0.0
      foreground_fraction: 0.5
      # Ratio between background and foreground in the minibatch.
      minibatch_size: 256
      # Assign to get consistent "random" selection in batch.
      random_seed:  # Only to be used for debugging.

  rcnn:
    layer_sizes: []  # Could be e.g. `[4096, 4096]`.
    dropout_keep_prob: 1.0
    activation_function: relu6
    l2_regularization_scale: 0.0005
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 1.0
    # Use average pooling before the last fully-connected layer.
    use_mean: True
    # Variances to normalize encoded targets with.
    target_normalization_variances: [0.1, 0.2]

    rcnn_initializer:
      _replace: True
      type: variance_scaling_initializer
      factor: 1.0
      uniform: True
      mode: FAN_AVG
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    roi:
      pooling_mode: crop
      pooled_width: 7
      pooled_height: 7
      padding: VALID

    proposals:
      # Maximum number of detections for each class.
      class_max_detections: 100
      # NMS threshold used to remove "almost duplicate" of the same class.
      class_nms_threshold: 0.6
      # Maximum total detections for an image (sorted by score).
      total_max_detections: 300
      # Minimum prob to be used as proposed object.
      min_prob_threshold: 0.0

    target:
      # Ratio between foreground and background samples in minibatch.
      foreground_fraction: 0.25
      minibatch_size: 64
      # Threshold with GT to be considered positive.
      foreground_threshold: 0.5
      # High and low threshold with GT to be considered negative.
      background_threshold_high: 0.5
      background_threshold_low: 0.1
nagitsu commented 5 years ago

Using the VGG as the base network isn't working currently. You should use the ResNet101 instead, as you'll get much better results. In fact, we're planning on removing the rest of the base networks in a future release, except maybe for a smaller ResNet.
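For reference, switching the base network in your config would look roughly like this (an untested sketch; the ResNet-specific endpoint, fine_tune_from and output_stride values come from the defaults in base_config.yml, so the VGG-specific ones above don't apply):

model:
  type: fasterrcnn
  base_network:
    # Use the ResNet101 pretrained network as the feature extractor.
    architecture: resnet_v1_101
    # Leave endpoint, fine_tune_from and output_stride to the ResNet
    # defaults in base_config.yml instead of the VGG values above.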

If you want a faster model, you might try using SSD instead.
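If you do try SSD, the model type changes as well (again just a sketch; SSD comes with its own set of defaults, and num_classes still has to match your dataset):

model:
  type: ssd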

lwdhw1987 commented 5 years ago

I used ResNet101, but I still got the same error. The config file I used is ../models/fasterrcnn/base_config.yml, with the COCO dataset:

root@G560-ubuntu1:/home/p00306424/luminoth/luminoth/models/fasterrcnn# lumi train -c base_config.yml
INFO:tensorflow:Training 279 vars from pretrained module; from "truncated_base_network/resnet_v1_101/block2/unit_1/bottleneck_v1/shortcut/weights:0" to "truncated_base_network/resnet_v1_101/block4/unit_3/bottleneck_v1/conv3/BatchNorm/beta:0".
/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Starting training for <luminoth.models.fasterrcnn.fasterrcnn.FasterRCNN object at 0x7f0793b67fd0>
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/util/tf_should_use.py:118: initialize_local_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.local_variables_initializer` instead.
INFO:tensorflow:ImageVisHook was created with mode = "train"
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2018-09-12 23:13:12.937819: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:2d:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-09-12 23:13:13.278919: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 1 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:31:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-09-12 23:13:13.625421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 2 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:35:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-09-12 23:13:13.981006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 3 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:39:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-09-12 23:13:14.345910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 4 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:a9:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-09-12 23:13:14.720198: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 5 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:ad:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-09-12 23:13:15.106763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 6 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:b1:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-09-12 23:13:15.511936: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Found device 7 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:b5:00.0
totalMemory: 15.89GiB freeMemory: 15.60GiB
2018-09-12 23:13:15.528713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1485] Adding visible gpu devices: 0, 1, 2, 3, 4, 5, 6, 7
2018-09-12 23:13:18.185047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:966] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-09-12 23:13:18.185117: I tensorflow/core/common_runtime/gpu/gpu_device.cc:972]      0 1 2 3 4 5 6 7
2018-09-12 23:13:18.185129: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 0:   N Y Y Y N N N N
2018-09-12 23:13:18.185136: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 1:   Y N Y Y N N N N
2018-09-12 23:13:18.185143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 2:   Y Y N Y N N N N
2018-09-12 23:13:18.185150: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 3:   Y Y Y N N N N N
2018-09-12 23:13:18.185156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 4:   N N N N N Y Y Y
2018-09-12 23:13:18.185163: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 5:   N N N N Y N Y Y
2018-09-12 23:13:18.185170: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 6:   N N N N Y Y N Y
2018-09-12 23:13:18.185176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:985] 7:   N N N N Y Y Y N
2018-09-12 23:13:18.188713: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 15119 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:2d:00.0, compute capability: 6.0)
2018-09-12 23:13:18.348337: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 15119 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:31:00.0, compute capability: 6.0)
2018-09-12 23:13:18.505829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:2 with 15119 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:35:00.0, compute capability: 6.0)
2018-09-12 23:13:18.664811: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:3 with 15119 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:39:00.0, compute capability: 6.0)
2018-09-12 23:13:18.825456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:4 with 15119 MB memory) -> physical GPU (device: 4, name: Tesla P100-PCIE-16GB, pci bus id: 0000:a9:00.0, compute capability: 6.0)
2018-09-12 23:13:18.986247: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:5 with 15119 MB memory) -> physical GPU (device: 5, name: Tesla P100-PCIE-16GB, pci bus id: 0000:ad:00.0, compute capability: 6.0)
2018-09-12 23:13:19.145267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:6 with 15119 MB memory) -> physical GPU (device: 6, name: Tesla P100-PCIE-16GB, pci bus id: 0000:b1:00.0, compute capability: 6.0)
2018-09-12 23:13:19.303491: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1098] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:7 with 15119 MB memory) -> physical GPU (device: 7, name: Tesla P100-PCIE-16GB, pci bus id: 0000:b5:00.0, compute capability: 6.0)
INFO:tensorflow:Restoring parameters from jobs/model.ckpt-0
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into jobs/model.ckpt.
Traceback (most recent call last):
  File "/usr/local/bin/lumi", line 11, in <module>
    load_entry_point('luminoth==0.2.1.dev0', 'console_scripts', 'lumi')()
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/luminoth-0.2.1.dev0-py2.7.egg/luminoth/train.py", line 307, in train
    config, environment=environment
  File "/usr/local/lib/python2.7/dist-packages/luminoth-0.2.1.dev0-py2.7.egg/luminoth/train.py", line 239, in run
    ], options=run_options)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 583, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1059, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1150, in run
    raise six.reraise(*original_exc_info)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1135, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 1207, in run
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 987, in run
    return self._sess.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1110, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1286, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1308, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Incompatible shapes: [0,4] vs. [3,4]
         [[{{node losses/RCNNLoss/sub_1}} = Sub[_device="/job:localhost/replica:0/task:0/device:GPU:0"](losses/RCNNLoss/bbox_offset_cleaned/GatherV2, losses/RCNNLoss/bbox_offsets_target_labeled/GatherV2)]]
         [[{{node losses/RCNNLoss/strided_slice_1/_4893}} = _Recv[_device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'losses/RCNNLoss/sub_1', defined at:
  File "/usr/local/bin/lumi", line 11, in <module>
    load_entry_point('luminoth==0.2.1.dev0', 'console_scripts', 'lumi')()
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python2.7/dist-packages/click-6.7-py2.7.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/luminoth-0.2.1.dev0-py2.7.egg/luminoth/train.py", line 307, in train
    config, environment=environment
  File "/usr/local/lib/python2.7/dist-packages/luminoth-0.2.1.dev0-py2.7.egg/luminoth/train.py", line 67, in run
    total_loss = model.loss(prediction_dict)
  File "/usr/local/lib/python2.7/dist-packages/luminoth-0.2.1.dev0-py2.7.egg/luminoth/models/fasterrcnn/fasterrcnn.py", line 192, in loss
    prediction_dict['classification_prediction']
  File "/usr/local/lib/python2.7/dist-packages/luminoth-0.2.1.dev0-py2.7.egg/luminoth/models/fasterrcnn/rcnn.py", line 391, in loss
    sigma=self._l1_sigma
  File "/usr/local/lib/python2.7/dist-packages/luminoth-0.2.1.dev0-py2.7.egg/luminoth/utils/losses.py", line 22, in smooth_l1_loss
    diff = bbox_prediction - bbox_target
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/math_ops.py", line 851, in binary_op_wrapper
    return func(x, y, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 8246, in sub
    "Sub", x=x, y=y, name=name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3260, in create_op
    op_def=op_def)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Incompatible shapes: [0,4] vs. [3,4]
         [[{{node losses/RCNNLoss/sub_1}} = Sub[_device="/job:localhost/replica:0/task:0/device:GPU:0"](losses/RCNNLoss/bbox_offset_cleaned/GatherV2, losses/RCNNLoss/bbox_offsets_target_labeled/GatherV2)]]
         [[{{node losses/RCNNLoss/strided_slice_1/_4893}} = _Recv[_device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

root@G560-ubuntu1:/home/p00306424/luminoth/luminoth/models/fasterrcnn#
nagitsu commented 5 years ago

In the config, make sure you set the correct number of classes for the dataset. For the COCO dataset, that's 80:

model:
  type: fasterrcnn
  network:
    num_classes: 80

I recommend that you use the config file at examples/sample_config.yml, as it hides most of the very specific settings.
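For example, something like this should be enough as a starting point (a sketch; my_config.yml and the run_name below are just placeholders, and anything you don't list falls back to the defaults):

train:
  run_name: coco-fasterrcnn
  job_dir: jobs/
dataset:
  type: object_detection
  dir: '/data/Datasets/coco/tf'
model:
  type: fasterrcnn
  network:
    num_classes: 80

Then train with lumi train -c my_config.yml.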