tryolabs / luminoth

Deep Learning toolkit for Computer Vision.
https://tryolabs.com
BSD 3-Clause "New" or "Revised" License
2.4k stars 399 forks source link

I get very very low AP values ​​in my data set #214

Closed akorez closed 6 years ago

akorez commented 6 years ago

Hi, I have converted the Stanford Drone Dataset to Pascal VOC format. I then converted it to TFRecords with the command lumi Adapting Dataset. Dataset content : train ==> 40000 images trainval ==> 56000 images val==> 17000 images test ==> 23500 images I have 6 class(pedestrian, biker, skater, car, cart, bus). my config file :

train:
  # Run on debug mode (which enables more logging).
  debug: False
  # Seed for random operations.
  seed:
  # Training batch size for images. FasterRCNN currently only supports 1.
  batch_size: 1
  # Base directory in which model checkpoints & summaries (for Tensorboard) will
  # be saved.
  job_dir: jobs/
  # Ignore scope when loading from checkpoint (useful when training RPN first
  # and then RPN + RCNN).
  ignore_scope:
  # Enables TensorFlow debug mode, which stops and lets you analyze Tensors
  # after each `Session.run`.
  tf_debug: False
  # Name used to identify the run. Data inside `job_dir` will be stored under
  # `run_name`.
  run_name: Stanford/
  # Disables logging and saving checkpoints.
  no_log: False
  # Displays debugging images with results every N steps. Debug mode must be
  # enabled.
  display_every_steps:
  # Display debugging images every N seconds.
  display_every_secs: 300
  # Shuffle the dataset. It should only be disabled when trying to reproduce
  # some problem on some sample.
  random_shuffle: True
  # Save Tensorboard timeline.
  save_timeline: False
  # The frequency, in seconds, that a checkpoint is saved.
  save_checkpoint_secs: 600
  # The frequency, in number of global steps, that the summaries are written to
  # disk.
  save_summaries_steps:
  # The frequency, in seconds, that the summaries are written to disk.  If both
  # save_summaries_steps and save_summaries_secs are set to empty, then the
  # default summary saver isn't used.
  save_summaries_secs: 30
  # Run TensorFlow using full_trace mode for memory and running time logging
  # Debug mode must be enabled.
  full_trace: False
  # Clip gradients by norm, making sure the maximum value is 10.
  clip_by_norm: False
  # Learning rate config.
  learning_rate:
    # Because we're using kwargs, we want the learning_rate dict to be replaced
    # as a whole.
    _replace: True
    # Learning rate decay method; can be: ((empty), 'none', piecewise_constant,
    # exponential_decay, polynomial_decay) You can define different decay
    # methods using `decay_method` and defining all the necessary arguments.
    decay_method:
    learning_rate: 0.0003

  # Optimizer configuration.
  optimizer:
    # Because we're using kwargs, we want the optimizer dict to be replaced as a
    # whole.
    _replace: True
    # Type of optimizer to use (momentum, adam, gradient_descent, rmsprop).
    type: momentum
    # Any options are passed directly to the optimizer as kwarg.
    momentum: 0.9

  # Number of epochs (complete dataset batches) to run.
  num_epochs: 1000

  # Image visualization mode, options = train, eval, debug, (empty).
  # Default=(empty).
  image_vis: train
  # Variable summary visualization mode, options = full, reduced, (empty).
  var_vis:

eval:
  # Image visualization mode, options = train, eval, debug,
  # (empty). Default=(empty).
  image_vis: eval

dataset:
  type: object_detection
  # From which directory to read the dataset.
  dir: 'tf/'
  # Which split of tfrecords to look for.
  split: train
  # Resize image according to min_size and max_size.
  image_preprocessing:
    min_size: 600
    max_size: 1024
  # Data augmentation techniques.
  data_augmentation:
    - flip:
        left_right: True
        up_down: False
        prob: 0.5
    # Also available:
    # # If you resize to too small images, you may end up not having any anchors
    # # that aren't partially outside the image.
    # - resize:
    #     min_size: 600
    #     max_size: 1024
    #     prob: 0.2
    # - patch:
    #     min_height: 600
    #     min_width: 600
    #     prob: 0.2
    # - distortion:
    #     brightness:
    #       max_delta: 0.2
    #     hue:
    #       max_delta: 0.2
    #     saturation:
    #       lower: 0.5
    #       upper: 1.5
    #     prob: 0.3

model:
  type: fasterrcnn
  network:
    # Total number of classes to predict.
    num_classes: 6
    # Use RCNN or just RPN.
    with_rcnn: True

  # Whether to use batch normalization in the model.
  batch_norm: False

  base_network:
    # Which type of pretrained network to use.
    architecture: resnet_v1_50
    # Should we train the pretrained network.
    trainable: True
    # From which file to load the weights.
    weights:
    # Should we download weights if not available.
    download: True
    # Which endpoint layer to use as feature map for network.
    endpoint:
    # Starting point after which all the variables in the base network will be
    # trainable. If not specified, then all the variables in the network will be
    # trainable.
    fine_tune_from: block2
    # Whether to train the ResNet's batch norm layers.
    train_batch_norm: False
    # Whether to use the base network's tail in the RCNN.
    use_tail: True
    # Whether to freeze the base network's tail.
    freeze_tail: False
    # Output stride for ResNet.
    output_stride: 16
    arg_scope:
      # Regularization.
      weight_decay: 0.0005

  loss:
    # Loss weights for calculating the total loss.
    rpn_cls_loss_weight: 1.0
    rpn_reg_loss_weights: 1.0
    rcnn_cls_loss_weight: 1.0
    rcnn_reg_loss_weights: 1.0

  anchors:
    # Base size to use for anchors.
    base_size: 256
    # Scale used for generating anchor sizes.
    scales: [0.25, 0.5, 1, 2]
    # Aspect ratios used for generating anchors.
    ratios: [0.5, 1, 2]
    # Stride depending on feature map size (of pretrained).
    stride: 16

  rpn:
    activation_function: relu6
    l2_regularization_scale: 0.0005  # Disable using 0.
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 3.0
    # Number of filters for the RPN conv layer.
    num_channels: 512
    # Kernel shape for the RPN conv layer.
    kernel_shape: [3, 3]
    # Initializers for RPN weights.
    rpn_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    proposals:
      # Total proposals to use before running NMS (sorted by score).
      pre_nms_top_n: 12000
      # Total proposals to use after NMS (sorted by score).
      post_nms_top_n: 2000
      # Option to apply NMS.
      apply_nms: True
      # NMS threshold used when removing "almost duplicates".
      nms_threshold: 0.7
      min_size: 0  # Disable using 0.
      # Run clipping of proposals after running NMS.
      clip_after_nms: False
      # Filter proposals from anchors partially outside the image.
      filter_outside_anchors: False
      # Minimum probability to be used as proposed object.
      min_prob_threshold: 0.0

    target:
      # Margin to crop proposals to close to the border.
      allowed_border: 0
      # Overwrite positives with negative if threshold is too low.
      clobber_positives: False
      # How much IoU with GT proposals must have to be marked as positive.
      foreground_threshold: 0.7
      # High and low thresholds with GT to be considered background.
      background_threshold_high: 0.3
      background_threshold_low: 0.0
      foreground_fraction: 0.5
      # Ration between background and foreground in minibatch.
      minibatch_size: 256
      # Assign to get consistent "random" selection in batch.
      random_seed:  # Only to be used for debugging.

  rcnn:
    layer_sizes: []  # Could be e.g. `[4096, 4096]`.
    dropout_keep_prob: 1.0
    activation_function: relu6
    l2_regularization_scale: 0.0005
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 1.0
    # Use average pooling before the last fully-connected layer.
    use_mean: True
    # Variances to normalize encoded targets with.
    target_normalization_variances: [0.1, 0.2]

    rcnn_initializer:
      _replace: True
      type: variance_scaling_initializer
      factor: 1.0
      uniform: True
      mode: FAN_AVG
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    roi:
      pooling_mode: crop
      pooled_width: 7
      pooled_height: 7
      padding: VALID

    proposals:
      # Maximum number of detections for each class.
      class_max_detections: 100
      # NMS threshold used to remove "almost duplicate" of the same class.
      class_nms_threshold: 0.5
      # Maximum total detections for an image (sorted by score).
      total_max_detections: 300
      # Minimum prob to be used as proposed object.
      min_prob_threshold: 0.5

    target:
      # Ratio between foreground and background samples in minibatch.
      foreground_fraction: 0.25
      minibatch_size: 256
      # Threshold with GT to be considered positive.
      foreground_threshold: 0.5
      # High and low threshold with GT to be considered negative.
      background_threshold_high: 0.5
      background_threshold_low: 0.0

And result: sonuc Average Precission is terribly bad. where did i make mistake? How can fix this?Please help

dekked commented 6 years ago

How many steps did you run your training for?

If you follow this hands-on guide (which should be integrated into the official docs soon) you will have tips on how to do the training.

Also, your config seems overly complex. You should only specify those values that you really need like this sample.

I am closing this issue, but you can keep commenting :)