I get very low AP values on my dataset

Hi, I have converted the Stanford Drone Dataset to Pascal VOC format. I then converted it to TFRecords with the lumi command (see Adapting Dataset).

Dataset content:
train ==> 40000 images
trainval ==> 56000 images
val ==> 17000 images
test ==> 23500 images

I have 6 classes (pedestrian, biker, skater, car, cart, bus). My config file:
train:
  # Run on debug mode (which enables more logging).
  debug: False
  # Seed for random operations.
  seed:
  # Training batch size for images. FasterRCNN currently only supports 1.
  batch_size: 1
  # Base directory in which model checkpoints & summaries (for Tensorboard) will
  # be saved.
  job_dir: jobs/
  # Ignore scope when loading from checkpoint (useful when training RPN first
  # and then RPN + RCNN).
  ignore_scope:
  # Enables TensorFlow debug mode, which stops and lets you analyze Tensors
  # after each `Session.run`.
  tf_debug: False
  # Name used to identify the run. Data inside `job_dir` will be stored under
  # `run_name`.
  run_name: Stanford/
  # Disables logging and saving checkpoints.
  no_log: False
  # Displays debugging images with results every N steps. Debug mode must be
  # enabled.
  display_every_steps:
  # Display debugging images every N seconds.
  display_every_secs: 300
  # Shuffle the dataset. It should only be disabled when trying to reproduce
  # some problem on some sample.
  random_shuffle: True
  # Save Tensorboard timeline.
  save_timeline: False
  # The frequency, in seconds, that a checkpoint is saved.
  save_checkpoint_secs: 600
  # The frequency, in number of global steps, that the summaries are written to
  # disk.
  save_summaries_steps:
  # The frequency, in seconds, that the summaries are written to disk.  If both
  # save_summaries_steps and save_summaries_secs are set to empty, then the
  # default summary saver isn't used.
  save_summaries_secs: 30
  # Run TensorFlow using full_trace mode for memory and running time logging
  # Debug mode must be enabled.
  full_trace: False
  # Clip gradients by norm, making sure the maximum value is 10.
  clip_by_norm: False
  # Learning rate config.
  learning_rate:
    # Because we're using kwargs, we want the learning_rate dict to be replaced
    # as a whole.
    _replace: True
    # Learning rate decay method; can be: ((empty), 'none', piecewise_constant,
    # exponential_decay, polynomial_decay) You can define different decay
    # methods using `decay_method` and defining all the necessary arguments.
    decay_method:
    learning_rate: 0.0003

  # Optimizer configuration.
  optimizer:
    # Because we're using kwargs, we want the optimizer dict to be replaced as a
    # whole.
    _replace: True
    # Type of optimizer to use (momentum, adam, gradient_descent, rmsprop).
    type: momentum
    # Any options are passed directly to the optimizer as kwarg.
    momentum: 0.9

  # Number of epochs (complete dataset batches) to run.
  num_epochs: 1000

  # Image visualization mode, options = train, eval, debug, (empty).
  # Default=(empty).
  image_vis: train
  # Variable summary visualization mode, options = full, reduced, (empty).
  var_vis:

eval:
  # Image visualization mode, options = train, eval, debug,
  # (empty). Default=(empty).
  image_vis: eval

dataset:
  type: object_detection
  # From which directory to read the dataset.
  dir: 'tf/'
  # Which split of tfrecords to look for.
  split: train
  # Resize image according to min_size and max_size.
  image_preprocessing:
    min_size: 600
    max_size: 1024
  # Data augmentation techniques.
  data_augmentation:
    - flip:
        left_right: True
        up_down: False
        prob: 0.5
    # Also available:
    # # If you resize to too small images, you may end up not having any anchors
    # # that aren't partially outside the image.
    # - resize:
    #     min_size: 600
    #     max_size: 1024
    #     prob: 0.2
    # - patch:
    #     min_height: 600
    #     min_width: 600
    #     prob: 0.2
    # - distortion:
    #     brightness:
    #       max_delta: 0.2
    #     hue:
    #       max_delta: 0.2
    #     saturation:
    #       lower: 0.5
    #       upper: 1.5
    #     prob: 0.3

model:
  type: fasterrcnn
  network:
    # Total number of classes to predict.
    num_classes: 6
    # Use RCNN or just RPN.
    with_rcnn: True

  # Whether to use batch normalization in the model.
  batch_norm: False

  base_network:
    # Which type of pretrained network to use.
    architecture: resnet_v1_50
    # Should we train the pretrained network.
    trainable: True
    # From which file to load the weights.
    weights:
    # Should we download weights if not available.
    download: True
    # Which endpoint layer to use as feature map for network.
    endpoint:
    # Starting point after which all the variables in the base network will be
    # trainable. If not specified, then all the variables in the network will be
    # trainable.
    fine_tune_from: block2
    # Whether to train the ResNet's batch norm layers.
    train_batch_norm: False
    # Whether to use the base network's tail in the RCNN.
    use_tail: True
    # Whether to freeze the base network's tail.
    freeze_tail: False
    # Output stride for ResNet.
    output_stride: 16
    arg_scope:
      # Regularization.
      weight_decay: 0.0005

  loss:
    # Loss weights for calculating the total loss.
    rpn_cls_loss_weight: 1.0
    rpn_reg_loss_weights: 1.0
    rcnn_cls_loss_weight: 1.0
    rcnn_reg_loss_weights: 1.0

  anchors:
    # Base size to use for anchors.
    base_size: 256
    # Scale used for generating anchor sizes.
    scales: [0.25, 0.5, 1, 2]
    # Aspect ratios used for generating anchors.
    ratios: [0.5, 1, 2]
    # Stride depending on feature map size (of pretrained).
    stride: 16

  rpn:
    activation_function: relu6
    l2_regularization_scale: 0.0005  # Disable using 0.
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 3.0
    # Number of filters for the RPN conv layer.
    num_channels: 512
    # Kernel shape for the RPN conv layer.
    kernel_shape: [3, 3]
    # Initializers for RPN weights.
    rpn_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    proposals:
      # Total proposals to use before running NMS (sorted by score).
      pre_nms_top_n: 12000
      # Total proposals to use after NMS (sorted by score).
      post_nms_top_n: 2000
      # Option to apply NMS.
      apply_nms: True
      # NMS threshold used when removing "almost duplicates".
      nms_threshold: 0.7
      min_size: 0  # Disable using 0.
      # Run clipping of proposals after running NMS.
      clip_after_nms: False
      # Filter proposals from anchors partially outside the image.
      filter_outside_anchors: False
      # Minimum probability to be used as proposed object.
      min_prob_threshold: 0.0

    target:
      # Margin to crop proposals too close to the border.
      allowed_border: 0
      # Overwrite positives with negative if threshold is too low.
      clobber_positives: False
      # How much IoU with GT proposals must have to be marked as positive.
      foreground_threshold: 0.7
      # High and low thresholds with GT to be considered background.
      background_threshold_high: 0.3
      background_threshold_low: 0.0
      foreground_fraction: 0.5
      # Ratio between background and foreground in minibatch.
      minibatch_size: 256
      # Assign to get consistent "random" selection in batch.
      random_seed:  # Only to be used for debugging.

  rcnn:
    layer_sizes: []  # Could be e.g. `[4096, 4096]`.
    dropout_keep_prob: 1.0
    activation_function: relu6
    l2_regularization_scale: 0.0005
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 1.0
    # Use average pooling before the last fully-connected layer.
    use_mean: True
    # Variances to normalize encoded targets with.
    target_normalization_variances: [0.1, 0.2]

    rcnn_initializer:
      _replace: True
      type: variance_scaling_initializer
      factor: 1.0
      uniform: True
      mode: FAN_AVG
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    roi:
      pooling_mode: crop
      pooled_width: 7
      pooled_height: 7
      padding: VALID

    proposals:
      # Maximum number of detections for each class.
      class_max_detections: 100
      # NMS threshold used to remove "almost duplicate" of the same class.
      class_nms_threshold: 0.5
      # Maximum total detections for an image (sorted by score).
      total_max_detections: 300
      # Minimum prob to be used as proposed object.
      min_prob_threshold: 0.5

    target:
      # Ratio between foreground and background samples in minibatch.
      foreground_fraction: 0.25
      minibatch_size: 256
      # Threshold with GT to be considered positive.
      foreground_threshold: 0.5
      # High and low threshold with GT to be considered negative.
      background_threshold_high: 0.5
      background_threshold_low: 0.0
And the result: [screenshot: result]
The Average Precision is terribly bad. Where did I make a mistake? How can I fix this? Please help.
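
For reference, here is a minimal sketch of the anchor shapes implied by the anchors section of the config above (base_size 256, scales [0.25, 0.5, 1, 2], ratios [0.5, 1, 2]). It assumes the common Faster R-CNN convention in which a ratio is height / width and every anchor keeps the area (base_size * scale)^2. The helper name anchor_shapes is hypothetical, and Luminoth's own anchor generation may differ in details such as rounding.

```python
import math


def anchor_shapes(base_size, scales, ratios):
    """Return (width, height) pairs for every scale/ratio combination.

    Assumption (not confirmed against Luminoth's source): ratio is
    height / width, and each anchor keeps the area (base_size * scale) ** 2.
    """
    shapes = []
    for scale in scales:
        side = base_size * scale  # side length of the square (ratio 1) anchor
        for ratio in ratios:
            root = math.sqrt(ratio)
            shapes.append((side / root, side * root))
    return shapes


# Values copied from the anchors section of the config.
shapes = anchor_shapes(256, [0.25, 0.5, 1, 2], [0.5, 1, 2])
for width, height in shapes:
    print(f"{width:7.1f} x {height:7.1f}")
```

Under these assumptions the square anchors have sides of 64, 128, 256 and 512 pixels, measured on the image after it is resized according to min_size and max_size. Comparing these shapes with the box sizes in the Pascal VOC annotations, scaled by the same resize factor, is one way to check whether the anchors fit the objects in the dataset.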
tryolabs / luminoth

I get very very low AP values in my data set #214
