tryolabs / luminoth

Deep Learning toolkit for Computer Vision.
https://tryolabs.com
BSD 3-Clause "New" or "Revised" License

Getting very, very low results on my custom dataset #213

Closed · akorez closed this issue 6 years ago

akorez commented 6 years ago

Hi, I have converted the Stanford Drone Dataset to Pascal VOC format and then converted it to TFRecords with lumi's dataset conversion command (the "Adapting a Dataset" step). Dataset contents:

- train: 40,000 images
- trainval: 56,000 images
- val: 17,000 images
- test: 23,500 images

I have 6 classes (pedestrian, biker, skater, car, cart, bus).
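The conversion corresponds roughly to the following `lumi dataset transform` invocation (a sketch based on the "Adapting a Dataset" docs; the data directory path is a placeholder, not the exact one used):

```bash
# Convert the Pascal VOC-formatted dataset into TFRecords for Luminoth.
# The --data-dir path is a placeholder; --output-dir matches the config's
# dataset dir ('tf/').
lumi dataset transform \
    --type pascal \
    --data-dir stanford_drone_voc/ \
    --output-dir tf/ \
    --split train --split val --split test
```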

My config file:

```yaml
train:
  # Run on debug mode (which enables more logging).
  debug: False
  # Seed for random operations.
  seed:
  # Training batch size for images. FasterRCNN currently only supports 1.
  batch_size: 1
  # Base directory in which model checkpoints & summaries (for Tensorboard)
  # will be saved.
  job_dir: jobs/
  # Ignore scope when loading from checkpoint (useful when training RPN first
  # and then RPN + RCNN).
  ignore_scope:
  # Enables TensorFlow debug mode, which stops and lets you analyze Tensors
  # after each Session.run.
  tf_debug: False
  # Name used to identify the run. Data inside job_dir will be stored under
  # run_name.
  run_name: Stanford/
  # Disables logging and saving checkpoints.
  no_log: False
  # Displays debugging images with results every N steps. Debug mode must be
  # enabled.
  display_every_steps:
  # Display debugging images every N seconds.
  display_every_secs: 300
  # Shuffle the dataset. It should only be disabled when trying to reproduce
  # some problem on some sample.
  random_shuffle: True
  # Save Tensorboard timeline.
  save_timeline: False
  # The frequency, in seconds, that a checkpoint is saved.
  save_checkpoint_secs: 600
  # The frequency, in number of global steps, that the summaries are written
  # to disk.
  save_summaries_steps:
  # The frequency, in seconds, that the summaries are written to disk. If both
  # save_summaries_steps and save_summaries_secs are set to empty, then the
  # default summary saver isn't used.
  save_summaries_secs: 30
  # Run TensorFlow using full_trace mode for memory and running time logging.
  # Debug mode must be enabled.
  full_trace: False
  # Clip gradients by norm, making sure the maximum value is 10.
  clip_by_norm: False

  # Learning rate config.
  learning_rate:
    # Because we're using kwargs, we want the learning_rate dict to be
    # replaced as a whole.
    _replace: True
    # Learning rate decay method; can be: ((empty), 'none', piecewise_constant,
    # exponential_decay, polynomial_decay). You can define different decay
    # methods using `decay_method` and defining all the necessary arguments.
    decay_method:
    learning_rate: 0.0003

  # Optimizer configuration.
  optimizer:
    # Because we're using kwargs, we want the optimizer dict to be replaced
    # as a whole.
    _replace: True
    # Type of optimizer to use (momentum, adam, gradient_descent, rmsprop).
    type: momentum
    # Any options are passed directly to the optimizer as kwargs.
    momentum: 0.9

  # Number of epochs (complete dataset batches) to run.
  num_epochs: 1000

  # Image visualization mode, options = train, eval, debug, (empty).
  # Default = (empty).
  image_vis: train
  # Variable summary visualization mode, options = full, reduced, (empty).
  var_vis:

eval:
  # Image visualization mode, options = train, eval, debug, (empty).
  # Default = (empty).
  image_vis: eval

dataset:
  type: object_detection
  # From which directory to read the dataset.
  dir: 'tf/'
  # Which split of tfrecords to look for.
  split: train
  # Resize image according to min_size and max_size.
  image_preprocessing:
    min_size: 600
    max_size: 1024
  # Data augmentation techniques.
  data_augmentation:

model:
  type: fasterrcnn
  network:
    # Total number of classes to predict.
    num_classes: 6
    # Use RCNN or just RPN.
    with_rcnn: True

  # Whether to use batch normalization in the model.
  batch_norm: False

  base_network:
    # Which type of pretrained network to use.
    architecture: resnet_v1_50
    # Should we train the pretrained network.
    trainable: True
    # From which file to load the weights.
    weights:
    # Should we download weights if not available.
    download: True
    # Which endpoint layer to use as feature map for network.
    endpoint:
    # Starting point after which all the variables in the base network will
    # be trainable. If not specified, then all the variables in the network
    # will be trainable.
    fine_tune_from: block2
    # Whether to train the ResNet's batch norm layers.
    train_batch_norm: False
    # Whether to use the base network's tail in the RCNN.
    use_tail: True
    # Whether to freeze the base network's tail.
    freeze_tail: False
    # Output stride for ResNet.
    output_stride: 16
    arg_scope:
      # Regularization.
      weight_decay: 0.0005

  loss:
    # Loss weights for calculating the total loss.
    rpn_cls_loss_weight: 1.0
    rpn_reg_loss_weights: 1.0
    rcnn_cls_loss_weight: 1.0
    rcnn_reg_loss_weights: 1.0

  anchors:
    # Base size to use for anchors.
    base_size: 256
    # Scales used for generating anchor sizes.
    scales: [0.25, 0.5, 1, 2]
    # Aspect ratios used for generating anchors.
    ratios: [0.5, 1, 2]
    # Stride depending on feature map size (of pretrained).
    stride: 16

  rpn:
    activation_function: relu6
    l2_regularization_scale: 0.0005  # Disable using 0.
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 3.0
    # Number of filters for the RPN conv layer.
    num_channels: 512
    # Kernel shape for the RPN conv layer.
    kernel_shape: [3, 3]
    # Initializers for RPN weights.
    rpn_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    proposals:
      # Total proposals to use before running NMS (sorted by score).
      pre_nms_top_n: 12000
      # Total proposals to use after NMS (sorted by score).
      post_nms_top_n: 2000
      # Option to apply NMS.
      apply_nms: True
      # NMS threshold used when removing "almost duplicates".
      nms_threshold: 0.7
      min_size: 0  # Disable using 0.
      # Run clipping of proposals after running NMS.
      clip_after_nms: False
      # Filter proposals from anchors partially outside the image.
      filter_outside_anchors: False
      # Minimum probability to be used as proposed object.
      min_prob_threshold: 0.0

    target:
      # Margin to crop proposals too close to the border.
      allowed_border: 0
      # Overwrite positives with negatives if the threshold is too low.
      clobber_positives: False
      # How much IoU with GT proposals must have to be marked as positive.
      foreground_threshold: 0.7
      # High and low thresholds with GT to be considered background.
      background_threshold_high: 0.3
      background_threshold_low: 0.0
      foreground_fraction: 0.5
      # Ratio between background and foreground in minibatch.
      minibatch_size: 256
      # Assign to get consistent "random" selection in batch.
      random_seed:  # Only to be used for debugging.

  rcnn:
    layer_sizes: []  # Could be e.g. [4096, 4096].
    dropout_keep_prob: 1.0
    activation_function: relu6
    l2_regularization_scale: 0.0005
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 1.0
    # Use average pooling before the last fully-connected layer.
    use_mean: True
    # Variances to normalize encoded targets with.
    target_normalization_variances: [0.1, 0.2]

    rcnn_initializer:
      _replace: True
      type: variance_scaling_initializer
      factor: 1.0
      uniform: True
      mode: FAN_AVG
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    roi:
      pooling_mode: crop
      pooled_width: 7
      pooled_height: 7
      padding: VALID

    proposals:
      # Maximum number of detections for each class.
      class_max_detections: 100
      # NMS threshold used to remove "almost duplicates" of the same class.
      class_nms_threshold: 0.5
      # Maximum total detections for an image (sorted by score).
      total_max_detections: 300
      # Minimum prob to be used as proposed object.
      min_prob_threshold: 0.5

    target:
      # Ratio between foreground and background samples in minibatch.
      foreground_fraction: 0.25
      minibatch_size: 256
      # Threshold with GT to be considered positive.
      foreground_threshold: 0.5
      # High and low threshold with GT to be considered negative.
      background_threshold_high: 0.5
      background_threshold_low: 0.0
```
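This config is then consumed by the usual Luminoth commands, roughly as follows (`config.yml` is a placeholder for the actual config file name, and the exact evaluation options may differ):

```bash
# Train with the config above; checkpoints and summaries go to job_dir/run_name.
lumi train -c config.yml

# Compute evaluation metrics (including Average Precision) with the same config.
lumi eval -c config.yml
```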

And the result (attached screenshot: "sonuc"): the Average Precision is terribly bad. Where did I make a mistake? How can I fix this? Please help.

JuanSeBestia commented 5 years ago

What is the image?