tryolabs / luminoth

Deep Learning toolkit for Computer Vision.
https://tryolabs.com
BSD 3-Clause "New" or "Revised" License

Getting very, very low results on my custom dataset #213

Closed · akorez closed this issue 6 years ago

akorez commented 6 years ago

Hi, I have converted the Stanford Drone Dataset to Pascal VOC format and then converted it to TFRecords with lumi's dataset conversion command (the "Adapting a Dataset" step). Dataset contents:

- train: 40,000 images
- trainval: 56,000 images
- val: 17,000 images
- test: 23,500 images

I have 6 classes (pedestrian, biker, skater, car, cart, bus).
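The conversion corresponds roughly to the following `lumi dataset transform` invocation (a sketch based on the "Adapting a Dataset" docs; the data directory path is a placeholder, not the exact one used):

```bash
# Convert the Pascal VOC-formatted dataset into TFRecords for Luminoth.
# The --data-dir path is a placeholder; --output-dir matches the config's
# dataset dir ('tf/').
lumi dataset transform \
    --type pascal \
    --data-dir stanford_drone_voc/ \
    --output-dir tf/ \
    --split train --split val --split test
```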

My config file:

```yaml
train:
  # Run on debug mode (which enables more logging).
  debug: False
  # Seed for random operations.
  seed:
  # Training batch size for images. FasterRCNN currently only supports 1.
  batch_size: 1
  # Base directory in which model checkpoints & summaries (for Tensorboard)
  # will be saved.
  job_dir: jobs/
  # Ignore scope when loading from checkpoint (useful when training RPN first
  # and then RPN + RCNN).
  ignore_scope:
  # Enables TensorFlow debug mode, which stops and lets you analyze Tensors
  # after each Session.run.
  tf_debug: False
  # Name used to identify the run. Data inside job_dir will be stored under
  # run_name.
  run_name: Stanford/
  # Disables logging and saving checkpoints.
  no_log: False
  # Displays debugging images with results every N steps. Debug mode must be
  # enabled.
  display_every_steps:
  # Display debugging images every N seconds.
  display_every_secs: 300
  # Shuffle the dataset. It should only be disabled when trying to reproduce
  # some problem on some sample.
  random_shuffle: True
  # Save Tensorboard timeline.
  save_timeline: False
  # The frequency, in seconds, that a checkpoint is saved.
  save_checkpoint_secs: 600
  # The frequency, in number of global steps, that the summaries are written
  # to disk.
  save_summaries_steps:
  # The frequency, in seconds, that the summaries are written to disk. If both
  # save_summaries_steps and save_summaries_secs are set to empty, then the
  # default summary saver isn't used.
  save_summaries_secs: 30
  # Run TensorFlow using full_trace mode for memory and running time logging.
  # Debug mode must be enabled.
  full_trace: False
  # Clip gradients by norm, making sure the maximum value is 10.
  clip_by_norm: False

  # Learning rate config.
  learning_rate:
    # Because we're using kwargs, we want the learning_rate dict to be
    # replaced as a whole.
    _replace: True
    # Learning rate decay method; can be: ((empty), 'none', piecewise_constant,
    # exponential_decay, polynomial_decay). You can define different decay
    # methods using `decay_method` and defining all the necessary arguments.
    decay_method:
    learning_rate: 0.0003

  # Optimizer configuration.
  optimizer:
    # Because we're using kwargs, we want the optimizer dict to be replaced
    # as a whole.
    _replace: True
    # Type of optimizer to use (momentum, adam, gradient_descent, rmsprop).
    type: momentum
    # Any options are passed directly to the optimizer as kwargs.
    momentum: 0.9

  # Number of epochs (complete dataset batches) to run.
  num_epochs: 1000

  # Image visualization mode, options = train, eval, debug, (empty).
  # Default = (empty).
  image_vis: train
  # Variable summary visualization mode, options = full, reduced, (empty).
  var_vis:

eval:
  # Image visualization mode, options = train, eval, debug, (empty).
  # Default = (empty).
  image_vis: eval

dataset:
  type: object_detection
  # From which directory to read the dataset.
  dir: 'tf/'
  # Which split of tfrecords to look for.
  split: train
  # Resize image according to min_size and max_size.
  image_preprocessing:
    min_size: 600
    max_size: 1024
  # Data augmentation techniques.
  data_augmentation:

model:
  type: fasterrcnn
  network:
    # Total number of classes to predict.
    num_classes: 6
    # Use RCNN or just RPN.
    with_rcnn: True

  # Whether to use batch normalization in the model.
  batch_norm: False

  base_network:
    # Which type of pretrained network to use.
    architecture: resnet_v1_50
    # Should we train the pretrained network.
    trainable: True
    # From which file to load the weights.
    weights:
    # Should we download weights if not available.
    download: True
    # Which endpoint layer to use as feature map for network.
    endpoint:
    # Starting point after which all the variables in the base network will
    # be trainable. If not specified, then all the variables in the network
    # will be trainable.
    fine_tune_from: block2
    # Whether to train the ResNet's batch norm layers.
    train_batch_norm: False
    # Whether to use the base network's tail in the RCNN.
    use_tail: True
    # Whether to freeze the base network's tail.
    freeze_tail: False
    # Output stride for ResNet.
    output_stride: 16
    arg_scope:
      # Regularization.
      weight_decay: 0.0005

  loss:
    # Loss weights for calculating the total loss.
    rpn_cls_loss_weight: 1.0
    rpn_reg_loss_weights: 1.0
    rcnn_cls_loss_weight: 1.0
    rcnn_reg_loss_weights: 1.0

  anchors:
    # Base size to use for anchors.
    base_size: 256
    # Scales used for generating anchor sizes.
    scales: [0.25, 0.5, 1, 2]
    # Aspect ratios used for generating anchors.
    ratios: [0.5, 1, 2]
    # Stride depending on feature map size (of pretrained).
    stride: 16

  rpn:
    activation_function: relu6
    l2_regularization_scale: 0.0005  # Disable using 0.
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 3.0
    # Number of filters for the RPN conv layer.
    num_channels: 512
    # Kernel shape for the RPN conv layer.
    kernel_shape: [3, 3]
    # Initializers for RPN weights.
    rpn_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    proposals:
      # Total proposals to use before running NMS (sorted by score).
      pre_nms_top_n: 12000
      # Total proposals to use after NMS (sorted by score).
      post_nms_top_n: 2000
      # Option to apply NMS.
      apply_nms: True
      # NMS threshold used when removing "almost duplicates".
      nms_threshold: 0.7
      min_size: 0  # Disable using 0.
      # Run clipping of proposals after running NMS.
      clip_after_nms: False
      # Filter proposals from anchors partially outside the image.
      filter_outside_anchors: False
      # Minimum probability to be used as proposed object.
      min_prob_threshold: 0.0

    target:
      # Margin to crop proposals too close to the border.
      allowed_border: 0
      # Overwrite positives with negatives if the threshold is too low.
      clobber_positives: False
      # How much IoU with GT proposals must have to be marked as positive.
      foreground_threshold: 0.7
      # High and low thresholds with GT to be considered background.
      background_threshold_high: 0.3
      background_threshold_low: 0.0
      foreground_fraction: 0.5
      # Ratio between background and foreground in minibatch.
      minibatch_size: 256
      # Assign to get consistent "random" selection in batch.
      random_seed:  # Only to be used for debugging.

  rcnn:
    layer_sizes: []  # Could be e.g. [4096, 4096].
    dropout_keep_prob: 1.0
    activation_function: relu6
    l2_regularization_scale: 0.0005
    # Sigma for the smooth L1 regression loss.
    l1_sigma: 1.0
    # Use average pooling before the last fully-connected layer.
    use_mean: True
    # Variances to normalize encoded targets with.
    target_normalization_variances: [0.1, 0.2]

    rcnn_initializer:
      _replace: True
      type: variance_scaling_initializer
      factor: 1.0
      uniform: True
      mode: FAN_AVG
    cls_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.01
    bbox_initializer:
      _replace: True
      type: random_normal_initializer
      mean: 0.0
      stddev: 0.001

    roi:
      pooling_mode: crop
      pooled_width: 7
      pooled_height: 7
      padding: VALID

    proposals:
      # Maximum number of detections for each class.
      class_max_detections: 100
      # NMS threshold used to remove "almost duplicates" of the same class.
      class_nms_threshold: 0.5
      # Maximum total detections for an image (sorted by score).
      total_max_detections: 300
      # Minimum prob to be used as proposed object.
      min_prob_threshold: 0.5

    target:
      # Ratio between foreground and background samples in minibatch.
      foreground_fraction: 0.25
      minibatch_size: 256
      # Threshold with GT to be considered positive.
      foreground_threshold: 0.5
      # High and low threshold with GT to be considered negative.
      background_threshold_high: 0.5
      background_threshold_low: 0.0
```
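This config is then consumed by the usual Luminoth commands, roughly as follows (`config.yml` is a placeholder for the actual config file name, and the exact evaluation options may differ):

```bash
# Train with the config above; checkpoints and summaries go to job_dir/run_name.
lumi train -c config.yml

# Compute evaluation metrics (including Average Precision) with the same config.
lumi eval -c config.yml
```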

And the result (attached screenshot: "sonuc"): the Average Precision is terribly bad. Where did I make a mistake? How can I fix this? Please help.

JuanSeBestia commented 5 years ago

What is the image?