Thanks for letting us know, we'll look into this. Could you please also share your config file?
Thanks for replying!
Here is my config file. The image size looks very small, but it works well with InceptionV2, ResNet50, and ResNet101. I tried stride 8 and stride 16, and the same thing happened both times. MobileNet with Faster R-CNN is really important for my research, so please help solve this problem.
model {
  faster_rcnn {
    #mobile16 + size_+ +512depth+ crop10 + achor box2 + proposal 100
    num_classes: 2
    image_resizer {
      fixed_shape_resizer {
        height: 288
        width: 960
      }
    }
    feature_extractor {
      type: "faster_rcnn_mobilenet"
      first_stage_features_stride: 16
    }
    first_stage_anchor_generator {
      grid_anchor_generator {
        height_stride: 16
        width_stride: 16
        scales: 0.125
        scales: 0.25
        scales: 0.5
        scales: 1.0
        scales: 1.5
        scales: 2.0
        aspect_ratios: 0.48
        aspect_ratios: 0.65
        aspect_ratios: 1.09
        aspect_ratios: 1.28
        aspect_ratios: 1.48
        aspect_ratios: 1.70
      }
    }
    first_stage_box_predictor_conv_hyperparams {
      op: CONV
      regularizer {
        l2_regularizer {
          weight: 0.0
        }
      }
      initializer {
        truncated_normal_initializer {
          stddev: 0.00999999977648
        }
      }
    }
    first_stage_nms_score_threshold: 0.0
    first_stage_nms_iou_threshold: 0.699999988079
    first_stage_max_proposals: 100
    first_stage_localization_loss_weight: 2.0
    first_stage_objectness_loss_weight: 1.0
    first_stage_box_predictor_depth: 512
    initial_crop_size: 10
    maxpool_kernel_size: 2
    maxpool_stride: 2
    second_stage_box_predictor {
      mask_rcnn_box_predictor {
        fc_hyperparams {
          op: FC
          regularizer {
            l2_regularizer {
              weight: 0.0
            }
          }
          initializer {
            variance_scaling_initializer {
              factor: 1.0
              uniform: true
              mode: FAN_AVG
            }
          }
        }
        #use_dropout: true
        #dropout_keep_probability: 0.8
      }
    }
    second_stage_post_processing {
      batch_non_max_suppression {
        score_threshold: 0.0
        iou_threshold: 0.600000023842
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SOFTMAX
    }
    second_stage_localization_loss_weight: 2.0
    second_stage_classification_loss_weight: 1.0
  }
}
train_config {
  batch_size: 8
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_pad_image {
    }
  }
  optimizer {
    momentum_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 3e-03
          schedule {
            step: 400000
            learning_rate: 3e-04
          }
          schedule {
            step: 1200000
            learning_rate: 3e-05
          }
          schedule {
            step: 1500000
            learning_rate: 3e-06
          }
        }
      }
      momentum_optimizer_value: 0.899999976158
    }
    use_moving_average: false
  }
  gradient_clipping_by_norm: 10.0
  from_detection_checkpoint: false
  num_steps: 1600000
}
train_input_reader {
  label_map_path: "/home/yryun/fasterRCNN/models/research/object_detection/data/kitti_label_map_v3.pbtxt"
  tf_record_input_reader {
    input_path: "/home/yryun/fasterRCNN/models/research/object_detection/yeongro/blackbox_train_v3.record_train.tfrecord"
  }
  shuffle: true
}
eval_config {
  num_examples: 592
  metrics_set: "pascal_voc_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "/home/yryun/fasterRCNN/models/research/object_detection/data/kitti_label_map_v3.pbtxt"
  tf_record_input_reader {
    input_path: "/home/yryun/fasterRCNN/models/research/object_detection/yeongro/blackbox_test_v3.record_train.tfrecord"
  }
  shuffle: false
}
We didn't release a Faster R-CNN MobileNet model in the past, so I guess you added the mapping here yourself?
Yes, I added the mapping. I am using the latest Faster R-CNN MobileNet code.
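For reference, such a registration is just an extra entry in the builder's feature extractor class map. The snippet below is only a hedged sketch of what that might look like: the module name faster_rcnn_mobilenet_v1_feature_extractor and the class FasterRCNNMobilenetV1FeatureExtractor are assumptions based on how the other Faster R-CNN extractors are registered in object_detection/builders/model_builder.py, and the key must match the feature_extractor.type string used in the config above.

# Hypothetical sketch of the manual registration described above; the module
# and class names are assumptions, not an officially released mapping.
from object_detection.models import faster_rcnn_mobilenet_v1_feature_extractor as frcnn_mobilenet_v1

FASTER_RCNN_FEATURE_EXTRACTOR_CLASS_MAP = {
    # ... existing entries such as 'faster_rcnn_resnet50' ...
    # The key must match feature_extractor.type in the pipeline config.
    'faster_rcnn_mobilenet':
        frcnn_mobilenet_v1.FasterRCNNMobilenetV1FeatureExtractor,
}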
@yryun Hi! Is there any update? Have you solved the problem?
The code for the Faster R-CNN MobileNet model is almost complete, but it's not officially ready yet. Contributions are welcome if anyone is interested in debugging it.
Let me try MobileNet with the official KITTI dataset. I will post the result later!
Unfortunately, the MobileNet feature extractor has the same problem with the official KITTI dataset, so the problem is not with my dataset.
May I ask about the runtime and accuracy of Faster R-CNN MobileNet?
@yryun I got the same issue, have you solved this problem? thanks!
Any update on this?
I am trying to train Faster R-CNN with a pretrained mobilenet_v1 backbone. Training on a single GPU works fine, but when I try to use two GPUs, the replicas of the variables created on the second GPU fail to load the pretrained mobilenet_v1 weights.
For all variables (Conv2d_0 to Conv2d_13) I get an output similar to the following: Variable [MobilenetV1/Conv2d_0/BatchNorm/beta/replica_1] is not available in checkpoint
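In case it helps anyone hitting the same message, here is a rough, unofficial workaround sketch (my own, not part of the Object Detection API): strip the /replica_N suffix from each variable name and assign the matching tensor from the mobilenet_v1 checkpoint directly, so the second tower's copies are initialized as well.

import re
import tensorflow as tf

def strip_replica_suffix(name):
    # 'MobilenetV1/Conv2d_0/BatchNorm/beta/replica_1' ->
    # 'MobilenetV1/Conv2d_0/BatchNorm/beta'
    return re.sub(r'/replica_\d+$', '', name)

def init_replicas_from_checkpoint(checkpoint_path, variables):
    # Read the checkpoint once and build assign ops for every variable whose
    # stripped name exists in it, including the /replica_N copies.
    reader = tf.train.load_checkpoint(checkpoint_path)
    assign_ops = []
    for var in variables:
        ckpt_name = strip_replica_suffix(var.op.name)
        if reader.has_tensor(ckpt_name):
            assign_ops.append(var.assign(reader.get_tensor(ckpt_name)))
    return assign_ops

# Usage (TF1 graph mode): run these ops right after the regular variable
# initializer, before training starts. The checkpoint path is a placeholder.
# sess.run(init_replicas_from_checkpoint('mobilenet_v1.ckpt', tf.global_variables()))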
Hi there, we are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing. If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.
System information
Describe the problem
It seems like a training bug with Faster R-CNN MobileNet. I've been training Faster R-CNN with the MobileNet feature extractor for 400k iterations, batch size 8, learning rate 3e-03 (as mentioned in Huang et al. 2017, "Speed/accuracy trade-offs for modern convolutional object detectors"), but its mAP is zero. I'm using my own dataset, and it works well with InceptionV2, ResNet50, and ResNet101, all of which reach an mAP of 0.7, so it's not a hyperparameter tuning problem.
Source code / logs
Here is the loss of InceptionV2, which reaches a good mAP.
Here is the loss of MobileNet. The second-stage loss is strangely low (it is zero almost all of the time).
Here is the mAP of MobileNet. How can it be zero?
Here are my TensorBoard distributions for MobileNet. Compared to InceptionV2, MobileNet shows no change in the second-stage Conv2d_12 and Conv2d_13 moving_mean and moving_variance. I think this could be a clue to the cause.
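To double-check that observation outside TensorBoard, a quick diagnostic (my own sketch, not from the Object Detection API) is to compare the batch-norm moving statistics between two saved checkpoints; the checkpoint paths below are placeholders. If the difference is exactly zero after hundreds of thousands of steps, the batch-norm update ops for those second-stage layers are most likely never being run, which would fit the flat moving_mean/moving_variance curves.

import numpy as np
import tensorflow as tf

def compare_moving_stats(ckpt_a, ckpt_b, keywords=('moving_mean', 'moving_variance')):
    # Compare batch-norm moving statistics between two checkpoints of the
    # same model and report the largest per-variable change.
    reader_a = tf.train.load_checkpoint(ckpt_a)
    reader_b = tf.train.load_checkpoint(ckpt_b)
    for name, _ in tf.train.list_variables(ckpt_a):
        if any(k in name for k in keywords):
            diff = np.abs(reader_a.get_tensor(name) - reader_b.get_tensor(name)).max()
            print('%s: max abs change = %g' % (name, diff))

# Placeholder checkpoint paths from two different training steps:
# compare_moving_stats('model.ckpt-100000', 'model.ckpt-400000')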