google / automl

Google Brain AutoML
Apache License 2.0
6.2k stars 1.44k forks

question about small object and model_inspect #882

Open Ted678Wu opened 3 years ago

Ted678Wu commented 3 years ago

Hi there, thanks for the nice scripts. When I applied EfficientDet-D4 to my own dataset, I ran into two questions; can anyone help me with them?

  1. My images are 1024 by 1024, and the objects range from 3x3 to 15x15 pixels. I set anchor_scale to 0.25, but the result is not great: validation AP(0.5:0.95) is only around 0.3. Is there anything I can do to improve it?
  2. For the model from question 1, I used model_inspect to check the predictions on the same validation set, which contains around 150 ground-truth objects, but only about 20 predictions are shown on the output images (with the threshold set to 0.001). In that case, how can the validation AP(0.5:0.95) be around 0.3?
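As context for question 1, the interaction between anchor_scale and such tiny objects can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not the repo's code: the feature-level range (3-7) and 3 scale octaves are the EfficientDet defaults assumed here, and the base anchor edge is taken as stride * anchor_scale scaled per octave.

```python
# Rough anchor-size check for 3x3 to 15x15 px objects at anchor_scale=0.25.
# Assumes EfficientDet defaults: feature levels 3-7, 3 scale octaves per level.
anchor_scale = 0.25
num_scales = 3
for level in range(3, 8):
    stride = 2 ** level  # feature-map stride at this level
    sizes = [stride * anchor_scale * 2 ** (octave / num_scales)
             for octave in range(num_scales)]
    print(f"level {level}: anchor edges ~ {[round(s, 1) for s in sizes]} px")
```

Under these assumptions, levels 3-5 produce anchors of roughly 2-13 px, so a small anchor_scale keeps the anchors in range of 3-15 px objects, while the higher levels contribute little.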

thanks!

Ted678Wu commented 3 years ago

@fsx950223 @mingxingtan could you guys give me some ideas? thanks in advance.

kartik4949 commented 3 years ago

Export the model to a saved_model, then run model_inspect with runmode=saved_model_infer on that saved_model.

The outputs will be saved in the output directory, and you can see the bboxes drawn on the saved images.

Ted678Wu commented 3 years ago

Export the model to a saved_model, then run model_inspect with runmode=saved_model_infer on that saved_model.

The outputs will be saved in the output directory, and you can see the bboxes drawn on the saved images.

Hi @kartik4949, thanks for the reply. Yes, I have checked the predicted images. Since there are only around 40 images in my validation cohort, I checked them one by one: in total, 20 bboxes are shown on them, while the ground truth should be 150 bboxes.

kartik4949 commented 3 years ago

config.yaml file and training command?

Ted678Wu commented 3 years ago

config.yaml file and training command?

training command:

```shell
python main.py --mode=train_and_eval \
  --training_file_pattern="C:/Workspace/1_data/3_Marker/train.tfrecord" \
  --validation_file_pattern="C:/Workspace/1_data/3_Marker/val.tfrecord" \
  --model_name=efficientdet-d4 \
  --model_dir=tmp/model_dir/6_ab_marker_D4_1024_all/ \
  --ckpt=efficientdet-d4 \
  --train_batch_size=2 \
  --eval_batch_size=1 --eval_samples=49 \
  --num_examples_per_epoch=1088 --num_epochs=25 \
  --hparams=C:/Workspace/5_EfficientDet/automl-master/efficientdet/config.yaml
```

After training, the config file below was generated.

infer command:

```shell
python model_inspect.py --runmode=saved_model_infer \
  --model_name=efficientdet-d4 \
  --saved_model_dir=tmp/Z_models/6_ab_marker_D4_1024_all \
  --output_image_dir=tmp/ \
  --input_image="C:/Workspace/1_data/3_Marker/val2018/*.tif" \
  --hparams=C:/Workspace/5_EfficientDet/automl-master/efficientdet/tmp/model_dir/6_ab_marker_D4_1024_all/config.yaml
```

```yaml
act_type: swish
alpha: 0.25
anchor_scale: 0.25
apply_bn_for_resampling: true
aspect_ratios:
```

Ted678Wu commented 3 years ago

config.yaml file and training command?

btw, any idea for the first question? My objects are so small....

kartik4949 commented 3 years ago

Increase the number of epochs, maybe to 300.

Ted678Wu commented 3 years ago

Increase the number of epochs, maybe to 300.

Thanks for your idea. Any idea about the inference? Or do you know how to output the prediction bboxes when we do the mAP evaluation?

fsx950223 commented 3 years ago

For small objects, you should change aspect_ratios to values obtained by the method in https://github.com/google/automl/issues/412.
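The approach in that issue can be sketched as a small k-means over ground-truth width/height ratios. A minimal sketch: the box list is dummy data standing in for a real label set, and the helper `kmeans_ratios` is illustrative, not part of the repo.

```python
# Sketch: derive candidate aspect_ratios for the anchor config from
# ground-truth boxes via 1-D k-means over w/h ratios (cf. issue #412).
import random

def kmeans_ratios(ratios, k=3, iters=100, seed=0):
    """Simple 1-D k-means over width/height ratios."""
    rng = random.Random(seed)
    centers = rng.sample(ratios, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in ratios:
            i = min(range(k), key=lambda j: abs(r - centers[j]))
            clusters[i].append(r)
        # keep the old center when a cluster goes empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# boxes as (width, height); tiny 3x3-to-15x15 objects like those described above
boxes = [(3, 3), (5, 4), (4, 5), (15, 14), (10, 11), (7, 7), (12, 6), (6, 12)]
ratios = [w / h for w, h in boxes]
print(kmeans_ratios(ratios))  # pass the resulting values as the aspect_ratios hparam
```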

Ted678Wu commented 3 years ago

For small objects, you should change aspect_ratios to values obtained by the method in #412.

Hi @fsx950223, thanks for the answer. I already used k-means to calculate the ratios and tried the new ratios, but it didn't help much: only around a 5% increase in AP(0.5:0.95). Do you have any other ideas? Thanks!

BTW: is it possible to use multi-GPU training on Windows? How can I do that? I get an error message about NCCL...

fsx950223 commented 3 years ago

You should try a bigger batch size with grad_checkpoint.
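For reference, gradient checkpointing is switched on through the hparams file. A minimal sketch, assuming the same config.yaml key style quoted earlier in this thread:

```yaml
# Minimal hparams sketch: grad_checkpoint trades extra compute for memory,
# which is what makes the larger train_batch_size feasible.
grad_checkpoint: true
```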

Ted678Wu commented 3 years ago

You should try a bigger batch size with grad_checkpoint.

@fsx950223 nice to receive your message so quickly. Yes, I am currently using D4 with training batch size 2, grad_checkpoint: True, on a GTX 1080 Ti.

My next question: is it possible to use multi-GPU on Windows? I have other GPUs available (on a Windows system); in that case, I could increase my batch size.

fsx950223 commented 3 years ago

Only batch_size 2?

Ted678Wu commented 3 years ago

Only batch_size 2?

Yes, only 2, and the GPU usage is 95% at batch size 2.

fsx950223 commented 3 years ago

Could you try bigger batch size?

Ted678Wu commented 3 years ago

I tried batch size 4 once, but an OOM message popped up.

fsx950223 commented 3 years ago

Could you try efficientdet-d0 with batch_size 32?

Ted678Wu commented 3 years ago

Since all my images are 1024x1024, I have not tried D0 yet; maybe it is an option in the current situation. Can I ask what the point of trying D0 with batch_size 32 is?

fsx950223 commented 3 years ago

Maybe something is wrong with your grad_checkpoint config.

Ted678Wu commented 3 years ago

Maybe something is wrong with your grad_checkpoint config.

I see... I will give it a try later when I can access my computer. Thanks for your suggestions; I will keep you updated!

Ted678Wu commented 3 years ago

Maybe something is wrong with your grad_checkpoint config.

Could you give me any information about multi-GPU training on Windows? I am really curious about that.

Ted678Wu commented 3 years ago

Could you try efficientdet-d0 with batch_size 32?

@fsx950223 I just tried D0 with batch size 32 and grad_checkpoint: True. With this setting, training went well, and my GPU memory usage is around 9500MiB/11264MiB (from nvidia-smi).