google / automl

Google Brain AutoML
Apache License 2.0
6.2k stars 1.44k forks

question about small object and model_inspect #882

Open Ted678Wu opened 3 years ago

Ted678Wu commented 3 years ago

Hi there, thanks for the nice scripts. When I applied EfficientDet-D4 to my own dataset, I ran into two questions; can anyone help me with them?

  1. My images are 1024 by 1024, and the objects range from 3x3 to 15x15 pixels. I set anchor_scale to 0.25, but the result is not great: validation AP(0.5:0.95) is only around 0.3. Is there anything I can do to improve it?
  2. For the model from question 1, I used model_inspect to check the predictions on the same validation set, which contains around 150 ground-truth objects, but only about 20 predictions are shown on the output images (with the threshold set to 0.001). In that case, how can the validation AP(0.5:0.95) be around 0.3?
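As context for question 1, the interaction between anchor_scale and such tiny objects can be sanity-checked with a few lines of arithmetic. This is a rough sketch, not the repo's code: the feature-level range (3-7) and 3 scale octaves are the EfficientDet defaults assumed here, and the base anchor edge is taken as stride * anchor_scale scaled per octave.

```python
# Rough anchor-size check for 3x3 to 15x15 px objects at anchor_scale=0.25.
# Assumes EfficientDet defaults: feature levels 3-7, 3 scale octaves per level.
anchor_scale = 0.25
num_scales = 3
for level in range(3, 8):
    stride = 2 ** level  # feature-map stride at this level
    sizes = [stride * anchor_scale * 2 ** (octave / num_scales)
             for octave in range(num_scales)]
    print(f"level {level}: anchor edges ~ {[round(s, 1) for s in sizes]} px")
```

Under these assumptions, levels 3-5 produce anchors of roughly 2-13 px, so a small anchor_scale keeps the anchors in range of 3-15 px objects, while the higher levels contribute little.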

thanks!

Ted678Wu commented 3 years ago

@fsx950223 @mingxingtan could you guys give me some ideas? thanks in advance.

kartik4949 commented 3 years ago

Export the model to a saved_model, then run model_inspect with runmode=saved_model_infer on that saved_model.

The outputs will be saved in the output directory, and you can see the bboxes drawn on the saved images.

Ted678Wu commented 3 years ago

Export the model to a saved_model, then run model_inspect with runmode=saved_model_infer on that saved_model.

The outputs will be saved in the output directory, and you can see the bboxes drawn on the saved images.

Hi @kartik4949, thanks for the reply. Yes, I have checked the predicted images. Since there are only around 40 images in my validation cohort, I checked them one by one: in total, 20 bboxes are shown on them, while the ground truth should be 150 bboxes.

kartik4949 commented 3 years ago

config.yaml file and training command?

Ted678Wu commented 3 years ago

config.yaml file and training command?

training command:

```shell
python main.py --mode=train_and_eval \
  --training_file_pattern="C:/Workspace/1_data/3_Marker/train.tfrecord" \
  --validation_file_pattern="C:/Workspace/1_data/3_Marker/val.tfrecord" \
  --model_name=efficientdet-d4 \
  --model_dir=tmp/model_dir/6_ab_marker_D4_1024_all/ \
  --ckpt=efficientdet-d4 \
  --train_batch_size=2 \
  --eval_batch_size=1 --eval_samples=49 \
  --num_examples_per_epoch=1088 --num_epochs=25 \
  --hparams=C:/Workspace/5_EfficientDet/automl-master/efficientdet/config.yaml
```

After training, the config file below was generated.

infer command:

```shell
python model_inspect.py --runmode=saved_model_infer \
  --model_name=efficientdet-d4 \
  --saved_model_dir=tmp/Z_models/6_ab_marker_D4_1024_all \
  --output_image_dir=tmp/ \
  --input_image="C:/Workspace/1_data/3_Marker/val2018/*.tif" \
  --hparams=C:/Workspace/5_EfficientDet/automl-master/efficientdet/tmp/model_dir/6_ab_marker_D4_1024_all/config.yaml
```

```yaml
act_type: swish
alpha: 0.25
anchor_scale: 0.25
apply_bn_for_resampling: true
aspect_ratios:
```

Ted678Wu commented 3 years ago

config.yaml file and training command?

btw, any idea for the first question? My objects are so small....

kartik4949 commented 3 years ago

Increase the number of epochs, maybe to 300.

Ted678Wu commented 3 years ago

Increase the number of epochs, maybe to 300.

Thanks for your idea. Any idea about the inference? Or do you know how to output the prediction bboxes when we do the mAP evaluation?

fsx950223 commented 3 years ago

For small objects, you should change aspect_ratios to values obtained by the method in https://github.com/google/automl/issues/412.
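The approach in that issue can be sketched as a small k-means over ground-truth width/height ratios. A minimal sketch: the box list is dummy data standing in for a real label set, and the helper `kmeans_ratios` is illustrative, not part of the repo.

```python
# Sketch: derive candidate aspect_ratios for the anchor config from
# ground-truth boxes via 1-D k-means over w/h ratios (cf. issue #412).
import random

def kmeans_ratios(ratios, k=3, iters=100, seed=0):
    """Simple 1-D k-means over width/height ratios."""
    rng = random.Random(seed)
    centers = rng.sample(ratios, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for r in ratios:
            i = min(range(k), key=lambda j: abs(r - centers[j]))
            clusters[i].append(r)
        # keep the old center when a cluster goes empty
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# boxes as (width, height); tiny 3x3-to-15x15 objects like those described above
boxes = [(3, 3), (5, 4), (4, 5), (15, 14), (10, 11), (7, 7), (12, 6), (6, 12)]
ratios = [w / h for w, h in boxes]
print(kmeans_ratios(ratios))  # pass the resulting values as the aspect_ratios hparam
```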

Ted678Wu commented 3 years ago

For small objects, you should change aspect_ratios to values obtained by the method in #412.

Hi @fsx950223, thanks for the answer. I already used k-means to calculate the ratios and tried the new ratios, but it didn't help much: only around a 5% increase in AP(0.5:0.95). Do you have any other ideas? Thanks!

BTW: is it possible to use multi-GPU training on Windows? How can I do that? I get an error message about NCCL...

fsx950223 commented 3 years ago

You should try a bigger batch size with grad_checkpoint.
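For reference, gradient checkpointing is switched on through the hparams file. A minimal sketch, assuming the same config.yaml key style quoted earlier in this thread:

```yaml
# Minimal hparams sketch: grad_checkpoint trades extra compute for memory,
# which is what makes the larger train_batch_size feasible.
grad_checkpoint: true
```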

Ted678Wu commented 3 years ago

You should try a bigger batch size with grad_checkpoint.

@fsx950223 nice to receive your message so quickly. Yes, I am currently using D4 with training batch size 2, grad_checkpoint: True, on a GTX 1080 Ti.

My next question: is it possible to use multi-GPU on Windows? I have other GPUs available (on a Windows system); in that case, I could increase my batch size.

fsx950223 commented 3 years ago

Only batch_size 2?

Ted678Wu commented 3 years ago

Only batch_size 2?

Yes, only 2, and the GPU usage is 95% at batch size 2.

fsx950223 commented 3 years ago

Could you try bigger batch size?

Ted678Wu commented 3 years ago

I tried batch size 4 once, but an OOM message popped up.

fsx950223 commented 3 years ago

Could you try efficientdet-d0 with batch_size 32?

Ted678Wu commented 3 years ago

Since all my images are 1024x1024, I have not tried D0 yet; maybe it is an option in the current situation. Can I ask what the point of trying D0 with batch_size 32 is?

fsx950223 commented 3 years ago

Maybe something is wrong with your grad_checkpoint config.

Ted678Wu commented 3 years ago

Maybe something is wrong with your grad_checkpoint config.

I see... I will give it a try later when I can access my computer. Thanks for your suggestions; I will keep you updated!

Ted678Wu commented 3 years ago

Maybe something is wrong with your grad_checkpoint config.

Could you give me any information about multi-GPU training on Windows? I am really curious about that.

Ted678Wu commented 3 years ago

Could you try efficientdet-d0 with batch_size 32?

@fsx950223 I just tried D0 with batch size 32 and grad_checkpoint: True. With this setting, training went well, and my GPU memory usage is around 9500MiB/11264MiB (from nvidia-smi).