google / automl

Google Brain AutoML
Apache License 2.0
6.25k stars 1.45k forks

The train results with custom dataset #228

Open molyswu opened 4 years ago

molyswu commented 4 years ago

The train results with custom dataset:

```
I0414 11:29:36.998405 140700307654464 estimator.py:2066] Saving dict for global step 5534: AP = -1.0, AP50 = -1.0, AP75 = -1.0, APl = -1.0, APm = -1.0, APs = -1.0, ARl = -1.0, ARm = -1.0, ARmax1 = -1.0, ARmax10 = -1.0, ARmax100 = -1.0, ARs = -1.0, box_loss = 0.0, cls_loss = 0.0035500503, global_step = 5534, loss = 0.5959049
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5534: /tmp/efficientdet-d0-scratch/model.ckpt-5534
I0414 11:29:37.975285 140700307654464 estimator.py:2127] Saving 'checkpoint_path' summary for global step 5534: /tmp/efficientdet-d0-scratch/model.ckpt-5534
INFO:tensorflow:evaluation_loop marked as finished
I0414 11:29:37.976075 140700307654464 error_handling.py:115] evaluation_loop marked as finished
I0414 11:29:37.976243 140700307654464 main.py:378] Evaluation results: {'AP': -1.0, 'AP50': -1.0, 'AP75': -1.0, 'APl': -1.0, 'APm': -1.0, 'APs': -1.0, 'ARl': -1.0, 'ARm': -1.0, 'ARmax1': -1.0, 'ARmax10': -1.0, 'ARmax100': -1.0, 'ARs': -1.0, 'box_loss': 0.0, 'cls_loss': 0.0035500503, 'loss': 0.5959049, 'global_step': 5534}
```

Thanks!

mingxingtan commented 4 years ago

What's the latest commit in your codebase? If it is not the latest, could you retry with the latest code?

sourabhyadav commented 4 years ago

@mingxingtan I am also facing the same issue. Which TensorFlow version and code version should I use to resolve this error?

vasnakh commented 4 years ago

@mingxingtan I am using the code at the latest commit 22ae8e3be44c55d1ca91479c2c828cbc3de05186 (April 26th). My TF version is 2.1.0. When I ran the tutorial with the PASCAL dataset and efficientdet-d0, I got valid AP values. But when I run with a custom dataset and efficientdet-d4 (my dataset has 4 classes), I get all -1.0 as shown below:

Command used to run:

```
python main.py --mode=train_and_eval \
  --training_file_pattern=tfrecord/custom.tfrecord \
  --validation_file_pattern=tfrecord/custom.tfrecord \
  --val_json_file=tfrecord/json_custom.json \
  --model_name=efficientdet-d4 \
  --model_dir=/tmp/efficientdet-d4-scratch \
  --ckpt=efficientdet-d4 \
  --train_batch_size=1 \
  --eval_batch_size=1 \
  --eval_samples=1024 \
  --num_examples_per_epoch=15286 \
  --num_epochs=1 \
  --hparams="use_bfloat16=false,num_classes=4,moving_average_decay=0" \
  --use_tpu=False
```

Output:

```
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
INFO:tensorflow:Inference Time : 168.45834s
I0428 21:03:06.690589 139718382872384 evaluation.py:273] Inference Time : 168.45834s
INFO:tensorflow:Finished evaluation at 2020-04-28-21:03:06
I0428 21:03:06.690888 139718382872384 evaluation.py:276] Finished evaluation at 2020-04-28-21:03:06
INFO:tensorflow:Saving dict for global step 15286: AP = -1.0, AP50 = -1.0, AP75 = -1.0, APl = -1.0, APm = -1.0, APs = -1.0, ARl = -1.0, ARm = -1.0, ARmax1 = -1.0, ARmax10 = -1.0, ARmax100 = -1.0, ARs = -1.0, box_loss = 0.0026594172, cls_loss = 0.38637632, global_step = 15286, loss = 0.63757527
I0428 21:03:06.691102 139718382872384 estimator.py:2053] Saving dict for global step 15286: AP = -1.0, AP50 = -1.0, AP75 = -1.0, APl = -1.0, APm = -1.0, APs = -1.0, ARl = -1.0, ARm = -1.0, ARmax1 = -1.0, ARmax10 = -1.0, ARmax100 = -1.0, ARs = -1.0, box_loss = 0.0026594172, cls_loss = 0.38637632, global_step = 15286, loss = 0.63757527
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 15286: /tmp/efficientdet-d4-scratch/model.ckpt-15286
I0428 21:03:08.292897 139718382872384 estimator.py:2113] Saving 'checkpoint_path' summary for global step 15286: /tmp/efficientdet-d4-scratch/model.ckpt-15286
INFO:tensorflow:evaluation_loop marked as finished
I0428 21:03:08.293568 139718382872384 error_handling.py:108] evaluation_loop marked as finished
I0428 21:03:08.293697 139718382872384 main.py:385] Evaluation results: {'AP': -1.0, 'AP50': -1.0, 'AP75': -1.0, 'APl': -1.0, 'APm': -1.0, 'APs': -1.0, 'ARl': -1.0, 'ARm': -1.0, 'ARmax1': -1.0, 'ARmax10': -1.0, 'ARmax100': -1.0, 'ARs': -1.0, 'box_loss': 0.0026594172, 'cls_loss': 0.38637632, 'loss': 0.63757527, 'global_step': 15286}
INFO:tensorflow:/tmp/efficientdet-d4-scratch/archive/model.ckpt-15286 is not in all_model_checkpoint_paths. Manually adding it.
I0428 21:03:08.444581 139718382872384 checkpoint_management.py:95] /tmp/efficientdet-d4-scratch/archive/model.ckpt-15286 is not in all_model_checkpoint_paths. Manually adding it.
I0428 21:03:08.445361 139718382872384 utils.py:419] Copying checkpoint /tmp/efficientdet-d4-scratch/model.ckpt-15286 to /tmp/efficientdet-d4-scratch/archive
```

xiaoxin05 commented 4 years ago

Have you solved this problem? Can you train on a custom dataset and get better results? @vasnakh

vasnakh commented 4 years ago

@xiaoxin05 Yes, I was finally able to do it. In my case I had to make sure the TFRecords had all the necessary fields. For me, efficientdet-d2 gave pretty good results.
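For anyone hitting the same issue, a minimal sketch of such a field check. The key list below is an assumption based on the common COCO-style TFRecord layout, not taken from this thread; verify it against the creation script in the repo's dataset folder before relying on it:

```python
# Hypothetical required-field list (assumption: standard COCO-style TFRecord keys;
# confirm against the repo's dataset creation script).
REQUIRED_KEYS = [
    "image/encoded",
    "image/height",
    "image/width",
    "image/source_id",
    "image/object/bbox/xmin",
    "image/object/bbox/xmax",
    "image/object/bbox/ymin",
    "image/object/bbox/ymax",
    "image/object/class/label",
]

def missing_fields(example_features):
    """Return the required keys absent from one parsed example's feature dict."""
    return [k for k in REQUIRED_KEYS if k not in example_features]

# Example: an example that was written without class labels.
features = {k: object() for k in REQUIRED_KEYS if k != "image/object/class/label"}
print(missing_fields(features))  # -> ['image/object/class/label']
```

Running this over the feature dicts of a few parsed examples quickly shows whether a field was silently dropped during conversion.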

adesgautam commented 4 years ago

"In my case I had to make sure TF Records has all the necessary fields", which fields are you talking about ? What are all the required fields ?

kartik4949 commented 4 years ago

@adesgautam Look at the README tutorial, and you can also refer to create_tensor in the dataset folder.

adesgautam commented 4 years ago

@kartik4949 I did convert my COCO-format dataset into TFRecords using that, but during training I am getting a constant loss of 10; it is neither increasing nor decreasing. Can you help with this?

kartik4949 commented 4 years ago

@adesgautam Hey, first try with very little data, something like 10-50 images, and try to overfit it. If the loss is still 10, check your tfrecord (check the (x1, y1, x2, y2) format), and lastly check the dataset itself: are the bboxes correct? Thanks :)
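The bbox check suggested above can be sketched in plain Python (the function name is illustrative, not from the repo). It flags the two most common conversion bugs: boxes stored as (x, y, width, height) instead of (x1, y1, x2, y2), and boxes that fall outside the image:

```python
def bbox_errors(boxes, img_w, img_h):
    """Flag boxes that are not valid (x1, y1, x2, y2) rectangles inside the image.

    boxes: list of (x1, y1, x2, y2) tuples in pixel coordinates.
    Returns a list of (index, reason) pairs; empty means all boxes look sane.
    """
    errors = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        if x2 <= x1 or y2 <= y1:
            # A (x, y, w, h) box mistakenly stored as corners usually trips this.
            errors.append((i, "zero/negative size; coords may be (x, y, w, h)"))
        elif x1 < 0 or y1 < 0 or x2 > img_w or y2 > img_h:
            errors.append((i, "box extends outside the image"))
    return errors

# First box is a (x, y, w, h) tuple written as corners; second is valid.
print(bbox_errors([(10, 10, 5, 5), (0, 0, 20, 20)], img_w=100, img_h=100))
# -> [(0, 'zero/negative size; coords may be (x, y, w, h)')]
```

Running this over the annotations before writing TFRecords catches format bugs earlier than a flat training loss does.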

adesgautam commented 4 years ago

My training dataset is no more than 300 images, and I checked it using labelme; it is fine. For this small number of images, in how many epochs will the loss converge? I am using efficientdet-d0 and 32 shards.