google / automl


Loss Decreases during Training, but AP and AR are -1 #1119

[Open] BhandarkarPawan opened 2 years ago

BhandarkarPawan commented 2 years ago

I am trying to fine-tune the model on the MOT15 dataset found here.

I have added a custom script called create_mot15_tfrecord.py to convert the MOT15 ground truth data to TFRecords. When I visualise the output with the inspect tool, it looks similar to what I see when I inspect the Pascal VOC TFRecords.
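For reference, a minimal sketch of what such a conversion might look like. This is not the actual create_mot15_tfrecord.py (its contents are not shown in this issue); it assumes the PASCAL-VOC-style feature keys that the EfficientDet dataloader reads, and the MOT15 parsing details (pixel-coordinate boxes per frame) are assumptions:

# Hedged sketch of a MOT15 frame -> tf.train.Example converter.
# Feature keys follow the PASCAL VOC convention used by
# dataset/create_pascal_tfrecord.py; MOT15 parsing details are assumed.
import hashlib
import tensorflow as tf

def _bytes(values): return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
def _floats(values): return tf.train.Feature(float_list=tf.train.FloatList(value=values))
def _ints(values): return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

def mot15_frame_to_example(jpeg_path, boxes, width, height):
    # boxes: list of (xmin, ymin, xmax, ymax) in pixels for one frame.
    with tf.io.gfile.GFile(jpeg_path, 'rb') as f:
        encoded = f.read()
    feature = {
        'image/height': _ints([height]),
        'image/width': _ints([width]),
        'image/filename': _bytes([jpeg_path.encode()]),
        'image/source_id': _bytes([jpeg_path.encode()]),
        'image/key/sha256': _bytes([hashlib.sha256(encoded).hexdigest().encode()]),
        'image/encoded': _bytes([encoded]),
        'image/format': _bytes([b'jpeg']),
        # Normalize pixel coordinates to [0, 1].
        'image/object/bbox/xmin': _floats([b[0] / width for b in boxes]),
        'image/object/bbox/ymin': _floats([b[1] / height for b in boxes]),
        'image/object/bbox/xmax': _floats([b[2] / width for b in boxes]),
        'image/object/bbox/ymax': _floats([b[3] / height for b in boxes]),
        'image/object/class/text': _bytes([b'pedestrian'] * len(boxes)),
        'image/object/class/label': _ints([1] * len(boxes)),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))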

Command Line:

python3 dataset/create_mot15_tfrecord.py  --data_dir=data --year=MOT15  --output_path=data/MOT15/tfrecord/train --set=train

python dataset/inspect_tfrecords.py --file_pattern data/MOT15/tfrecord/train00000-of-00100.tfrecord --model_name "efficientdet-d0" --samples 10 --save_samples_dir train_samples_mot/  --hparams="label_map={1:'pedestrian'}, autoaugment_policy=v3"

Here are some samples from the inspect tool (attached images: sample7, sample3, sample2).

The mot15_config.yaml file has the following:

num_classes: 2
var_freeze_expr: '(efficientnet|fpn_cells|resample_p6)'
label_map: {1: pedestrian}
lr_warmup_init: 0.08
learning_rate: 0.8
moving_average_decay: 0
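As an aside, the same overrides can be expressed programmatically; a hedged sketch, assuming efficientdet/hparams_config.py from this repo is importable (the --hparams YAML route used above is equivalent):

# Hedged sketch: the YAML overrides expressed via hparams_config.
# Assumes efficientdet/hparams_config.py from this repo is on the path.
import hparams_config

config = hparams_config.get_efficientdet_config('efficientdet-d0')
config.override({
    'num_classes': 2,
    'var_freeze_expr': '(efficientnet|fpn_cells|resample_p6)',
    'label_map': {1: 'pedestrian'},
    'lr_warmup_init': 0.08,
    'learning_rate': 0.8,
    'moving_average_decay': 0,
})
print(config.num_classes, config.label_map)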

Then, I fine-tune the model using the following:

python3 main.py --mode=train_and_eval --train_file_pattern=./data/MOT15/tfrecord/\*.tfrecord  --val_file_pattern=./data/MOT15/tfrecord/\*.tfrecord --model_name=efficientdet-d0 --model_dir=./data/MOT15/models/train-efficientdet-d0-finetune  --ckpt=efficientdet-d0  --train_batch_size=8 --eval_batch_size=8 --num_examples_per_epoch=5700 --num_epochs=5  --val_json_file=./data/MOT15/tfrecord/json_train.json --hparams=mot15_config.yaml --strategy=gpus

However, once the training ends, I see the following lines at the end of the terminal:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = -1.000
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = -1.000
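For what it's worth, pycocotools prints -1.000 for a metric when nothing was evaluated in that bucket, typically because the ground-truth JSON contributed no matching annotations or because the image IDs in the detections never match those in val_json_file. A hedged sanity check on the ground-truth side, assuming json_train.json is a standard COCO-format file:

# Hedged sanity check: does json_train.json contain images, annotations,
# and a category id that matches label_map={1: 'pedestrian'}?
from pycocotools.coco import COCO

coco = COCO('./data/MOT15/tfrecord/json_train.json')
print('images:', len(coco.getImgIds()))
print('annotations:', len(coco.getAnnIds()))
print('categories:', coco.loadCats(coco.getCatIds()))
# Compare these ids against the image/source_id values written into the
# TFRecords; a mismatch is one common cause of the all -1 output.
print('sample image ids:', coco.getImgIds()[:5])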

I am not sure why this happens. I compared against the other issues and implemented the suggested fixes, but I cannot seem to solve this. Please help.

fsx950223 commented 2 years ago

Please specify eval_samples too. eval_batch_size=64
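For context, eval_samples appears to be the integer flag in main.py that sets how many validation examples are evaluated. A hedged rendering of how the suggestion might look applied to the earlier command, where --eval_samples=5700 is an assumed value that should be replaced by the real size of the validation split:

python3 main.py --mode=train_and_eval --train_file_pattern=./data/MOT15/tfrecord/\*.tfrecord --val_file_pattern=./data/MOT15/tfrecord/\*.tfrecord --model_name=efficientdet-d0 --model_dir=./data/MOT15/models/train-efficientdet-d0-finetune --ckpt=efficientdet-d0 --train_batch_size=8 --eval_batch_size=64 --eval_samples=5700 --num_examples_per_epoch=5700 --num_epochs=5 --val_json_file=./data/MOT15/tfrecord/json_train.json --hparams=mot15_config.yaml --strategy=gpus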

BhandarkarPawan commented 2 years ago

@fsx950223 Can you please give me an example of how to use eval_samples? Is it a path or a number?