google / automl

Google Brain AutoML
Apache License 2.0

[efficientdet] finetune on VOC error, metric error #987

Open · gdy2021 opened this issue 3 years ago

gdy2021 commented 3 years ago

I followed the instructions in "8. Finetune on PASCAL VOC 2012 with detector COCO ckpt." and encountered errors like this:

2021-05-11 12:30:14.771597: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
INFO:tensorflow:Restoring parameters from ./model/efficientdet-d0-finetune-voc/model.ckpt-0
I0511 12:30:14.771730 140500869998336 saver.py:1292] Restoring parameters from ./model/efficientdet-d0-finetune-voc/model.ckpt-0
INFO:tensorflow:Running local_init_op.
I0511 12:30:16.816297 140500869998336 session_manager.py:505] Running local_init_op.
INFO:tensorflow:Done running local_init_op.
I0511 12:30:16.979402 140500869998336 session_manager.py:508] Done running local_init_op.
creating index...
index created!
2021-05-11 12:30:32.279483: W tensorflow/core/framework/op_kernel.cc:1751] Unknown: IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
Traceback (most recent call last):

  File "/data/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/ops/script_ops.py", line 249, in __call__
    ret = func(*args)

  File "/data/anaconda3/envs/tf2/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 620, in wrapper
    return func(*args, **kwargs)

  File "/data/proj/detection/automl/efficientdet/coco_metric.py", line 174, in result
    self.metric_values = self.evaluate(log_level)

  File "/data/proj/detection/automl/efficientdet/coco_metric.py", line 142, in evaluate
    image_ids = list(set(detections[:, 0]))

IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

It seems that EvaluationMetric.update_state is never called, which leaves EvaluationMetric.detections empty. Does anyone know why this happens and how to solve it? Thank you very much.
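
For reference, the failing line only works on a 2-D array; an empty detections buffer collapses to a 1-dimensional empty array and numpy raises exactly this error. A minimal standalone reproduction of the numpy behaviour (not the repository's code, assuming self.detections stays an empty list when update_state is never called):

import numpy as np

detections = np.array([])   # what an empty detections buffer becomes
print(detections.ndim)      # 1 -- an empty array is 1-dimensional
detections[:, 0]            # IndexError: too many indices for array: array is
                            # 1-dimensional, but 2 were indexed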

achukhrov-ffr-team commented 3 years ago

@gdy2021, have you solved your problem? I have the same issue.

td43 commented 3 years ago

I have the same issue! In tf2/train.py I changed

if 'train' in FLAGS.mode:

to

if 'train' == FLAGS.mode:
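
Note: 'in' is a substring test while '==' is an exact match, so with this change only --mode=train enters the training branch, not --mode=traineval. A quick illustration of the difference:

mode = 'traineval'
print('train' in mode)   # True  -- substring check, also matches 'traineval'
print('train' == mode)   # False -- exact match only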

And this is my command:

python3 train.py --mode=traineval \
--model_dir=/home/daniel_tobon/workspace/result \
--eval_samples=500 \
--hparams=/home/daniel_tobon/workspace/tfrecords/hparams_config.yaml \
--val_file_pattern=/home/daniel_tobon/workspace/tfrecords/eval-00000-of-00001.tfrecord

but I still get this error:

Traceback (most recent call last):
  File "train.py", line 313, in <module>
    app.run(main)
  File "/home/daniel_tobon/tf-env/lib/python3.8/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/daniel_tobon/tf-env/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "train.py", line 295, in main
    eval_results = coco_eval.on_epoch_end(current_epoch)
  File "/home/daniel_tobon/aituring_pipeline_efficientdet/efficientdet_aituring/automl/efficientdet/tf2/../tf2/train_lib.py", line 358, in on_epoch_end
    metrics = self.evaluator.result()
  File "/home/daniel_tobon/aituring_pipeline_efficientdet/efficientdet_aituring/automl/efficientdet/tf2/../coco_metric.py", line 174, in result
    self.metric_values = self.evaluate(log_level)
  File "/home/daniel_tobon/aituring_pipeline_efficientdet/efficientdet_aituring/automl/efficientdet/tf2/../coco_metric.py", line 142, in evaluate
    image_ids = list(set(detections[:, 0]))
IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

td43 commented 3 years ago

@gdy2021 @achukhrov-ffr-team I found out that the problem was with the batch size. I still don't know why this is happening, but I tested with these flags:

--backbone_ref=efficientdet-d0
--num_epochs=15
--num_examples_per_epoch=500
--batch_size=6
--eval_samples=1024

And the problem was solved!

Before that, I was using a batch size of 8.

fsx950223 commented 2 years ago

Maybe you should specify --val_json_file. I have updated the README; you could try again.
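
For example, adding the flag to the command above (paths are placeholders; --val_json_file should point at a COCO-format annotation JSON for the eval split):

python3 train.py --mode=traineval \
--model_dir=/path/to/model_dir \
--eval_samples=500 \
--hparams=/path/to/hparams_config.yaml \
--val_file_pattern=/path/to/eval-00000-of-00001.tfrecord \
--val_json_file=/path/to/instances_val.json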

td43 commented 2 years ago

@fsx950223 what if I am using the Pascal VOC annotations? Will I need that file as well?

td43 commented 2 years ago

It seems that the problem only happens when the validation dataset is too small.
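
If anyone needs a stopgap while the root cause is unclear, a small guard around the failing line turns the cryptic IndexError into an explicit message. This is only a sketch, not the repository's code; the helper name is made up, and the row layout [image_id, x, y, w, h, score, class] is an assumption inferred from the detections[:, 0] indexing:

import numpy as np

def image_ids_from_detections(detections):
  """Hypothetical helper mirroring the first step of coco_metric.evaluate()."""
  detections = np.asarray(detections)
  if detections.ndim != 2 or detections.size == 0:
    # An empty buffer means update_state() never ran, e.g. when the eval
    # dataset yields no complete batches for the chosen batch size.
    raise ValueError(
        'No detections accumulated; check --eval_samples, batch_size and '
        '--val_json_file. Got array of shape %s.' % (detections.shape,))
  return list(set(detections[:, 0]))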