tensorflow / models

Models and examples built with TensorFlow

metrics are not displayed during training #9136

Open DanilKonon opened 4 years ago

DanilKonon commented 4 years ago

Hi

I installed TensorFlow 2.2 and am using efficientdet_d2_coco17_tpu-32. I managed to start training this model with the following command:

!python3 ./models/research/object_detection/model_main_tf2.py   \
            --pipeline_config_path=./pipeline.config  \
            --model_dir=./efficient_det  \
            --batch_size=4 \
            --num_train_steps=150_000   --sample_1_of_n_eval_examples=4     --alsologtostderr

But during training it prints only the loss, without the metrics evaluated on the eval set that the TensorFlow Object Detection API for TF 1 used to report.

I0821 09:23:17.842881 140282024908672 model_lib_v2.py:652] Step 16800 per-step time 1.223s loss=0.788
INFO:tensorflow:Step 16900 per-step time 1.076s loss=0.503
I0821 09:25:09.853582 140282024908672 model_lib_v2.py:652] Step 16900 per-step time 1.076s loss=0.503
INFO:tensorflow:Step 17000 per-step time 1.252s loss=1.163
I0821 09:27:01.702962 140282024908672 model_lib_v2.py:652] Step 17000 per-step time 1.252s loss=1.163
INFO:tensorflow:Step 17100 per-step time 1.072s loss=0.916
I0821 09:28:55.677012 140282024908672 model_lib_v2.py:652] Step 17100 per-step time 1.072s loss=0.916
INFO:tensorflow:Step 17200 per-step time 1.138s loss=0.819

How can I see my metrics?

Also, there are events files in the train folder. They are created at the start of training, and after that nothing is written to them. How can I get the events files updated so that I can see model metrics and progress in TensorBoard?

dinis-rodrigues commented 4 years ago

Yup, same issue here. In TF 1 this works properly, while in TF 2 it does not...

TolgaBkm commented 4 years ago

I also have the same issue. I have to stop the training once in a while, run evaluation manually and resume the training process afterwards.

ecatkins commented 4 years ago

Also hitting the same issue while trying to port my code over from TF1 to TF2. I was previously using Weights & Biases to sync with TensorBoard so that I could monitor the progress of training... and now I'm not sure what to do.

dinis-rodrigues commented 4 years ago

Just checked the related issues. It seems that evaluation during training, as we did in TF 1, is not supported by TF2's model_main_tf2.py.

cl886699 commented 4 years ago

me too

qraleq commented 4 years ago

Hi, any update on this issue?

LaraNeves commented 3 years ago

I found this tutorial really useful for getting evaluation metrics into TensorBoard while training a model with TF2. Check it for more details, but basically you have to run two instances of the model_main_tf2.py script in parallel: one training on the training dataset, the other evaluating on the validation dataset. You can either use two GPUs or, if you have only one, use the GPU for training and the CPU for evaluation; the tutorial explains how.
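
The two-process setup described above can be sketched roughly as follows. This is a minimal illustration, not the tutorial's own code: it reuses the script path, config path, and model directory from the command at the top of the thread, and it assumes that passing `--checkpoint_dir` puts `model_main_tf2.py` into its evaluation mode, where it waits for new checkpoints and evaluates each one. Hiding the GPU from the evaluation process with `CUDA_VISIBLE_DEVICES=-1` is one common way to keep it on the CPU. The helper names (`build_train_cmd`, `build_eval_cmd`, `eval_env`, `launch_both`) are hypothetical.

```python
import os
import subprocess

# Paths taken from the command in the original post; adjust to your setup.
SCRIPT = "./models/research/object_detection/model_main_tf2.py"
PIPELINE_CONFIG = "./pipeline.config"
MODEL_DIR = "./efficient_det"


def build_train_cmd():
    """Command for the training process, which writes checkpoints to MODEL_DIR."""
    return [
        "python3", SCRIPT,
        f"--pipeline_config_path={PIPELINE_CONFIG}",
        f"--model_dir={MODEL_DIR}",
        "--num_train_steps=150000",
        "--alsologtostderr",
    ]


def build_eval_cmd():
    """Command for the evaluation process.

    Assumption: supplying --checkpoint_dir makes the script evaluate the
    checkpoints found there instead of training.
    """
    return [
        "python3", SCRIPT,
        f"--pipeline_config_path={PIPELINE_CONFIG}",
        f"--model_dir={MODEL_DIR}",
        f"--checkpoint_dir={MODEL_DIR}",
        "--sample_1_of_n_eval_examples=4",
        "--alsologtostderr",
    ]


def eval_env():
    """Copy of the current environment with all GPUs hidden (CPU-only eval)."""
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def launch_both():
    """Start training and evaluation side by side without blocking the caller."""
    trainer = subprocess.Popen(build_train_cmd())
    evaluator = subprocess.Popen(build_eval_cmd(), env=eval_env())
    return trainer, evaluator
```

Because `subprocess.Popen` returns immediately, calling `launch_both()` from a single notebook cell should leave both processes running in the background, which may be one way to approach a single-session environment such as Colab. TensorBoard can then be pointed at the model directory to look for the training and evaluation summaries.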

DanilKonon commented 3 years ago

I understand that we can run the two scripts in parallel. But what should I do if I run everything in Colab? I thought almost everyone here uses Colab...