As far as I know, there is no easy way to evaluate the validation set during training with TF-Slim. If you have a spare GPU, try evaluating the validation set in a separate process. Or see issue #5987.
Thanks @pudae
Yes, I have access to 4 GPUs right now, but I could not get eval_image_classifier.py running on the GPUs.
On the other hand, I was wondering if there is a way to get slim.evaluation.evaluate_once in eval_image_classifier.py to run for every existing checkpoint in a log directory. That is, how can we evaluate every available checkpoint in a folder and save each checkpoint's accuracy value for plotting? I tried, but I only get the accuracy value for the latest checkpoint.
To evaluate every new checkpoint while training, you can use slim.evaluation.evaluation_loop. In my case, I wrote another script that iterates over all checkpoints and calls eval_image_classifier.py.
Or, if you want to evaluate all saved checkpoints, just pass each checkpoint to eval_image_classifier.py. To get the accuracy value, use the 'final_op' argument.
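Such a driver script can be sketched as follows. This is a minimal sketch, assuming the standard model.ckpt-&lt;step&gt; naming produced by tf.train.Saver; the helper name, the log/eval directory paths, and the flags passed to eval_image_classifier.py are illustrative, not a fixed API.

```python
import glob
import os
import re
import subprocess


def list_checkpoints(log_dir):
    """Return all checkpoint prefixes in log_dir, sorted by global step."""
    checkpoints = []
    # Each checkpoint leaves a model.ckpt-<step>.index file on disk.
    for index_file in glob.glob(os.path.join(log_dir, 'model.ckpt-*.index')):
        prefix = index_file[:-len('.index')]
        match = re.search(r'ckpt-(\d+)$', prefix)
        if match:
            checkpoints.append((int(match.group(1)), prefix))
    # Sort numerically so step 1500 comes after step 200, not before.
    return [prefix for _, prefix in sorted(checkpoints)]


if __name__ == '__main__':
    for ckpt in list_checkpoints('/tmp/train_logs'):
        # Evaluate this specific checkpoint instead of only the latest one.
        subprocess.call(['python', 'eval_image_classifier.py',
                         '--checkpoint_path', ckpt,
                         '--eval_dir', '/tmp/eval_logs'])
```

Passing an explicit file prefix as --checkpoint_path (rather than a directory) is what prevents the eval script from silently picking the latest checkpoint.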
For example...
final_op = [names_to_values['Accuracy']]
final_accuracy = slim.evaluation.evaluate_once(
    # ...
    final_op=final_op)
tf.logging.info('Final accuracy: {}'.format(final_accuracy))
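To keep each checkpoint's accuracy for later plotting, one option is to append (step, accuracy) pairs to a CSV file as the evaluations run and read them back afterwards. A minimal sketch; the file path, column names, and helper names below are my own, not part of slim.

```python
import csv
import os


def append_accuracy(csv_path, step, accuracy):
    """Append one (global_step, accuracy) row, writing a header on first use."""
    write_header = not os.path.exists(csv_path)
    with open(csv_path, 'a') as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(['step', 'accuracy'])
        writer.writerow([step, accuracy])


def load_accuracies(csv_path):
    """Read the rows back as (step, accuracy) tuples, ready for plotting."""
    with open(csv_path) as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return [(int(s), float(a)) for s, a in reader]
```

Feeding the loaded pairs to e.g. matplotlib.pyplot.plot then gives the accuracy-over-steps curve.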
I am trying to use DenseNet for a regression problem with TF-Slim. My dataset contains 60,000 JPEG images with 37 float labels per image. I divided the data into three TFRecord files: a training set (60%), a validation set (20%), and a test set (20%).
I need to evaluate the validation set during the training loop and make a plot like the one in the image. The TF-Slim documentation only explains the training loop and the evaluation loop separately, so I can only evaluate the validation or test set after the training loop has finished, whereas, as I said, I need to evaluate during training.
I tried to use slim.evaluation.evaluation_loop instead of slim.evaluation.evaluate_once, but it did not help.
I tried evaluation.evaluate_repeatedly as well.
Both of these functions just read the latest available checkpoint from checkpoint_dir and then apparently wait for the next one; however, when new checkpoints are generated, they do not evaluate them at all.
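For what it's worth, that behaviour matches the polling strategy these loops use: they repeatedly ask for the latest checkpoint path and only trigger an evaluation when that path changes, so any checkpoint overwritten between two polls is skipped, and once training stops the loop just waits. A pure-Python sketch of that polling logic follows; the function names are mine, not the slim API, and latest_checkpoint_fn stands in for tf.train.latest_checkpoint.

```python
import time


def checkpoints_to_evaluate(latest_checkpoint_fn, eval_interval_secs,
                            max_polls, sleep_fn=time.sleep):
    """Yield the newest checkpoint path each time it changes.

    latest_checkpoint_fn mimics tf.train.latest_checkpoint: it returns the
    newest checkpoint prefix, or None if no checkpoint exists yet.
    """
    last_seen = None
    for _ in range(max_polls):
        current = latest_checkpoint_fn()
        if current is not None and current != last_seen:
            last_seen = current
            yield current  # only the newest checkpoint is ever seen
        sleep_fn(eval_interval_secs)
```

Because only the newest path is ever yielded, evaluating every checkpoint after the fact requires enumerating the files on disk yourself, as suggested earlier in the thread.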
I use Python 2.7.13 and TensorFlow 1.3.0 on CPU.
Any help will be highly appreciated.