tensorflow / models

Models and examples built with TensorFlow

TensorBoard event file is very large (80 GB) while training with EfficientDet D0 #9052

Open Dhivya-rav opened 4 years ago

Dhivya-rav commented 4 years ago

Prerequisites

Please answer the following questions for yourself before submitting an issue.

1. The entire URL of the file you are using

... https://github.com/tensorflow/models/blob/da23acba8ecb8c0e7c9a83cdb9f10092895c9dcc/research/object_detection/model_main_tf2.py

2. Describe the bug

I am training a single-class object detector using the EfficientDet D0 512x512 model from the TF2 detection zoo. The log file after 200k steps is more than 80 GB and still growing. The number of training steps set in the config file is 300k. I also noticed such large log files with SSD MobileNet v2 and Faster R-CNN (10 GB for 100,000 steps). In comparison, the TF1 training log files for 300,000 steps were less than 1 GB.

Moritz-Weisenboehler commented 4 years ago

I had a similar problem due to too many visualized training images and suggested a possible solution in #9019.
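To see why image summaries can dominate the log size, here is a rough back-of-the-envelope sketch. The function name, the number of images per summary, and the per-image size are all hypothetical values chosen for illustration, not measurements from this thread; only the 200k step count comes from the report above.

```python
def estimate_image_summary_bytes(total_steps, log_every_n_steps,
                                 images_per_summary, bytes_per_image):
    """Rough size of all image summaries written during a training run."""
    if log_every_n_steps < 1:
        raise ValueError("log_every_n_steps must be >= 1")
    summaries_written = total_steps // log_every_n_steps
    return summaries_written * images_per_summary * bytes_per_image


# Hypothetical inputs: 3 encoded images of about 130 kB each per summary.
every_step = estimate_image_summary_bytes(200_000, 1, 3, 130_000)
every_100 = estimate_image_summary_bytes(200_000, 100, 3, 130_000)

print(f"images logged every step:      {every_step / 1e9:.2f} GB")
print(f"images logged every 100 steps: {every_100 / 1e9:.2f} GB")
```

Under these assumptions the log size scales inversely with the logging interval, so writing images less often is the most direct lever.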

shehroz010 commented 3 years ago

Is there any solution to this? My event files have grown to over 90 GB. Is there any workaround? If I delete the event files, what will the impact on the model be?
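On the deletion question: event files hold the summaries that TensorBoard plots, while the model weights are saved in separate checkpoint files, so removing event files should cost you the logged curves and images rather than the trained model. Before deleting anything, it helps to see which files take up the space. The following is a minimal sketch using only the Python standard library; it assumes the event files use TensorBoard's default `events.out.tfevents.*` naming, and the helper names are hypothetical.

```python
from pathlib import Path

# Assumption: TensorBoard's default event file naming.
EVENT_FILE_PATTERN = "events.out.tfevents.*"


def list_event_files(log_dir):
    """Return (path, size_in_bytes) for each event file under log_dir, largest first."""
    files = [
        (path, path.stat().st_size)
        for path in Path(log_dir).rglob(EVENT_FILE_PATTERN)
        if path.is_file()
    ]
    return sorted(files, key=lambda item: item[1], reverse=True)


def total_event_bytes(log_dir):
    """Sum the sizes of all event files under log_dir."""
    return sum(size for _, size in list_event_files(log_dir))
```

Calling `list_event_files("path/to/model_dir")` (a placeholder path) lists the largest files first. Checkpoint files do not match the pattern, so they are never included in the result.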