Open iinc opened 5 years ago
A cleaner way to store mode periodically is to use callbacks, much like in this example. This way you avoid recreating many things (including the TensorboardWriter). Granted, this issue could be fixed and document should highlight the designed way of saving models periodically.
Had this issue too. Adding the missing self.writer.close()
in TensorboardWriter's __exit__
method worked for me, which seems fair. Is there any reason not to do that?
Code is:
def __exit__(self, exc_type, exc_val, exc_tb):
if self.writer is not None:
self.writer.add_graph(self.graph)
self.writer.flush()
self.writer.close()
This is a stretch because I don't know if this is even technically possible given the current data structure, but maybe one reason the close()
isn't added is because the original coders wanted the same writer to be accessible later, and because instantiating a new wrapper rather than reusing the old one would necessarily overwrite or create a new TF events file.
isn't added is because the original coders wanted the same writer to be accessible later,
@hill-a could you comment on that?
I am training an A2C agent and I want to frequently save the model.
The issue I am having is that too many tensorboard files are being opened and never closed. This causes the program to crash as it hits the limit set by my OS.
The A2C agent sets up a TensorboardWriter using a with statement. I would expect the resource would be cleaned up after the with statement is exited.
https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/a2c/a2c.py#L225
But in the
__exit__
method, the tensorboard file is not closed. https://github.com/hill-a/stable-baselines/blob/master/stable_baselines/common/base_class.py#L1056