shwang opened 4 years ago
Maybe the context `flush`es instead of closing because we should be reusing the old TensorBoard `FileWriter` when possible. That way we wouldn't create a new `FileWriter`, and therefore a new events file, every time we call `PPO2.learn(reset_num_timesteps=False)`.
I'm ending up with a long and growing list of files like:
```
├── sb_tb
│   └── PPO2_1
│       ├── events.out.tfevents.1589433242.spinach
│       ├── events.out.tfevents.1589433245.spinach
│       ├── events.out.tfevents.1589433248.spinach
│       ├── events.out.tfevents.1589433250.spinach
│       ├── events.out.tfevents.1589433253.spinach
│       ├── events.out.tfevents.1589433255.spinach
│       ├── events.out.tfevents.1589433257.spinach
│       ├── events.out.tfevents.1589433260.spinach
│       ├── events.out.tfevents.1589433262.spinach
│       └── events.out.tfevents.1589433265.spinach
```
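A quick way to confirm the symptom is to count events files per run directory. This is a minimal sketch, not part of stable-baselines: the helper name `count_event_files` is hypothetical, and it relies only on the `events.out.tfevents.` filename prefix visible in the listing above.

```python
from collections import Counter
from pathlib import Path


def count_event_files(log_root):
    """Hypothetical helper: map each run directory under log_root
    (e.g. "PPO2_1") to the number of TensorBoard events files in it."""
    root = Path(log_root)
    counts = Counter()
    for path in root.rglob("events.out.tfevents.*"):
        if path.is_file():
            # Key by the directory relative to log_root, so identically
            # named run directories in different subtrees stay separate.
            counts[path.parent.relative_to(root).as_posix()] += 1
    return dict(counts)
```

For the tree shown above, `count_event_files("sb_tb")` should report ten files under `PPO2_1`; more than one file per run directory means a new writer was created on each `learn()` call.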
Granted, I can just rely on the episode reward mean logs from `Monitor` and `logger.logkv()`, which don't use this `TensorboardWriter` context, so it's not at all critical for me to activate it.
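To make the reuse idea concrete, here is a minimal sketch of caching one writer per log directory, so that a `learn(reset_num_timesteps=False)` call picks up the existing writer instead of creating a new events file. Everything here is hypothetical: `WriterCache` is not a stable-baselines API, `FakeFileWriter` stands in for `tf.summary.FileWriter` so the sketch runs without TensorFlow, and the `new_tb_log` argument only mimics the flag name mentioned in this thread.

```python
class FakeFileWriter:
    """Hypothetical stand-in for tf.summary.FileWriter: each instance
    represents one newly created events file."""
    created = 0

    def __init__(self, logdir):
        self.logdir = logdir
        self.closed = False
        FakeFileWriter.created += 1

    def flush(self):
        pass  # a real writer would write pending events to disk here

    def close(self):
        self.closed = True


class WriterCache:
    """Hypothetical helper: reuse one writer per log directory across
    successive learn() calls instead of creating a new one each time."""

    def __init__(self, writer_cls=FakeFileWriter):
        self._writer_cls = writer_cls
        self._writers = {}

    def get(self, logdir, new_tb_log):
        writer = self._writers.get(logdir)
        if new_tb_log or writer is None or writer.closed:
            # Starting a fresh log: release any old writer first.
            if writer is not None and not writer.closed:
                writer.close()
            writer = self._writer_cls(logdir)
            self._writers[logdir] = writer
        return writer

    def close_all(self):
        # Must be called eventually, otherwise the cached handles stay open.
        for writer in self._writers.values():
            writer.close()
        self._writers.clear()
```

The trade-off is that a cached writer stays open until something calls `close_all()`: the handle is no longer leaked once per call, but it still has to be released explicitly at the end.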
Hello,
Maybe a duplicate of https://github.com/hill-a/stable-baselines/issues/501, but it really sounds like a bug.
Doesn't `new_tb_log==False` work here?
There is an issue about that: https://github.com/hill-a/stable-baselines/issues/599#issuecomment-561709799
PPO2 uses a `with TensorboardWriter(...) as writer:` context that `flush`es but never closes its `tf.summary.FileWriter`. In combination with another problem on my side, this led to a "too many files are opened by this process" error in one of my runs when I called `PPO2.learn()` repeatedly.

Maybe the intention here is to allow us to access the same `FileWriter` later, but a second call to `PPO2.learn()` in fact opens a new events file and creates a new `FileWriter`, which again is not closed by the time `learn` exits.

Relevant lines in `TensorboardWriter`:
https://github.com/hill-a/stable-baselines/blob/6347da3abcb3196f468ab9f46e97c9c2afb8111d/stable_baselines/common/base_class.py#L1137-L1145
https://github.com/hill-a/stable-baselines/blob/6347da3abcb3196f468ab9f46e97c9c2afb8111d/stable_baselines/common/base_class.py#L1161-L1164
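The difference between flushing and closing on exit can be reproduced without TensorFlow. In this sketch `FakeFileWriter` is a stand-in for `tf.summary.FileWriter` that just counts open writers, `flushing_writer` models the flush-only behaviour described above, and `closing_writer` is one possible fix. All of these names, plus `fake_learn`, are hypothetical and not stable-baselines code.

```python
from contextlib import contextmanager


class FakeFileWriter:
    """Hypothetical stand-in for tf.summary.FileWriter that tracks how
    many writers (i.e. open events-file handles) are still open."""
    open_count = 0

    def __init__(self, logdir):
        self.logdir = logdir
        self.closed = False
        FakeFileWriter.open_count += 1

    def flush(self):
        pass  # a real writer would write pending events to disk here

    def close(self):
        if not self.closed:
            self.closed = True
            FakeFileWriter.open_count -= 1


@contextmanager
def flushing_writer(logdir):
    """Models the behaviour reported in this issue: flush on exit, never close."""
    writer = FakeFileWriter(logdir)
    try:
        yield writer
    finally:
        writer.flush()


@contextmanager
def closing_writer(logdir):
    """Possible fix (an assumption, not the library's code): flush, then close."""
    writer = FakeFileWriter(logdir)
    try:
        yield writer
    finally:
        writer.flush()
        writer.close()


def fake_learn(writer_ctx, logdir="sb_tb/PPO2_1"):
    """Hypothetical stand-in for one PPO2.learn() call."""
    with writer_ctx(logdir) as writer:
        writer.flush()
```

Ten calls through `flushing_writer` leave ten writers open, the same per-call growth that can eventually hit the process's open-file limit, while `closing_writer` leaves none.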