hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

Tensorboard: Continue training curves #56

Closed: pathway closed this issue 5 years ago

pathway commented 5 years ago

Currently, each time we call .learn(), it starts a new curve on tensorboard. This makes continued training (in a loop, or after reloading later) difficult to visualize.

I was able to change the logic in TensorboardWriter (_get_latest_run_id) to avoid starting a new curve with a numbered postfix.

However, the global_step is still reset each time, resulting in jumbled curves:

[screenshot of the jumbled curves]

I would like to avoid starting the timeline from zero. It appears acktr is the only agent type that mentions global_step. Is that the solution for other agent types?
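For reference, a minimal sketch of the pattern I mean (the algorithm and environment here are just placeholders):

    import gym
    from stable_baselines import PPO2

    env = gym.make("CartPole-v1")
    model = PPO2("MlpPolicy", env, tensorboard_log="./tb_logs/")
    for _ in range(3):
        # each call shows up as a separate numbered run (PPO2_1, PPO2_2, ...)
        # and its curve starts again at step 0
        model.learn(total_timesteps=10000)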

hill-a commented 5 years ago

Hi, thank you for the issue and sorry for the delayed answer.

The issue here is that the .learn function keeps track of the step value internally. However, for this use case, it might be good to add a keyword argument to explicitly reset the step value for tensorboard when calling .learn.

Will make a fix, and update the documentation in the next few days.

pathway commented 5 years ago

This will be super helpful! If anyone has a pointer or hint on how to explicitly reset the step value for tensorboard, I might do it myself. This is a constant issue for me; it's like flying blind.

araffin commented 5 years ago

Hello, I think what @hill-a meant was to have a keyword that would allow resetting the global counter, i.e. something like model.learn(1000, reset_step_counter=True). And yes, you will need to define a global_step_counter variable. Feel free to submit a PR and to ask if you need help ;) (hill-a was supposed to work on that, but he seems quite busy right now.)
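A rough sketch of the idea (illustrative only, not the actual stable-baselines internals; reset_step_counter and global_step_counter are just proposed names):

    class SomeAlgo:
        def __init__(self):
            # persists across .learn() calls so the TensorBoard x-axis keeps increasing
            self.global_step_counter = 0

        def learn(self, total_timesteps, reset_step_counter=False):
            if reset_step_counter:
                self.global_step_counter = 0
            for _ in range(total_timesteps):
                # ... one training step, then log with the persistent counter, e.g.
                # writer.add_summary(summary, self.global_step_counter)
                self.global_step_counter += 1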

bertram1isu commented 5 years ago

If no one else is working on this, I might take a crack at it. I have a similar issue and was about to hack it in my local version... seems others could use the fix as well. Any objections?

bertram1isu commented 5 years ago

The other thing I'm after is a consistent way to do checkpointing. If I fix this by turning the local variables inside train into class instance variables, so that they retain state across training calls, I'm also going to look for any other variables that need to be preserved, so that when the save/load functions are called I can pick up training where I left off, in the spirit of tensorflow checkpointing.

Would you prefer that these be addressed in two separate issues?

Also, for whatever changes I make, I was looking primarily at DDPG. The way the train functions are set up, they seem to be per algorithm, which would mean I'd need to make the same change in each algorithm. That seems like an indicator that there's probably a smarter way. Has anyone given this some thought, and does anyone have a better suggestion on where to make these changes?

araffin commented 5 years ago

@bertram1isu yes you can work on that ;)

For the other question, it is not super clear to me what you want to do; please open a separate issue.

jrjbertram commented 5 years ago

I ended up finding a workaround to this problem that partially solves it.

The openai baselines code, within its logger module, contains support for tensorboard logging. The stable-baselines code still retains this same logging code. You can activate it via:

    from stable_baselines import logger
    print( 'Configuring stable-baselines logger')
    logger.configure()

To control the location where the logs are stored, set the OPENAI_LOGDIR environment variable to a location on your file system. To control the formats of data that are logged (and to enable tensorboard logging), set the OPENAI_LOG_FORMAT environment variable to "stdout,tensorboard".

This form of tensorboard logging works fine across multiple training calls and yields the same statistics as openai baselines. (Useful for comparing performance across the two forks.)

Here's a comparison of an algorithm running on an environment but with different numbers of timesteps per learning call (1e5, 1e6, 1e9).

[screenshot: tensorboard curves across the multiple training calls]

And a second screenshot of a different part of the tensorboard display:

[screenshot]

These displays show consistent results across multiple calls to train the agent against the environment. (This is evident from the sawtooth-shaped curves in the episodes plot.)

Here's the more complete snippet I'm using right now:

    import os

    basedir = '/some/directory'

    # create the log directory if it does not already exist
    try:
        os.makedirs(basedir)
        print("Directory", basedir, "created")
    except FileExistsError:
        pass

    # tell the logger where to write and which output formats to use
    os.environ['OPENAI_LOGDIR'] = basedir
    os.environ['OPENAI_LOG_FORMAT'] = 'stdout,tensorboard'

    from stable_baselines import logger
    print('Configuring stable-baselines logger')
    logger.configure()

Full code for reference: https://github.com/jrjbertram/jsbsim_rl/blob/d65d63fe5e3b4e8ac9be580744b0242ab86eafee/compare.py

araffin commented 5 years ago

@jrjbertram thanks for your comment, but I think this issue is more about the new stable-baselines tensorboard logging (used when tensorboard_log is passed), not the legacy one.

RGring commented 5 years ago

I would like to save the state of the model, completely stop the training procedure, and continue at a later point (with a continuous tensorboard curve). Is that possible at the moment? I guess the timestep count needs to be saved and reloaded into num_timesteps.

araffin commented 5 years ago

To answer your question: yes, you can already do that, but it won't be perfect when training again after loading.

See issue https://github.com/hill-a/stable-baselines/issues/301 and documentation: https://stable-baselines.readthedocs.io/en/master/guide/tensorboard.html
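Roughly, the pattern from the linked documentation looks like this (a sketch; the algorithm, paths, and step counts are only examples, and as noted above the continuation is not perfect):

    import gym
    from stable_baselines import PPO2

    env = gym.make("CartPole-v1")

    # first training session
    model = PPO2("MlpPolicy", env, tensorboard_log="./tb_logs/")
    model.learn(total_timesteps=10000, tb_log_name="run")
    model.save("my_model")

    # later (possibly in a new process): reload and keep extending the same curve
    model = PPO2.load("my_model", env=env, tensorboard_log="./tb_logs/")
    model.learn(total_timesteps=10000, tb_log_name="run", reset_num_timesteps=False)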

Gaoyuan-Liu commented 3 years ago

Hey @araffin, if I understand right, this issue has already been solved and added to the main branch. So I followed the instructions in Tensorboard Integration, but whatever I put in the tensorboard_log argument, it creates a new folder and starts a new log file for tensorboard. My code:

    model = PPO.load("ppo_panda", env=env, tensorboard_log="./tensorboard/PPO_22")
    model.set_env(env)
    model.learn(total_timesteps=5000)
    model.save("ppo_panda")

In my understanding, it should continue and extend the previous tensorboard file, right? Did I miss any steps?

Thanks!

Miffyli commented 3 years ago

@Gaoyuan-Liu I do not think a solution was merged into SB2, only into SB3. I recommend you try migrating over to SB3, as it is more actively supported and comes with additional fixes.

Gaoyuan-Liu commented 3 years ago

@Miffyli Indeed, I found the function there, thanks!

araffin commented 3 years ago

In my understanding, it should continue and extend the previous tensorboard file, right? Did I miss any steps?

If you look at the SB2/SB3 docs, you are missing reset_num_timesteps=False.
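Applied to the snippet you posted, that would be something like (a sketch, assuming SB3's PPO and your existing env):

    from stable_baselines3 import PPO

    model = PPO.load("ppo_panda", env=env, tensorboard_log="./tensorboard/PPO_22")
    # reset_num_timesteps=False keeps the internal step counter, so the curve continues
    model.learn(total_timesteps=5000, reset_num_timesteps=False)
    model.save("ppo_panda")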

Gaoyuan-Liu commented 3 years ago

@araffin True. I also just found that even though each run of model.learn creates a new tensorboard folder with new logging data, so the tensorboard plot is segmented, if I manually put the event files into one folder and run tensorboard, it plots one continuous line from the data in the multiple files. So it looks more like "the training never stopped". Thanks!