hill-a / stable-baselines

A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
http://stable-baselines.readthedocs.io/
MIT License

[question] How to access "Callbacks - Accessible Variables" for DQN model? #1106

Closed: neonine2 closed this 3 years ago

neonine2 commented 3 years ago

Sorry if this is not within the scope of questions that should be asked here; this is my first time creating an issue.

Version: stable-baselines-2.10.2a1

In the documentation (master version), under DQN, there is a section called "Callbacks - Accessible Variables": https://stable-baselines.readthedocs.io/en/master/modules/dqn.html#callbacks-accessible-variables. Not all of these variables are attributes of the DQN class, for example "episode_successes". How can I access something like "episode_successes" in my TensorboardCallback so that it shows up in Tensorboard?

Thank you very much.

Miffyli commented 3 years ago

The episode_successes variable should be available in the on_step function of the callback, as it is defined in the local scope of the DQN agent.

neonine2 commented 3 years ago

Thank you @Miffyli, I see that instruction in the documentation as well, but I don't know how to use on_step in my code to get what I want. For example, in the code below, what should I assign to value? I tried value = self.on_step(), but that doesn't seem to work.

class TensorboardCallback(BaseCallback):
    def __init__(self, verbose=0):
        self.is_tb_set = False
        super(TensorboardCallback, self).__init__(verbose)

    def _on_step(self) -> bool:
        value = 
        summary = tf.Summary(value=[tf.Summary.Value(tag='episode_successes', simple_value=value)])
        self.locals['writer'].add_summary(summary, self.num_timesteps)
        return True

Miffyli commented 3 years ago

Ah, right, now I see the issue. Documentation is not super clear on how to access the locals information.

You can access those variables via the callback's self.locals attribute, which is a dictionary keyed by variable name. That is, you write value = self.locals[variable_name], where variable_name is the name (as a string) of the variable you want to read from the algorithm (see the list you linked in the original post).
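
A minimal sketch of that lookup, not the library's own code. It assumes episode_successes is a list that grows during training (check this against your installed version) and that the newest entry is the one you want to log. latest_scalar is a hypothetical helper name, not part of stable-baselines. It returns None when there is nothing to log yet, since tf.Summary.Value needs a single number for simple_value.

```python
from typing import Any, Mapping, Optional


def latest_scalar(local_vars: Mapping[str, Any], name: str) -> Optional[float]:
    """Return the newest value stored under `name` as a float, or None.

    `local_vars` stands in for the callback's self.locals dictionary.
    Assumption: the variable is either a plain number or a list/tuple
    of numbers whose last entry is the most recent one.
    """
    value = local_vars.get(name)
    if value is None:
        return None  # variable not present in the algorithm's locals
    if isinstance(value, (list, tuple)):
        if not value:
            return None  # nothing has been recorded yet
        value = value[-1]
    return float(value)


# Inside _on_step of the callback from the question, this would be used
# roughly as follows (hypothetical usage, untested against the library):
#     value = latest_scalar(self.locals, 'episode_successes')
#     if value is not None:
#         ...build the tf.Summary and add it to the writer as before...
```

Returning None instead of raising lets _on_step skip logging on steps where no value has been recorded yet.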

I think documentation could be improved regarding how this information is accessed.

PS: For more up-to-date support, I recommend stable-baselines3, which has more refined code and better documentation.

neonine2 commented 3 years ago

Great, thank you very much. I'll have to look further into what the variable looks like, but at least I can access it now.