pfnet / pfrl

PFRL: a PyTorch-based deep reinforcement learning library
MIT License
1.2k stars · 157 forks

DDQN Loading Issue - training not resuming properly #183

Closed · jmribeiro closed this issue 1 year ago

jmribeiro commented 1 year ago

If DDQN is trained without checkpoints for a given number of timesteps, training works as intended (Fig. 1). However, if training is interrupted, the agent is saved to disk, and training is then resumed by loading the checkpoint from disk, training doesn't proceed properly (Fig. 2).

With other agents (e.g. PPO), this issue doesn't happen.

Fig. 1 - Training without checkpoints (i.e., the same program run from start to finish) [plot]

Fig. 2 - Training with checkpoints (i.e., the program killed every t steps and the agent reloaded from disk) [plot]

prabhatnagarajan commented 1 year ago

What are the conditions you are retraining under? For example, if you use an EpsilonDecayExplorer, you can expect different results after checkpointing, because DoubleDQN only saves the models and the optimizer; it does not save or restore the exploration state. So if you just rerun the training script from a checkpoint without adjusting these other factors (like exploration), you may observe different performance, right?
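For illustration, here is a minimal sketch of a linear epsilon schedule driven by the global step counter t. The function name and constants are assumptions for this sketch, not PFRL's implementation; the point is that if the counter restarts at zero after loading a checkpoint, epsilon jumps back to its initial value and the agent explores almost randomly again.

```python
# Illustrative only: a linear epsilon schedule driven by the step counter t.
# The function name and constants are assumptions, not PFRL code.
def linear_epsilon(t: int, start_epsilon: float = 1.0, end_epsilon: float = 0.1,
                   decay_steps: int = 100_000) -> float:
    """Linearly anneal epsilon from start_epsilon to end_epsilon over decay_steps."""
    if t >= decay_steps:
        return end_epsilon
    return start_epsilon + (t / decay_steps) * (end_epsilon - start_epsilon)

print(linear_epsilon(0))        # 1.0  -> almost fully random actions
print(linear_epsilon(50_000))   # 0.55
print(linear_epsilon(200_000))  # 0.1  -> mostly greedy actions
# If training resumes with the counter reset to 0, epsilon is back at 1.0 even
# though the network is already trained, which is the kind of mismatch described above.
```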

jmribeiro commented 1 year ago

Right on. I'm using the pfrl.explorers.LinearDecayEpsilonGreedy. I'll save and load the state of the explorer as well and conduct the experiment again. Thanks for the quick reply!

jmribeiro commented 1 year ago

Hi again Prabhat.

I've dug deeper into the problem and found the issue. It was not in the Explorer's state. I've opened a pull request ( #184 ) with a proposed solution that fixes it.

The problem was that loading the DQN agent would not restore four attributes needed to resume training: t, optim_t, _cumulative_steps, and the replay buffer.

My proposed solution (working, but applied only to the DQN agent) was to override the save and load methods on the agent's class:

    def save(self, dirname: str) -> None:
        # Save the model and optimizer via the base class, then also persist the
        # training counters and the replay buffer so that training can be resumed.
        super().save(dirname)
        torch.save(self.t, os.path.join(dirname, "t.pt"))
        torch.save(self.optim_t, os.path.join(dirname, "optim_t.pt"))
        torch.save(
            self._cumulative_steps, os.path.join(dirname, "_cumulative_steps.pt")
        )
        self.replay_buffer.save(os.path.join(dirname, "replay_buffer.pkl"))

    def load(self, dirname: str) -> None:
        # Restore the model and optimizer via the base class, then the counters
        # and the replay buffer written by save() above.
        super().load(dirname)
        self.t = torch.load(os.path.join(dirname, "t.pt"))
        self.optim_t = torch.load(os.path.join(dirname, "optim_t.pt"))
        self._cumulative_steps = torch.load(
            os.path.join(dirname, "_cumulative_steps.pt")
        )
        self.replay_buffer.load(os.path.join(dirname, "replay_buffer.pkl"))

This change works as intended: training resumes properly after reloading the agent from disk:

[plot]
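For context, a minimal sketch of the kill-and-restart flow this enables, assuming the patched save/load above; make_agent and checkpoint_dir are hypothetical placeholders, not PFRL APIs.

```python
# Sketch of the resume flow with the patched save/load above.
# make_agent() and checkpoint_dir are placeholders for this example.
import os

checkpoint_dir = "checkpoints/ddqn_run0"

agent = make_agent()  # hypothetical helper: builds the DoubleDQN exactly as in training
if os.path.isdir(checkpoint_dir):
    # Resuming: restores model, optimizer, counters, and replay buffer.
    agent.load(checkpoint_dir)

# ... train for a while, then persist the full training state before exiting:
agent.save(checkpoint_dir)  # also writes t.pt, optim_t.pt, _cumulative_steps.pt, replay_buffer.pkl
```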

prabhatnagarajan commented 1 year ago

@jmribeiro, glad to hear you figured it out. I'm no longer an official maintainer of PFRL, so someone else will need to review this.

prabhatnagarajan commented 1 year ago

I can't speak for the maintainers, but it's possible this change may have to just sit on your own branches, since storing the replay buffer can be quite expensive. I think the purpose of the save method is to save the parameters relevant to decision-making, not necessarily the parameters relevant to training.
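For a rough sense of scale, a back-of-the-envelope sketch assuming Atari-style 84x84x4 uint8 observations, a 10^6-capacity buffer, and no frame sharing; real buffers may store frames more compactly.

```python
# Rough estimate of why storing a full replay buffer is expensive.
capacity = 10 ** 6
obs_bytes = 84 * 84 * 4          # one stacked uint8 observation: ~28 KB
per_transition = 2 * obs_bytes   # obs and next_obs stored separately (no frame sharing)
total_gb = capacity * per_transition / 1e9
print(f"~{total_gb:.0f} GB")     # ~56 GB, versus a few MB for the Q-network weights
```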

jmribeiro commented 1 year ago

> I can't speak for the maintainers, but it's possible this change may have to just sit on your own branches, since storing the replay buffer can be quite expensive. I think the purpose of the save method is to save the parameters relevant to decision-making, not necessarily the parameters relevant to training.

I'll have to disagree on that!

Saving an agent should save its entire current state and allow training to resume (not just evaluation). A large feedforward network model, for example, can also reach large file sizes.

muupan commented 1 year ago

Hi, thanks for raising the issue and opening the PR!

To clarify, what @prabhatnagarajan said is the current design choice of PFRL:

> I think the purpose of the save method is to save the parameters relevant to decision-making, not necessarily the parameters relevant to training.

Saving a whole replay buffer typically consumes much more storage. Ideally, users should be able to decide independently if/when to save model parameters (for later evaluation or deployment) and if/when to save a snapshot of training (for later resuming).

My suggestion: Instead of modifying the existing save and load methods, how about adding new methods like save_snapshot and load_snapshot that handle complete snapshotting for agents so that users can call them manually when they want to resume? That could easily be merged, as it would not affect existing code.
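For concreteness, a minimal sketch of what such methods could look like, reusing the file layout from the patch earlier in this thread. The method names follow the suggestion above; this is an assumption, not necessarily what PR #184 implements.

```python
# Hedged sketch of the suggested snapshot API, layered on top of the existing
# save/load so the default behaviour (model + optimizer only) is unchanged.
import os

import torch


class SnapshotMixin:
    """Adds full-training-state snapshots on top of an agent's save/load."""

    def save_snapshot(self, dirname: str) -> None:
        # Save model and optimizer as usual, then the training counters and buffer.
        self.save(dirname)
        torch.save(self.t, os.path.join(dirname, "t.pt"))
        torch.save(self.optim_t, os.path.join(dirname, "optim_t.pt"))
        torch.save(self._cumulative_steps, os.path.join(dirname, "_cumulative_steps.pt"))
        self.replay_buffer.save(os.path.join(dirname, "replay_buffer.pkl"))

    def load_snapshot(self, dirname: str) -> None:
        # Restore model and optimizer, then the counters and buffer written above.
        self.load(dirname)
        self.t = torch.load(os.path.join(dirname, "t.pt"))
        self.optim_t = torch.load(os.path.join(dirname, "optim_t.pt"))
        self._cumulative_steps = torch.load(os.path.join(dirname, "_cumulative_steps.pt"))
        self.replay_buffer.load(os.path.join(dirname, "replay_buffer.pkl"))
```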

prabhatnagarajan commented 1 year ago

> I'll have to disagree on that! Saving an agent should save its entire current state and allow training to resume (not just evaluation). A large feedforward network model, for example, can also reach large file sizes.

Yes, as @muupan's message above indicates, I meant that the current design decision is that save saves the parameters relevant to decision-making.

jmribeiro commented 1 year ago

> Hi, thanks for raising the issue and opening the PR!
>
> To clarify, what @prabhatnagarajan said is the current design choice of PFRL:
>
> > I think the purpose of the save method is to save the parameters relevant to decision-making, not necessarily the parameters relevant to training.
>
> Saving a whole replay buffer typically consumes much more storage. Ideally, users should be able to decide independently if/when to save model parameters (for later evaluation or deployment) and if/when to save a snapshot of training (for later resuming).
>
> My suggestion: Instead of modifying the existing save and load methods, how about adding new methods like save_snapshot and load_snapshot that handle complete snapshotting for agents so that users can call them manually when they want to resume? That could easily be merged, as it would not affect existing code.

I see! I'll update the PR.

muupan commented 1 year ago

I guess it is resolved by https://github.com/pfnet/pfrl/pull/184 .

mostcute commented 1 month ago

@jmribeiro thanks for your contribution. I also want to resume training, so I should call agent.save and then agent.save_snapshot, and after a restart call agent.load and then agent.load_snapshot, right?

Before I saw this issue, I saved and loaded the replay buffer but not t, optim_t, and _cumulative_steps. Would that work correctly?

jmribeiro commented 1 month ago

> @jmribeiro thanks for your contribution. I also want to resume training, so I should call agent.save and then agent.save_snapshot, and after a restart call agent.load and then agent.load_snapshot, right?
>
> Before I saw this issue, I saved and loaded the replay buffer but not t, optim_t, and _cumulative_steps. Would that work correctly?

Hey @mostcute, you can just call save_snapshot and load_snapshot, as they call save and load respectively; there is no need to call both.
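For illustration, a minimal sketch of the resulting resume pattern, assuming save_snapshot/load_snapshot as discussed above; the directory name, checkpoint interval, agent, and loop body are placeholders.

```python
# Illustrative resume pattern with the snapshot methods discussed above.
# snapshot_dir, max_steps, agent, and the loop body are placeholders.
import os

snapshot_dir = "snapshots/ddqn_run0"
if os.path.isdir(snapshot_dir):
    agent.load_snapshot(snapshot_dir)  # internally calls load(), so no separate load() needed

max_steps = 1_000_000
for step in range(1, max_steps + 1):
    ...  # interact with the environment and update the agent as usual
    if step % 10_000 == 0:
        agent.save_snapshot(snapshot_dir)  # internally calls save(), so no separate save() needed
```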