Closed. jmribeiro closed this issue 1 year ago.
What are the conditions you are retraining under? For example, if you use an EpsilonDecayExplorer, you can expect different results after checkpointing, because DoubleDQN only saves the models and the optimizer; it does not save or restore the exploration state. So if you just rerun the training script from a checkpoint without adjusting these other factors (like exploration), you may observe different performance, right?
Right on. I'm using the pfrl.explorers.LinearDecayEpsilonGreedy. I'll save and load the state of the explorer as well and conduct the experiment again. Thanks for the quick reply!
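For illustration, a minimal sketch of persisting the explorer next to the agent checkpoint with plain pickling (the file name is illustrative, and it assumes the explorer object is picklable, which may not hold if its random_action_func is a lambda):

```python
import os
import pickle


def save_explorer(explorer, dirname: str) -> None:
    # Serialize the whole explorer object (e.g. a LinearDecayEpsilonGreedy instance).
    with open(os.path.join(dirname, "explorer.pkl"), "wb") as f:
        pickle.dump(explorer, f)


def load_explorer(dirname: str):
    with open(os.path.join(dirname, "explorer.pkl"), "rb") as f:
        return pickle.load(f)
```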
Hi again Prabhat.
I've dug deeper into the problem and found the issue. It was not in the explorer's state. I've opened a pull request (#184) with a proposed solution that fixes it.
The problem was that loading the DQN agent did not restore four attributes needed to resume training: t, optim_t, _cumulative_steps, and the replay_buffer.
My proposed solution (working, but applied only to the DQN agent) was to override the save and load methods on the agent's class:
```python
# These methods are added to the DQN agent class; they assume `os` and `torch`
# are imported in the module.

def save(self, dirname: str) -> None:
    super().save(dirname)
    # Also persist the training-only state that the base implementation skips.
    torch.save(self.t, os.path.join(dirname, "t.pt"))
    torch.save(self.optim_t, os.path.join(dirname, "optim_t.pt"))
    torch.save(self._cumulative_steps, os.path.join(dirname, "_cumulative_steps.pt"))
    self.replay_buffer.save(os.path.join(dirname, "replay_buffer.pkl"))

def load(self, dirname: str) -> None:
    super().load(dirname)
    # Restore the training-only state so training resumes where it stopped.
    self.t = torch.load(os.path.join(dirname, "t.pt"))
    self.optim_t = torch.load(os.path.join(dirname, "optim_t.pt"))
    self._cumulative_steps = torch.load(os.path.join(dirname, "_cumulative_steps.pt"))
    self.replay_buffer.load(os.path.join(dirname, "replay_buffer.pkl"))
```
This change works as intended; training is resumed properly after the agent is reloaded from disk.
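For context, a minimal usage sketch of the interrupted-and-resumed flow (the directory name is illustrative, and `make_agent()` is a placeholder for however the agent is constructed in the training script):

```python
checkpoint_dir = "checkpoints/dqn_run"  # illustrative path

agent = make_agent()  # placeholder: build the DQN agent as in the training script
# ... train for a while ...
agent.save(checkpoint_dir)  # with the override, this also stores t, optim_t,
                            # _cumulative_steps and the replay buffer

# --- process is killed and restarted ---

agent = make_agent()  # rebuild the agent with identical constructor arguments
agent.load(checkpoint_dir)  # restore the counters and the replay buffer
# ... continue the training loop; it picks up where it left off ...
```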
@jmribeiro, glad to hear you figured it out. I'm no longer an official maintainer of PFRL, so someone else will need to review this.
I can't speak for the maintainers, but it's possible this change may have to just sit on your own branches, since storing the replay buffer can be quite expensive. I think the purpose of the save method is to save the parameters relevant to decision-making, not necessarily the parameters relevant to training.
I'll have to disagree on that!
Saving an agent should save its entire current state and allow training to be resumed (not just evaluation). A large feedforward network model, for example, can also have a large file size.
Hi, thanks for raising the issue and opening the PR!
To clarify, what @prabhatnagarajan said is the current design choice of PFRL:
I think the purpose of the save method is to save the parameters relevant to decision-making, not necessarily the parameters relevant to training.
Saving a whole replay buffer typically consumes much more storage. Ideally, users should be able to decide independently if/when to save model parameters (for later evaluation or deployment) and if/when to save a snapshot of training (for later resuming).
My suggestion: instead of modifying the existing save and load methods, how about adding new methods like save_snapshot and load_snapshot to agents that handle complete snapshotting, so that users can call them manually when they want to resume? That could be easily merged, as it would not affect existing code.
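A rough sketch of what such snapshot methods could look like (hypothetical, not the actual PFRL API or the final PR code; the subclass name is invented, and the extra state mirrors the save/load overrides shown earlier):

```python
import os

import torch
import pfrl


class DQNWithSnapshot(pfrl.agents.DQN):
    """Hypothetical subclass: save/load stay unchanged, full snapshots are opt-in."""

    def save_snapshot(self, dirname: str) -> None:
        # Model parameters and optimizer via the unchanged save() ...
        self.save(dirname)
        # ... plus the training-only state needed to resume.
        torch.save(self.t, os.path.join(dirname, "t.pt"))
        torch.save(self.optim_t, os.path.join(dirname, "optim_t.pt"))
        torch.save(self._cumulative_steps, os.path.join(dirname, "_cumulative_steps.pt"))
        self.replay_buffer.save(os.path.join(dirname, "replay_buffer.pkl"))

    def load_snapshot(self, dirname: str) -> None:
        self.load(dirname)
        self.t = torch.load(os.path.join(dirname, "t.pt"))
        self.optim_t = torch.load(os.path.join(dirname, "optim_t.pt"))
        self._cumulative_steps = torch.load(os.path.join(dirname, "_cumulative_steps.pt"))
        self.replay_buffer.load(os.path.join(dirname, "replay_buffer.pkl"))
```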
Yes, as @muupan's message above indicates, I meant that under the current design decision, save stores the parameters relevant to decision-making.
I see! I'll update the PR.
I guess it is resolved by https://github.com/pfnet/pfrl/pull/184.
@jmribeiro thanks for your contribution. I also want to resume training. Should I call agent.save and then agent.save_snapshot, and after rebooting call agent.load and then agent.load_snapshot?
Before I saw this issue, I saved and loaded the replay buffer but not t, optim_t, and _cumulative_steps. Would that approach work correctly?
Hey @mostcute, you can just call save_snapshot and load_snapshot; they call save and load internally, so there is no need to call both.
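For example (the directory name is illustrative, and `agent` is the already-constructed agent from your training script):

```python
snapshot_dir = "snapshots/run_0"  # illustrative path

# Before shutting down: save_snapshot already calls save internally.
agent.save_snapshot(snapshot_dir)

# After the restart: rebuild the agent with the same constructor arguments, then
agent.load_snapshot(snapshot_dir)  # load_snapshot calls load internally
```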
If DDQN is trained without checkpoints for a given number of timesteps, training works as intended (Fig. 1). However, if training is interrupted, the agent saved to disk, and training then resumed by loading the checkpoint from disk, training does not proceed properly (Fig. 2).
With other agents (e.g. PPO), this issue does not happen.
Fig. 1 - Training without checkpoints (i.e., the same program run from start to finish)
Fig. 2 - Training with checkpoints (i.e., the program killed every t steps and the agent reloaded from disk)