ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

checkpoint save-restore issues #5666

Closed gabiguetta closed 3 years ago

gabiguetta commented 5 years ago

System information

Describe the problem

During training, calling trainer.save() and then trainer.restore() immediately afterwards produces differences between the original and the reconstructed policy. I started debugging this after experiencing:

1. Major drops in training progress when restoring the trainer from a checkpoint.
2. Evaluating the policy while training gave very different rewards than evaluating it from a checkpoint created at the same time as the first evaluation.

Source code / logs

Code snippet is as follows:

for i in range(100000):
    result = trainer.train()
    if i > 0 and i % 100 == 0:
        # Grab the current policy, checkpoint, then restore immediately.
        policy_orig = trainer.workers.local_worker().get_policy()
        checkpoint = trainer.save()
        trainer.restore(checkpoint)
        # The restored policy should be identical to policy_orig, but isn't.
        policy_restored = trainer.workers.local_worker().get_policy()

When debugging this, putting a breakpoint after policy_orig is created and checking the model output by evaluating logits = policy_orig.model._forward({'obs': state}, [])[0] on a random state generates values different from those produced by running the same state through policy_restored.

Also, textually dumping policy_orig.get_weights() and policy_restored.get_weights() and running vimdiff on the dumps revealed differences between the two sets of weights.
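
For reference, a quick sketch of how one might compare the two weight sets programmatically rather than via vimdiff (this assumes get_weights() returns a dict of array-like values; np.asarray handles both numpy arrays and CPU torch tensors):

import numpy as np

# Compare original vs. restored weights tensor by tensor; any nonzero
# max-abs difference means the checkpoint round trip changed the policy.
w_orig = policy_orig.get_weights()
w_restored = policy_restored.get_weights()
assert set(w_orig) == set(w_restored), "weight keys differ after restore"
for name in sorted(w_orig):
    diff = np.max(np.abs(np.asarray(w_orig[name]) - np.asarray(w_restored[name])))
    if diff > 0:
        print(name, "max abs diff:", diff)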

ericl commented 5 years ago

Can you provide a reproduction script? This is also tested by test_checkpoint_restore.

gabiguetta commented 5 years ago

I'll see what I can do, but note that the tests run with the default config for the policies. That means, e.g., that this code is used with the default config for A3CTrainer:

def _import_a3c():
    from ray.rllib.agents import a3c
    return a3c.A3CTrainer

The problem arises when actually setting config["use_pytorch"] = True and thus using A3CTorchPolicy instead of A3CTFPolicy.
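
For what it's worth, a minimal reproduction sketch along those lines might look like the following (assuming a Ray version from that era, where A3CTrainer lives in ray.rllib.agents.a3c and "use_pytorch" is the config key that switches to the Torch policy; the CartPole-v0 env is just an arbitrary choice):

import ray
from ray.rllib.agents import a3c

ray.init()
# Use the Torch policy, where the save/restore mismatch shows up.
trainer = a3c.A3CTrainer(env="CartPole-v0", config={"use_pytorch": True})

# Train a few iterations so the weights move away from initialization.
for _ in range(5):
    trainer.train()

policy_orig = trainer.workers.local_worker().get_policy()
checkpoint = trainer.save()
trainer.restore(checkpoint)
policy_restored = trainer.workers.local_worker().get_policy()
# policy_orig and policy_restored should now agree weight-for-weight;
# per the report above, they do not.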

roireshef commented 4 years ago

Hi @ericl, was this ever fixed? I'm now running PyTorch, seemingly successfully, but I might be missing something. I just wanted to make sure, because a few months ago I experienced the same issue.

stale[bot] commented 4 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.

stale[bot] commented 3 years ago

Hi again! This issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!