ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

How to access saved policies from a checkpoint trained in multi-agent fashion? #6958

Open houcharlie opened 4 years ago

houcharlie commented 4 years ago

What is your question?

The question is in the title. If I run this code:

trainer = PPOTrainer(env=BitcoinEnv, config={
    "num_workers": args.workers,
    "multiagent": {
        "policies_to_train": policies_to_train,
        "policies": policies,
        "policy_mapping_fn": select_policy,
    },
    "env_config": {
        "max_hidden_block": BLOCKS,
        "alphas": ALPHA,
        "gammas": GAMMA,
        "ep_length": ep_length,
        "print": False,
    },
})
trainer.restore('/afs/ece.cmu.edu/usr/charlieh/ray_results/PPO/PPO_BitcoinEnv_0_2020-01-29_00-56-415e4ywyg1/checkpoint_11223/checkpoint-11223')
print(trainer.get_policy())

Then I get the following output:

2020-01-29 17:57:40,099 WARNING worker.py:1268 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2020-01-29 17:57:40,101 INFO resource_spec.py:205 -- Starting Ray with 16.11 GiB memory available for workers and up to 8.07 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-01-29 17:57:40,429 INFO trainer.py:345 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
2020-01-29 17:57:40,671 INFO ppo.py:144 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
2020-01-29 17:57:43,748 INFO rollout_worker.py:768 -- Built policy map: {'0': <ray.rllib.policy.tf_policy_template.PPOTFPolicy object at 0x7f7e75c1cd30>, '1': <ray.rllib.policy.tf_policy_template.PPOTFPolicy object at 0x7f7b5435a048>}
2020-01-29 17:57:43,748 INFO rollout_worker.py:769 -- Built preprocessor map: {'0': <ray.rllib.models.preprocessors.TupleFlatteningPreprocessor object at 0x7f7e75c1a710>, '1': <ray.rllib.models.preprocessors.TupleFlatteningPreprocessor object at 0x7f7e75bd04a8>}
2020-01-29 17:57:43,748 INFO rollout_worker.py:370 -- Built filter map: {'0': <ray.rllib.utils.filter.NoFilter object at 0x7f7e75c1a550>, '1': <ray.rllib.utils.filter.NoFilter object at 0x7f7e75c1a588>}
2020-01-29 17:57:43,780 WARNING worker.py:348 -- WARNING: Falling back to serializing objects of type <class 'numpy.dtype'> by using pickle. This may be inefficient.
2020-01-29 17:57:43,794 WARNING worker.py:348 -- WARNING: Falling back to serializing objects of type <class 'mtrand.RandomState'> by using pickle. This may be inefficient.
2020-01-29 17:57:44,190 INFO multi_gpu_optimizer.py:93 -- LocalMultiGPUOptimizer devices ['/cpu:0']
2020-01-29 17:57:51,882 INFO trainable.py:102 -- _setup took 11.239 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2020-01-29 17:57:52,032 INFO trainable.py:358 -- Restored from checkpoint: /afs/ece.cmu.edu/usr/charlieh/ray_results/PPO/PPO_BitcoinEnv_0_2020-01-29_00-56-415e4ywyg1/checkpoint_11223/checkpoint-11223
2020-01-29 17:57:52,032 INFO trainable.py:365 -- Current state after restoring: {'_iteration': 11223, '_timesteps_total': 15712200, '_time_total': 16152.620006799698, '_episodes_total': 157122}
<class 'ray.rllib.policy.tf_policy_template.PPOTFPolicy'>
bash-4.2$ python3 bitcoin_collusion.py
2020-01-29 18:00:09,047 WARNING worker.py:1268 -- WARNING: Not updating worker name since `setproctitle` is not installed. Install this with `pip install setproctitle` (or ray[debug]) to enable monitoring of worker processes.
2020-01-29 18:00:09,049 INFO resource_spec.py:205 -- Starting Ray with 16.11 GiB memory available for workers and up to 8.07 GiB for objects. You can adjust these settings with ray.init(memory=<bytes>, object_store_memory=<bytes>).
2020-01-29 18:00:09,344 INFO trainer.py:345 -- Tip: set 'eager': true or the --eager flag to enable TensorFlow eager execution
2020-01-29 18:00:09,543 INFO ppo.py:144 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.
2020-01-29 18:00:12,676 INFO rollout_worker.py:768 -- Built policy map: {'0': <ray.rllib.policy.tf_policy_template.PPOTFPolicy object at 0x7f5253c29dd8>, '1': <ray.rllib.policy.tf_policy_template.PPOTFPolicy object at 0x7f4f6450f0f0>}
2020-01-29 18:00:12,676 INFO rollout_worker.py:769 -- Built preprocessor map: {'0': <ray.rllib.models.preprocessors.TupleFlatteningPreprocessor object at 0x7f5253c28710>, '1': <ray.rllib.models.preprocessors.TupleFlatteningPreprocessor object at 0x7f5253bdca90>}
2020-01-29 18:00:12,677 INFO rollout_worker.py:370 -- Built filter map: {'0': <ray.rllib.utils.filter.NoFilter object at 0x7f5253c28588>, '1': <ray.rllib.utils.filter.NoFilter object at 0x7f5253c285c0>}
2020-01-29 18:00:12,711 WARNING worker.py:348 -- WARNING: Falling back to serializing objects of type <class 'numpy.dtype'> by using pickle. This may be inefficient.
2020-01-29 18:00:12,718 WARNING worker.py:348 -- WARNING: Falling back to serializing objects of type <class 'mtrand.RandomState'> by using pickle. This may be inefficient.
2020-01-29 18:00:13,138 INFO multi_gpu_optimizer.py:93 -- LocalMultiGPUOptimizer devices ['/cpu:0']
2020-01-29 18:00:20,908 INFO trainable.py:102 -- _setup took 11.441 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2020-01-29 18:00:21,060 INFO trainable.py:358 -- Restored from checkpoint: /afs/ece.cmu.edu/usr/charlieh/ray_results/PPO/PPO_BitcoinEnv_0_2020-01-29_00-56-415e4ywyg1/checkpoint_11223/checkpoint-11223
2020-01-29 18:00:21,060 INFO trainable.py:365 -- Current state after restoring: {'_iteration': 11223, '_timesteps_total': 15712200, '_time_total': 16152.620006799698, '_episodes_total': 157122}
None

So it is clearly loading something, since from what I can tell it correctly restores the state I saved. However, the object returned by trainer.get_policy() is None. Am I doing something wrong?

Thanks for all the help!

houcharlie commented 4 years ago

Running ls -lh on the checkpoint directory gives the following:

bash-4.2$ ls -lh
total 1.3M
-rw-r--r--. 1 charlieh charlieh_unix 1.3M Jan 29 05:32 checkpoint-11223
-rw-r--r--. 1 charlieh charlieh_unix  186 Jan 29 05:32 checkpoint-11223.tune_metadata

This looks correct to me; the checkpoint seems to contain enough data.

ericl commented 4 years ago

You need to pass the policy id to get_policy(); otherwise it tries to fetch the "default" policy, which may not exist in multi-agent setups.
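
For example, a minimal sketch, assuming the policy ids "0" and "1" that appear in the "Built policy map" log above:

policy_0 = trainer.get_policy("0")  # fetch a specific policy by its id
policy_1 = trainer.get_policy("1")
print(policy_0, policy_1)

# If you are unsure which ids were used, the local worker's policy map
# lists them (attribute layout may vary across Ray versions):
print(trainer.workers.local_worker().policy_map.keys())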