AttributeError: 'AsyncPPOTFPolicy' object has no attribute 'target_model' [rllib]

liying-1997 commented 3 years ago

What is the problem?

'AsyncPPOTFPolicy' object has no attribute 'target_model'

Reproduction (REQUIRED)

    import os
    import ray
    from ray.rllib.agents import ppo
    from ray.rllib.agents import impala
    os.environ["OMP_NUM_THREADS"] = "1"
    ray.init()
    change = {
        "env": "PongNoFrameskip-v4",
        "framework": "tensorflow",
        "num_envs_per_worker": 1,
        "num_gpus": 4,
        "num_workers": 8,
        "rollout_fragment_length": 50,
        "train_batch_size": 500,
        "vtrace": True,
    }
    config = impala.ImpalaTrainer.merge_trainer_configs(
        impala.DEFAULT_CONFIG, change, _allow_unknown_configs=True
    )
    trainer = ppo.APPOTrainer(config)
    while True:
        print(trainer.train())

AttributeError: 'AsyncPPOTFPolicy' object has no attribute 'target_model'

[x] I have verified my script runs in a clean environment and reproduces the issue.
[x] I have verified the issue also occurs with the latest wheels.

liying-1997 commented 3 years ago

Error message

    Traceback (most recent call last):
      File "test_appo_rllib.py", line 222, in <module>
        trainer = ppo.APPOTrainer(config)
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 106, in __init__
        Trainer.__init__(self, config, env, logger_creator)
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 465, in __init__
        super().__init__(config, logger_creator)
      File "xxxxxx/lib/python3.7/site-packages/ray/tune/trainable.py", line 96, in __init__
        self.setup(copy.deepcopy(self.config))
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/agents/trainer.py", line 629, in setup
        self._init(self.config, self.env_creator)
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py", line 140, in _init
        self.train_exec_impl = execution_plan(self.workers, config)
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/agents/impala/impala.py", line 236, in execution_plan
        learner_thread = make_learner_thread(workers.local_worker(), config)
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/agents/impala/impala.py", line 138, in make_learner_thread
        learner_queue_timeout=config["learner_queue_timeout"])
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/execution/multi_gpu_learner.py", line 106, in __init__
        self.policy.copy))
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/execution/multi_gpu_impl.py", line 67, in __init__
        self._shared_loss = build_graph(self.loss_inputs)
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 371, in copy
        loss = instance._do_loss_init(input_dict)
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/policy/dynamic_tf_policy.py", line 627, in _do_loss_init
        loss = self._loss_fn(self, self.model, self.dist_class, train_batch)
      File "xxxxxx/lib/python3.7/site-packages/ray/rllib/agents/ppo/appo_tf_policy.py", line 125, in appo_surrogate_loss
        target_model_out, _ = policy.target_model.from_batch(train_batch)
    AttributeError: 'AsyncPPOTFPolicy' object has no attribute 'target_model'

sven1977 commented 3 years ago

Hey @liyingathere, there were some multi-GPU-related problems lately, most of which I fixed yesterday in this PR: https://github.com/ray-project/ray/pull/18017

It's not merged yet (should be merged today), but you could try the latest master plus the changes in this PR. To keep multi-GPU from breaking again in the future, we have added nightly multi-GPU learning tests for all major algos (including APPO), for both tf and torch, to our pipeline.
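
Until the PR is available, one way to check whether the error is specific to the multi-GPU path is to rerun the reproduction with at most one GPU. The sketch below is only a diagnostic aid, not a confirmed fix. It assumes the multi-GPU learner seen in the traceback (`multi_gpu_learner.py`) is used only when `num_gpus` is greater than 1, which this thread does not confirm. The helper name `single_gpu_variant` is hypothetical and not part of RLlib.

```python
import copy

# Overrides copied from the reproduction script in this issue.
change = {
    "env": "PongNoFrameskip-v4",
    "framework": "tensorflow",
    "num_envs_per_worker": 1,
    "num_gpus": 4,
    "num_workers": 8,
    "rollout_fragment_length": 50,
    "train_batch_size": 500,
    "vtrace": True,
}


def single_gpu_variant(overrides):
    """Return a copy of `overrides` that requests at most one GPU.

    Hypothetical helper for narrowing down the bug; the input dict
    is left unmodified.
    """
    variant = copy.deepcopy(overrides)
    # Cap num_gpus at 1; a missing key is treated as 0 (CPU only).
    variant["num_gpus"] = min(1, variant.get("num_gpus", 0))
    return variant


single_gpu_change = single_gpu_variant(change)
```

Passing `single_gpu_change` in place of `change` in the reproduction script leaves everything else the same. If training then starts, that points at the multi-GPU code path rather than at the APPO config itself.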

ray-project / ray

AttributeError: 'AsyncPPOTFPolicy' object has no attribute 'target_model' [rllib] #18032