[rllib] Unity example broken

rusu24edward commented 3 years ago

What is the problem?

I am following the local Unity example. After following the instructions, I get an error when I start training with python3 unity3d_env_local.py --env SoccerStrikersVsGoalie and press the Play button in my Unity editor:

  File "/home/eddie/.local/lib/python3.6/site-packages/mlagents_envs/base_env.py", line 405, in _validate_action
    if actions.continuous.shape != _expected_shape:
AttributeError: 'numpy.ndarray' object has no attribute 'continuous'

Ray version and other system information (Python version, TensorFlow version, OS): Python 3.6.9, RLlib 1.2.0, mlagents 0.24.0, Unity 2018.4.32f1 (the version required to open the ml-agents example projects), TensorFlow 2.4.1, Ubuntu 18.04

Reproduction (REQUIRED)

To reproduce, follow the steps in the unity example.

[X] I have verified my script runs in a clean environment and reproduces the issue.
[X] I have verified the issue also occurs with the latest wheels.

For reference, here's the full stacktrace:

WARNING:tensorflow:From /home/eddie/.local/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
2021-03-07 00:20:27,546 INFO services.py:1174 -- View the Ray dashboard at http://127.0.0.1:8265
== Status ==
Memory usage on this node: 3.6/7.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 1/4 CPUs, 0/0 GPUs, 0.0/3.12 GiB heap, 0.0/1.07 GiB objects
Result logdir: /home/eddie/ray_results/PPO
Number of trials: 1/1 (1 RUNNING)

(pid=18535) WARNING:tensorflow:From /home/eddie/.local/lib/python3.6/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
(pid=18535) Instructions for updating:
(pid=18535) non-resource variables are not supported in the long term
(pid=18535) 2021-03-07 00:20:32,703 INFO trainer.py:616 -- Tip: set framework=tfe or the --eager flag to enable TensorFlow eager execution
(pid=18535) 2021-03-07 00:20:32,704 INFO trainer.py:643 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(pid=18535) No game binary provided, will use a running Unity editor instead.
(pid=18535) Make sure you are pressing the Play (|>) button in your editor to start.
(pid=18535) Created UnityEnvironment for port 5004
(pid=18535) 2021-03-07 00:21:13,822 WARNING deprecation.py:34 -- DeprecationWarning: `framestack` has been deprecated. Use `num_framestacks (int)` instead. This will raise an error in the future!
(pid=18535) 2021-03-07 00:21:15,340 WARNING deprecation.py:34 -- DeprecationWarning: `framestack` has been deprecated. Use `num_framestacks (int)` instead. This will raise an error in the future!
(pid=18535) 2021-03-07 00:21:23,532 INFO trainable.py:103 -- Trainable.setup took 50.832 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(pid=18535) 2021-03-07 00:21:23,532 WARNING util.py:47 -- Install gputil for GPU system monitoring.
(pid=18535) 2021-03-07 00:21:23,678 WARNING deprecation.py:34 -- DeprecationWarning: `env_index` has been deprecated. Use `episode.env_id` instead. This will raise an error in the future!
2021-03-07 00:21:24,378 ERROR trial_runner.py:616 -- Trial PPO_unity3d_fabd0_00000: Error processing event.
Traceback (most recent call last):
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/tune/trial_runner.py", line 586, in _process_trial
    results = self.trial_executor.fetch_result(trial)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/tune/ray_trial_executor.py", line 609, in fetch_result
    result = ray.get(trial_future[0], timeout=DEFAULT_GET_TIMEOUT)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/_private/client_mode_hook.py", line 47, in wrapper
    return func(*args, **kwargs)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/worker.py", line 1456, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(AttributeError): ray::PPO.train_buffered() (pid=18535, ip=192.168.0.11)
  File "python/ray/_raylet.pyx", line 480, in ray._raylet.execute_task
  File "python/ray/_raylet.pyx", line 432, in ray._raylet.execute_task.function_executor
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 167, in train_buffered
    result = self.train()
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 529, in train
    raise e
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/agents/trainer.py", line 515, in train
    result = Trainable.train(self)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/tune/trainable.py", line 226, in train
    result = self.step()
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/agents/trainer_template.py", line 148, in step
    res = next(self.train_exec_impl)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 756, in __next__
    return next(self.built_iterator)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 843, in apply_filter
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  [Previous line repeated 1 more time]
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 876, in apply_flatten
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 828, in add_wait_hooks
    item = next(it)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/util/iter.py", line 783, in apply_foreach
    for item in it:
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/execution/rollout_ops.py", line 69, in sampler
    yield workers.local_worker().sample()
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/evaluation/rollout_worker.py", line 662, in sample
    batches = [self.input_reader.next()]
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 95, in next
    batches = [self.get_data()]
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 224, in get_data
    item = next(self.rollout_provider)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/evaluation/sampler.py", line 686, in _env_runner
    base_env.send_actions(actions_to_send)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/env/base_env.py", line 399, in send_actions
    obs, rewards, dones, infos = env.step(agent_dict)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/rllib/env/wrappers/unity3d_env.py", line 129, in step
    action_dict[key])
  File "/home/eddie/.local/lib/python3.6/site-packages/mlagents_envs/environment.py", line 366, in set_action_for_agent
    action = action_spec._validate_action(action, None, behavior_name)
  File "/home/eddie/.local/lib/python3.6/site-packages/mlagents_envs/base_env.py", line 405, in _validate_action
    if actions.continuous.shape != _expected_shape:
AttributeError: 'numpy.ndarray' object has no attribute 'continuous'
== Status ==
Memory usage on this node: 4.2/7.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs, 0.0/3.12 GiB heap, 0.0/1.07 GiB objects
Result logdir: /home/eddie/ray_results/PPO
Number of trials: 1/1 (1 ERROR)
Number of errored trials: 1
+-------------------------+--------------+-------------------------------------------------------------------------------------+
| Trial name              |   # failures | error file                                                                          |
|-------------------------+--------------+-------------------------------------------------------------------------------------|
| PPO_unity3d_fabd0_00000 |            1 | /home/eddie/ray_results/PPO/PPO_unity3d_fabd0_00000_0_2021-03-07_00-20-30/error.txt |
+-------------------------+--------------+-------------------------------------------------------------------------------------+

== Status ==
Memory usage on this node: 4.2/7.7 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/4 CPUs, 0/0 GPUs, 0.0/3.12 GiB heap, 0.0/1.07 GiB objects
Result logdir: /home/eddie/ray_results/PPO
Number of trials: 1/1 (1 ERROR)
Number of errored trials: 1
+-------------------------+--------------+-------------------------------------------------------------------------------------+
| Trial name              |   # failures | error file                                                                          |
|-------------------------+--------------+-------------------------------------------------------------------------------------|
| PPO_unity3d_fabd0_00000 |            1 | /home/eddie/ray_results/PPO/PPO_unity3d_fabd0_00000_0_2021-03-07_00-20-30/error.txt |
+-------------------------+--------------+-------------------------------------------------------------------------------------+

Traceback (most recent call last):
  File "unity3d_env_local.py", line 152, in <module>
    restore=args.from_checkpoint)
  File "/home/eddie/.local/lib/python3.6/site-packages/ray/tune/tune.py", line 444, in run
    raise TuneError("Trials did not complete", incomplete_trials)

sven1977 commented 3 years ago

It looks like there was an API change in the mlagents package: set_action_for_agent now takes an ML-Agents ActionTuple object instead of a plain action (np.array). This differs from the older mlagents version I used when I wrote the example script. Let me see whether I can fix this on the RLlib side.
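
For anyone hitting the same AttributeError, here is a minimal sketch of the mismatch described above: the newer mlagents_envs validates actions through their .continuous / .discrete attributes, so a bare np.ndarray has to be wrapped in an ActionTuple first. This is not the contents of the RLlib fix. The helper name to_action_tuple is hypothetical, the choice of continuous vs. discrete by dtype is an assumption, and the fallback class exists only so the snippet runs without ML-Agents installed.

```python
import numpy as np

try:
    # Real class in mlagents_envs releases that ship the ActionTuple API.
    from mlagents_envs.base_env import ActionTuple
except ImportError:
    # Stand-in for illustration only (assumption): mimics the two attributes
    # that the validation code in the traceback reads.
    class ActionTuple:
        def __init__(self, continuous=None, discrete=None):
            self.continuous = continuous
            self.discrete = discrete


def to_action_tuple(action):
    """Wrap one agent's raw action array in an ActionTuple (hypothetical helper)."""
    # ML-Agents expects a leading agent dimension: (n_agents, action_size).
    batch = np.atleast_2d(np.asarray(action))
    # Assumption: float arrays are continuous actions, everything else discrete.
    if np.issubdtype(batch.dtype, np.floating):
        return ActionTuple(continuous=batch.astype(np.float32))
    return ActionTuple(discrete=batch.astype(np.int32))


raw_action = np.array([0.1, -0.3], dtype=np.float32)

# A bare ndarray has no .continuous attribute, which is what the traceback reports.
print(hasattr(raw_action, "continuous"))

# After wrapping, the attribute exists and carries the agent dimension.
wrapped = to_action_tuple(raw_action)
print(wrapped.continuous.shape)
```

With a wrapper like this, the object passed to set_action_for_agent would expose the attributes that _validate_action reads; whether one agent or a batch of agents is wrapped per call depends on which mlagents_envs method is used.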

sven1977 commented 3 years ago

Yeah, this was an API change on Unity's end. I can reproduce this now and will provide a fix for RLlib. Thanks for reporting this, @rusu24edward!

sven1977 commented 3 years ago

PR: https://github.com/ray-project/ray/pull/14569

rusu24edward commented 3 years ago

Ah, great! Thanks for picking this up so quickly, Sven!

sven1977 commented 3 years ago

Should be merged today or very early next week.

ray-project / ray

[rllib] Unity example broken #14521
