Closed: Peter-P779 closed this issue 1 year ago
Let me look into this.
I ran this script and it keeps running without any issues. It opens a bunch of windows with some animations. See the attached screenshot.
My hardware: 8 CPUs and 16 GB RAM on an Azure Windows VM.
So the error can't be reproduced on your machine. On my machine the windows with the carts also open, and then I get the error. Is there some log or similar I can send for further analysis? The same program runs perfectly on WSL with Ubuntu, though.
Laptop Dell G3 15:
Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz (12 logical cores), 16.0 GB RAM, NVIDIA GeForce RTX 2060
OS: Windows 11 Home, version 21H2
I see. Note that I don't have any GPU on my Azure Windows VM. It's Windows 10 Pro 20H2.
So there might be a serious problem once the cluster gets updated?
A random observation: the error message seems to say:
(DQNTrainer pid=7004) ModuleNotFoundError: No module named 'pyglet'
Yeah, but that wasn't the reason for the error. I originally got the error on VizDoom; the CartPole script is just a simpler setup for the error report. That's why I didn't notice the missing package.
Here is the updated console output after installing the package.
D:\ML\test_RLlib>call TF_Env/Scripts/activate
2022-05-20 22:50:02,463 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\tune.py:455: UserWarning: Consider boosting PBT performance by enabling `reuse_actors` as well as implementing `reset_config` for Trainable.
warnings.warn(
2022-05-20 22:50:06,949 WARNING trial_runner.py:1489 -- You are trying to access _search_alg interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
2022-05-20 22:50:07,066 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00000
(DQNTrainer pid=19664) 2022-05-20 22:50:13,392 INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=19664) 2022-05-20 22:50:13,393 INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(RolloutWorker pid=16968) 2022-05-20 22:50:19,353 WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=16968) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=9892) 2022-05-20 22:50:19,416 WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=9892) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=6604) 2022-05-20 22:50:19,420 WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=6604) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=16968) 2022-05-20 22:50:21,101 DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=16968) 2022-05-20 22:50:21,103 DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000016C16DC81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=16968) 2022-05-20 22:50:21,103 DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=9892) 2022-05-20 22:50:21,101 DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=9892) 2022-05-20 22:50:21,103 DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000010B135F81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=9892) 2022-05-20 22:50:21,103 DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=6604) 2022-05-20 22:50:21,101 DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=6604) 2022-05-20 22:50:21,103 DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000019D8A1081C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=6604) 2022-05-20 22:50:21,103 DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=16968) 2022-05-20 22:50:21,696 INFO tf_policy.py:166 -- TFPolicy (worker=2) running on CPU.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,704 INFO tf_policy.py:166 -- TFPolicy (worker=1) running on CPU.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,704 INFO tf_policy.py:166 -- TFPolicy (worker=3) running on CPU.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,772 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,772 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773 INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,777 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,777 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,778 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,779 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,779 INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,780 DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892) 'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892) 'agent_index': <tf.Tensor 'default_policy_wk1/agent_index:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'eps_id': <tf.Tensor 'default_policy_wk1/eps_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892) 'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892) 'prev_actions': <tf.Tensor 'default_policy_wk1/prev_actions:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892) 'prev_rewards': <tf.Tensor 'default_policy_wk1/prev_rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892) 'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 't': <tf.Tensor 'default_policy_wk1/t:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'unroll_id': <tf.Tensor 'default_policy_wk1/unroll_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=9892)
(RolloutWorker pid=6604) 2022-05-20 22:50:21,776 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,777 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,777 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,778 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,778 INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:22,197 DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(RolloutWorker pid=9892) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892) 'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892) 'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892) 'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892) 'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892) 'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892) 'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=9892)
(DQNTrainer pid=19664) 2022-05-20 22:50:22,455 INFO worker_set.py:154 -- Inferred observation/action spaces from remote worker (local worker has no env): {'default_policy': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2)), '__env__': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2))}
(RolloutWorker pid=16968) 2022-05-20 22:50:22,440 DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000016C1E6D9400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=9892) 2022-05-20 22:50:22,440 DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000010B1B1A9400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=6604) 2022-05-20 22:50:22,424 DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000019D91B5A400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518 DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518 DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001D96D6951F0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518 DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(DQNTrainer pid=19664) 2022-05-20 22:50:22,996 INFO tf_policy.py:166 -- TFPolicy (worker=local) running on CPU.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034 INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,050 DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664) 'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664) 'agent_index': <tf.Tensor 'default_policy/agent_index:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'eps_id': <tf.Tensor 'default_policy/eps_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664) 'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664) 'prev_actions': <tf.Tensor 'default_policy/prev_actions:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664) 'prev_rewards': <tf.Tensor 'default_policy/prev_rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664) 'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 't': <tf.Tensor 'default_policy/t:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'unroll_id': <tf.Tensor 'default_policy/unroll_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) 2022-05-20 22:50:23,416 DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(DQNTrainer pid=19664) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664) 'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664) 'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664) 'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664) 'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664) 'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664) 'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619 INFO rollout_worker.py:1727 -- Built policy map: {}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619 INFO rollout_worker.py:1728 -- Built preprocessor map: {'default_policy': <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001D96D6951F0>}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619 INFO rollout_worker.py:666 -- Built filter map: {'default_policy': <ray.rllib.utils.filter.NoFilter object at 0x000001D974F91310>}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619 DEBUG rollout_worker.py:779 -- Created rollout worker with env None (None), policies {}
== Status ==
Current time: 2022-05-20 22:50:23 (running for 00:00:16.72)
Memory usage on this node: 10.6/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 4.0/12 CPUs, 0/1 GPUs, 0.0/4.55 GiB heap, 0.0/2.28 GiB objects
Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06
Number of trials: 3/3 (2 PENDING, 1 RUNNING)
+------------------------------------+----------+-----------------+----------+-------------+
| Trial name | status | loc | gamma | lr |
|------------------------------------+----------+-----------------+----------+-------------|
| DQNTrainer_CartPole-v0_6e434_00000 | RUNNING | 127.0.0.1:19664 | 0.901065 | 0.000687763 |
| DQNTrainer_CartPole-v0_6e434_00001 | PENDING | | 0.952011 | 0.000508342 |
| DQNTrainer_CartPole-v0_6e434_00002 | PENDING | | 0.922938 | 0.00096638 |
+------------------------------------+----------+-----------------+----------+-------------+
2022-05-20 22:50:23,650 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00001
(DQNTrainer pid=19664) 2022-05-20 22:50:23,634 INFO trainable.py:152 -- Trainable.setup took 10.243 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,634 WARNING util.py:60 -- Install gputil for GPU system monitoring.
(RolloutWorker pid=16968) 2022-05-20 22:50:23,681 DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=9892) 2022-05-20 22:50:23,681 INFO rollout_worker.py:809 -- Generating sample batch of size 4
(RolloutWorker pid=9892) 2022-05-20 22:50:23,681 DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=6604) 2022-05-20 22:50:23,681 DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892 INFO sampler.py:672 -- Raw obs from env: { 0: { 'agent0': np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)}}
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892 INFO sampler.py:673 -- Info return from env: {0: {'agent0': None}}
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892 INFO sampler.py:908 -- Preprocessed obs: np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,893 INFO sampler.py:913 -- Filtered obs: np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,894 INFO sampler.py:1143 -- Inputs to compute_actions():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'default_policy': [ { 'data': { 'agent_id': 'agent0',
(RolloutWorker pid=9892) 'env_id': 0,
(RolloutWorker pid=9892) 'info': None,
(RolloutWorker pid=9892) 'obs': np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029),
(RolloutWorker pid=9892) 'prev_action': None,
(RolloutWorker pid=9892) 'prev_reward': None,
(RolloutWorker pid=9892) 'rnn_state': None},
(RolloutWorker pid=9892) 'type': 'PolicyEvalData'}]}
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,895 INFO tf_run_builder.py:98 -- Executing TF run without tracing. To dump TF timeline traces to disk, set the TF_TIMELINE_DIR environment variable.
(RolloutWorker pid=9892) 2022-05-20 22:50:24,982 INFO sampler.py:1169 -- Outputs of compute_actions():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'default_policy': ( np.ndarray((1,), dtype=int64, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892) [],
(RolloutWorker pid=9892) { 'action_dist_inputs': np.ndarray((1, 2), dtype=float32, min=-0.038, max=0.044, mean=0.003),
(RolloutWorker pid=9892) 'action_logp': np.ndarray((1,), dtype=float32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892) 'action_prob': np.ndarray((1,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892) 'q_values': np.ndarray((1, 2), dtype=float32, min=-0.038, max=0.044, mean=0.003)})}
(RolloutWorker pid=9892)
(DQNTrainer pid=19664) 2022-05-20 22:50:25,352 INFO replay_buffer.py:47 -- Estimated max memory usage for replay buffer is 0.00305 GB (50000.0 batches of size 1, 61 bytes each), available system memory is 16.929984512 GB
(RolloutWorker pid=9892) 2022-05-20 22:50:25,340 INFO simple_list_collector.py:904 -- Trajectory fragment after postprocess_trajectory():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'agent0': { 'actions': np.ndarray((4,), dtype=int64, min=0.0, max=1.0, mean=0.5),
(RolloutWorker pid=9892) 'agent_index': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892) 'dones': np.ndarray((4,), dtype=bool, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892) 'eps_id': np.ndarray((4,), dtype=int32, min=1734707724.0, max=1734707724.0, mean=1734707724.0),
(RolloutWorker pid=9892) 'infos': np.ndarray((4,), dtype=object, head={}),
(RolloutWorker pid=9892) 'new_obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.063),
(RolloutWorker pid=9892) 'obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.059),
(RolloutWorker pid=9892) 'rewards': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892) 'unroll_id': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892) 'weights': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0)}}
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) 2022-05-20 22:50:25,341 INFO rollout_worker.py:854 -- Completed sample batch:
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'actions': np.ndarray((4,), dtype=int64, min=0.0, max=1.0, mean=0.5),
(RolloutWorker pid=9892) 'agent_index': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892) 'dones': np.ndarray((4,), dtype=bool, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892) 'eps_id': np.ndarray((4,), dtype=int32, min=1734707724.0, max=1734707724.0, mean=1734707724.0),
(RolloutWorker pid=9892) 'new_obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.063),
(RolloutWorker pid=9892) 'obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.059),
(RolloutWorker pid=9892) 'rewards': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892) 'unroll_id': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892) 'weights': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0)}
(RolloutWorker pid=9892)
(DQNTrainer pid=13672) 2022-05-20 22:50:31,174 INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=13672) 2022-05-20 22:50:31,174 INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(pid=) [2022-05-20 22:50:34,744 E 16452 19288] (raylet.exe) agent_manager.cc:107: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
(DQNTrainer pid=19664) Stack (most recent call first):
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 362 in get_objects
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 1803 in get
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\client_mode_hook.py", line 105 in wrapper
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 492 in base_iterator
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 779 in __next__
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 1108 in build_union
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 779 in __next__
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 2174 in _exec_plan_or_training_iteration_fn
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 1155 in step_attempt
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 1074 in step
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\trainable.py", line 349 in train
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\function_manager.py", line 701 in actor_method_executor
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(DQNTrainer pid=19664) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:50:34,963 INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=12 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,277 INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=13 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,371 INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=14 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,434 INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=15 --runtime-env-hash=2135802228
(RolloutWorker pid=16900) 2022-05-20 22:50:39,099 WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=16900) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=13860) 2022-05-20 22:50:39,108 WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=13860) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=8672) 2022-05-20 22:50:39,072 WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=8672) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=16900) 2022-05-20 22:50:39,934 DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=16900) 2022-05-20 22:50:39,934 DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001B2406E91C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=16900) 2022-05-20 22:50:39,942 DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934 DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934 DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002451DAD81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934 DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934 DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934 DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001752E4491C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934 DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=16900) 2022-05-20 22:50:40,593 INFO tf_policy.py:166 -- TFPolicy (worker=1) running on CPU.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672 DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(RolloutWorker pid=16900)
(RolloutWorker pid=16900) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900) 'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900) 'agent_index': <tf.Tensor 'default_policy_wk1/agent_index:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'eps_id': <tf.Tensor 'default_policy_wk1/eps_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900) 'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900) 'prev_actions': <tf.Tensor 'default_policy_wk1/prev_actions:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900) 'prev_rewards': <tf.Tensor 'default_policy_wk1/prev_rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900) 'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 't': <tf.Tensor 'default_policy_wk1/t:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'unroll_id': <tf.Tensor 'default_policy_wk1/unroll_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=16900)
(RolloutWorker pid=13860) 2022-05-20 22:50:40,593 INFO tf_policy.py:166 -- TFPolicy (worker=3) running on CPU.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,609 INFO tf_policy.py:166 -- TFPolicy (worker=2) running on CPU.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,672 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687 INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687 INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=16900) 2022-05-20 22:50:41,144 DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(RolloutWorker pid=16900) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900) 'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900) 'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900) 'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900) 'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900) 'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900) 'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=16900)
(RolloutWorker pid=16900) 2022-05-20 22:50:41,393 DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000001B24812A400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=16900) Stack (most recent call first):
(RolloutWorker pid=16900) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=16900) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=13860) 2022-05-20 22:50:41,408 DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000024525549400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=13860) Stack (most recent call first):
(RolloutWorker pid=13860) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=13860) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=8672) 2022-05-20 22:50:41,408 DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000001753E589400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=8672) Stack (most recent call first):
(RolloutWorker pid=8672) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=8672) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:50:41,503 INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:41,534 INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:41,566 INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=2135802228
2022-05-20 22:51:03,817 WARNING worker.py:1382 -- The node with node id: 208e7e234a5d9af609995e90f0035f9db3b57f2130560403fe34704d and ip: 127.0.0.1 has been marked dead because the detector has missed too many heartbeats from it. This can happen when a raylet crashes unexpectedly or has lagging heartbeats.
== Status ==
Current time: 2022-05-20 22:51:03 (running for 00:00:56.88)
Memory usage on this node: 8.5/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 8.0/12 CPUs, 0/1 GPUs, 0.0/4.55 GiB heap, 0.0/2.28 GiB objects
Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06
Number of trials: 3/3 (1 PENDING, 2 RUNNING)
+------------------------------------+----------+-----------------+----------+-------------+
| Trial name | status | loc | gamma | lr |
|------------------------------------+----------+-----------------+----------+-------------|
| DQNTrainer_CartPole-v0_6e434_00000 | RUNNING | 127.0.0.1:19664 | 0.901065 | 0.000687763 |
| DQNTrainer_CartPole-v0_6e434_00001 | RUNNING | | 0.952011 | 0.000508342 |
| DQNTrainer_CartPole-v0_6e434_00002 | PENDING | | 0.922938 | 0.00096638 |
+------------------------------------+----------+-----------------+----------+-------------+
2022-05-20 22:51:03,824 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #2...
(DQNTrainer pid=13672) Stack (most recent call first):
(DQNTrainer pid=13672) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(DQNTrainer pid=13672) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(DQNTrainer pid=13672) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:51:03,915 INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=2135802228
2022-05-20 22:51:04,338 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #3...
2022-05-20 22:51:04,848 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #4...
2022-05-20 22:51:05,351 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #5...
2022-05-20 22:51:05,855 WARNING resource_updater.py:64 -- Cluster resources cannot be detected or are 0. You can resume this experiment by passing in `resume=True` to `run`.
2022-05-20 22:51:05,855 WARNING util.py:171 -- The `on_step_begin` operation took 2.033 s, which may be a performance bottleneck.
2022-05-20 22:51:05,855 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00002
Windows fatal exception: access violation
I could not reproduce this with the latest ray HEAD. I did need to remove the `"record_env": True` parameter, since it has been removed. Could you try again with a latest nightly?
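The config change described above can be sketched as follows. This is a minimal, hypothetical illustration of dropping the deprecated key; the surrounding keys are assumptions, not the reporter's exact experiment config.

```python
# Hypothetical sketch: strip the deprecated "record_env" option so an
# older script runs against current Ray, which no longer accepts it.
config = {
    "env": "CartPole-v0",
    "framework": "tf",
    "record_env": True,  # removed in newer Ray releases
}

# pop() with a default is safe whether or not the key is present.
config.pop("record_env", None)

print("record_env" in config)  # → False
```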
With the nightly version all 3 parallel tune runs start. The access violation does not occur, but another unspecific error does: the actor died unexpectedly.
Edit: (mattip) put the error log into a <details>
block to hide it
@Peter-P779 did you change anything in the script or install instructions? Which nightly did you use?
I didn't change anything in the script except deleting `"record_env": True`. I loaded the environment and executed the following commands:
pip uninstall -y ray
pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp39-cp39-win_amd64.whl
The version is the Windows Python 3.9 nightly:
D:\ML\test_RLlib>ray --version
ray, version 2.0.0.dev0
Hello all,
I can reproduce the crash on my Windows Desktop on both the current nightly and pypi release.
I stumbled over this issue while investigating an unexpected crash using only Ray Core, which exclusively occurs on my home desktop. On other systems (work notebook, high-performance cluster, Linux notebook) Ray works like a charm. It is exactly the same thing: after ~90s of runtime the system crashes with the identical message, sometimes with an access violation error at the end.
TL;DR: I could not reproduce. If someone can still reproduce this, please report what you did using the comment below as a template, starting from a vanilla Python installation.
And in too much detail:
Here is the script I used
Here is what I did
CPython39\python.exe -m venv d:\temp\issue24955
d:\temp\issue24955\Scripts\activate
>python -c "import sys; print(sys.version)"
3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]
>pip install "ray==2.1.0" "ray[rllib]==2.1.0" "ray[default]==2.1.0"
>pip install "ray[tune]==2.1.0" "gym==0.23.1" "tensorflow==2.10.0"
>pip install pygame gpuutils pywin32
REM copy script to d:\temp\issue24955.py
> python d:\temp\issue24955.py
I then get a number of diagnostic messages on startup with hints to improve the script.
The script runs, and I can see the resource usage on the dashboard. There are 8 RolloutWorker actors and 2 DQN actors. The processes seem to take up to 14.3 GB of RAM. The script runs for much more than 90 seconds: I stopped it after ~10 minutes by pressing CTRL-C, and it stopped cleanly:
2022-11-15 15:42:37,640 ERROR tune.py:773 -- Trials did not complete: [DQN_CartPole-v0_eaa22_00000, DQN_CartPole-v0_eaa22_00001, DQN_CartPole-v0_eaa22_00002]
2022-11-15 15:42:37,640 INFO tune.py:777 -- Total run time: 623.14 seconds (622.83 seconds for the tuning loop).
2022-11-15 15:42:37,640 WARNING tune.py:783 -- Experiment has been interrupted, but the most recent state was saved. You can continue running this experiment by passing `resume=True` to `tune.run()`
Hey, sorry for the somewhat unspecific response. It has been a while, but I remember, after cross-examining my working and non-working systems, that the issue only occurred with a specific Python 3.9 patch version. Switching to a previous patch release resolved my problems completely.
Perhaps your machine has 16 GB of RAM, which is enough on Linux but not sufficient on Windows to run this experiment.
Closing this, as we seem to lack a reproduction, and it may be related to Python versioning.
What happened + What you expected to happen
Expectation: training CartPole. What happens: Windows fatal exception: access violation.
Versions / Dependencies
ray 1.12.0
Python 3.9.12
gym 0.21.0
Reproduction script
Issue Severity
High: It blocks me from completing my task.