ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

ray[RLlib]: Windows fatal exception: access violation #24955

Closed Peter-P779 closed 1 year ago

Peter-P779 commented 2 years ago

What happened + What you expected to happen

Expectation: training CartPole runs normally.
What happens: Windows fatal exception: access violation.

D:\ML\test_RLlib\TF_Env\Scripts\python.exe D:/ML/test_RLlib/test/main.py
2022-05-19 10:49:33,916 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\tune.py:455: UserWarning: Consider boosting PBT performance by enabling `reuse_actors` as well as implementing `reset_config` for Trainable.
  warnings.warn(
2022-05-19 10:49:36,775 WARNING trial_runner.py:1489 -- You are trying to access _search_alg interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
2022-05-19 10:49:36,900 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_9caae_00000
(pid=1516) 
(DQNTrainer pid=7004) 2022-05-19 10:49:43,322   INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=7004) 2022-05-19 10:49:43,322   INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(pid=22604) 
(pid=10440) 
(pid=22456) 
(RolloutWorker pid=23604) Setting the path for recording to D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36\DQNTrainer_CartPole-v0_9caae_00000_0_2022-05-19_10-49-36\
(RolloutWorker pid=18268) Setting the path for recording to D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36\DQNTrainer_CartPole-v0_9caae_00000_0_2022-05-19_10-49-36\
(RolloutWorker pid=15504) Setting the path for recording to D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36\DQNTrainer_CartPole-v0_9caae_00000_0_2022-05-19_10-49-36\
(RolloutWorker pid=23604) 2022-05-19 10:49:49,852   WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=18268) 2022-05-19 10:49:49,864   WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=15504) 2022-05-19 10:49:49,846   WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=23604) 2022-05-19 10:49:49,938   DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=23604) 2022-05-19 10:49:49,938   DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000020B927EB100>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=23604) 2022-05-19 10:49:49,953   DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=18268) 2022-05-19 10:49:49,938   DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=18268) 2022-05-19 10:49:49,938   DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002AB85A7A100>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=18268) 2022-05-19 10:49:49,953   DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=15504) 2022-05-19 10:49:49,938   DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=15504) 2022-05-19 10:49:49,938   DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001A0229FA100>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=15504) 2022-05-19 10:49:49,953   DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=23604) 2022-05-19 10:49:50,623   INFO tf_policy.py:166 -- TFPolicy (worker=1) running on CPU.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,692   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,693   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,693   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,694   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,694   INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=23604) 2022-05-19 10:49:50,695   DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(RolloutWorker pid=23604) 
(RolloutWorker pid=23604) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=23604)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=23604)   'agent_index': <tf.Tensor 'default_policy_wk1/agent_index:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'eps_id': <tf.Tensor 'default_policy_wk1/eps_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=23604)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=23604)   'prev_actions': <tf.Tensor 'default_policy_wk1/prev_actions:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=23604)   'prev_rewards': <tf.Tensor 'default_policy_wk1/prev_rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=23604)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   't': <tf.Tensor 'default_policy_wk1/t:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'unroll_id': <tf.Tensor 'default_policy_wk1/unroll_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=23604) 
(RolloutWorker pid=18268) 2022-05-19 10:49:50,627   INFO tf_policy.py:166 -- TFPolicy (worker=3) running on CPU.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,697   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,697   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,698   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,698   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=18268) 2022-05-19 10:49:50,698   INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,617   INFO tf_policy.py:166 -- TFPolicy (worker=2) running on CPU.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,687   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,688   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,688   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,689   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=15504) 2022-05-19 10:49:50,689   INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=23604) 2022-05-19 10:49:51,114   DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(RolloutWorker pid=23604) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=23604)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=23604)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=23604)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=23604)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=23604)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=23604)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=23604) 
(DQNTrainer pid=7004) 2022-05-19 10:49:51,371   INFO worker_set.py:154 -- Inferred observation/action spaces from remote worker (local worker has no env): {'default_policy': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2)), '__env__': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2))}
(RolloutWorker pid=23604) 2022-05-19 10:49:51,360   DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000020B9A35B340> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=18268) 2022-05-19 10:49:51,368   DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000002AB95C6A340> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=15504) 2022-05-19 10:49:51,364   DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000001A032B8B340> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(DQNTrainer pid=7004) 2022-05-19 10:49:51,437   DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(DQNTrainer pid=7004) 2022-05-19 10:49:51,437   DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002747AA78130>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(DQNTrainer pid=7004) 2022-05-19 10:49:51,437   DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(DQNTrainer pid=7004) 2022-05-19 10:49:51,922   INFO tf_policy.py:166 -- TFPolicy (worker=local) running on CPU.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,978   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,979   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,979   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,980   INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,980   INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(DQNTrainer pid=7004) 2022-05-19 10:49:51,981   DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=7004)   'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=7004)   'agent_index': <tf.Tensor 'default_policy/agent_index:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'eps_id': <tf.Tensor 'default_policy/eps_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=7004)   'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=7004)   'prev_actions': <tf.Tensor 'default_policy/prev_actions:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=7004)   'prev_rewards': <tf.Tensor 'default_policy/prev_rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=7004)   'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   't': <tf.Tensor 'default_policy/t:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'unroll_id': <tf.Tensor 'default_policy/unroll_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) 2022-05-19 10:49:52,371   DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(DQNTrainer pid=7004) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=7004)   'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=7004)   'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=7004)   'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=7004)   'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=7004)   'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=7004)   'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) 2022-05-19 10:49:52,579   INFO rollout_worker.py:1727 -- Built policy map: {}
(DQNTrainer pid=7004) 2022-05-19 10:49:52,579   INFO rollout_worker.py:1728 -- Built preprocessor map: {'default_policy': <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002747AA78130>}
(DQNTrainer pid=7004) 2022-05-19 10:49:52,580   INFO rollout_worker.py:666 -- Built filter map: {'default_policy': <ray.rllib.utils.filter.NoFilter object at 0x000002747C501FA0>}
(DQNTrainer pid=7004) 2022-05-19 10:49:52,580   DEBUG rollout_worker.py:779 -- Created rollout worker with env None (None), policies {}
== Status ==
Current time: 2022-05-19 10:49:52 (running for 00:00:15.84)
Memory usage on this node: 14.6/15.8 GiB: ***LOW MEMORY*** less than 10% of the memory on this node is available for use. This can cause unexpected crashes. Consider reducing the memory used by your application or reducing the Ray object store size by setting `object_store_memory` when calling `ray.init`.
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 4.0/12 CPUs, 0/1 GPUs, 0.0/2.27 GiB heap, 0.0/1.14 GiB objects
Result logdir: D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36
Number of trials: 3/3 (2 PENDING, 1 RUNNING)
+------------------------------------+----------+----------------+----------+-------------+
| Trial name                         | status   | loc            |    gamma |          lr |
|------------------------------------+----------+----------------+----------+-------------|
| DQNTrainer_CartPole-v0_9caae_00000 | RUNNING  | 127.0.0.1:7004 | 0.934952 | 0.000708551 |
| DQNTrainer_CartPole-v0_9caae_00001 | PENDING  |                | 0.976634 | 0.000561509 |
| DQNTrainer_CartPole-v0_9caae_00002 | PENDING  |                | 0.940114 | 0.000492675 |
+------------------------------------+----------+----------------+----------+-------------+

2022-05-19 10:49:52,620 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_9caae_00001
(DQNTrainer pid=7004) 2022-05-19 10:49:52,605   WARNING util.py:60 -- Install gputil for GPU system monitoring.
(DQNTrainer pid=7004) 2022-05-19 10:49:52,659   WARNING trainer.py:1083 -- Worker crashed during call to `step_attempt()`. To try to continue training without the failed worker, set `ignore_worker_failures=True`.
(DQNTrainer pid=7004) 2022-05-19 10:49:52,664   ERROR worker.py:92 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RolloutWorker.par_iter_next() (pid=18268, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002AB859C89D0>)
(DQNTrainer pid=7004) ModuleNotFoundError: No module named 'pyglet'
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) During handling of the above exception, another exception occurred:
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) ray::RolloutWorker.par_iter_next() (pid=18268, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000002AB859C89D0>)
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 656, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 697, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 663, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 667, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 614, in ray._raylet.execute_task.function_executor
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\function_manager.py", line 701, in actor_method_executor
(DQNTrainer pid=7004)     return method(__ray_actor, *args, **kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(DQNTrainer pid=7004)     return method(self, *_args, **_kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 1186, in par_iter_next
(DQNTrainer pid=7004)     return next(self.local_it)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 404, in gen_rollouts
(DQNTrainer pid=7004)     yield self.sample()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(DQNTrainer pid=7004)     return method(self, *_args, **_kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 815, in sample
(DQNTrainer pid=7004)     batches = [self.input_reader.next()]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 116, in next
(DQNTrainer pid=7004)     batches = [self.get_data()]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 289, in get_data
(DQNTrainer pid=7004)     item = next(self._env_runner)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 668, in _env_runner
(DQNTrainer pid=7004)     unfiltered_obs, rewards, dones, infos, off_policy_actions = base_env.poll()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 291, in poll
(DQNTrainer pid=7004)     self.new_obs = self.vector_env.vector_reset()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 227, in vector_reset
(DQNTrainer pid=7004)     return [e.reset() for e in self.envs]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 227, in <listcomp>
(DQNTrainer pid=7004)     return [e.reset() for e in self.envs]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 56, in reset
(DQNTrainer pid=7004)     self._after_reset(observation)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 241, in _after_reset
(DQNTrainer pid=7004)     self.reset_video_recorder()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 267, in reset_video_recorder
(DQNTrainer pid=7004)     self.video_recorder.capture_frame()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 132, in capture_frame
(DQNTrainer pid=7004)     frame = self.env.render(mode=render_mode)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\core.py", line 295, in render
(DQNTrainer pid=7004)     return self.env.render(mode, **kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\envs\classic_control\cartpole.py", line 179, in render
(DQNTrainer pid=7004)     from gym.envs.classic_control import rendering
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\envs\classic_control\rendering.py", line 17, in <module>
(DQNTrainer pid=7004)     raise ImportError(
(DQNTrainer pid=7004) ImportError: 
(DQNTrainer pid=7004)     Cannot import pyglet.
(DQNTrainer pid=7004)     HINT: you can install pyglet directly via 'pip install pyglet'.
(DQNTrainer pid=7004)     But if you really just want to install all Gym dependencies and not have to think about it,
(DQNTrainer pid=7004)     'pip install -e .[all]' or 'pip install gym[all]' will do it.
(DQNTrainer pid=7004) 2022-05-19 10:49:52,664   ERROR worker.py:92 -- Unhandled error (suppress with 'RAY_IGNORE_UNHANDLED_ERRORS=1'): ray::RolloutWorker.par_iter_next() (pid=15504, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001A0229489D0>)
(DQNTrainer pid=7004) ModuleNotFoundError: No module named 'pyglet'
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) During handling of the above exception, another exception occurred:
(DQNTrainer pid=7004) 
(DQNTrainer pid=7004) ray::RolloutWorker.par_iter_next() (pid=15504, ip=127.0.0.1, repr=<ray.rllib.evaluation.rollout_worker.RolloutWorker object at 0x000001A0229489D0>)
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 656, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 697, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 663, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 667, in ray._raylet.execute_task
(DQNTrainer pid=7004)   File "python\ray\_raylet.pyx", line 614, in ray._raylet.execute_task.function_executor
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\function_manager.py", line 701, in actor_method_executor
(DQNTrainer pid=7004)     return method(__ray_actor, *args, **kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(DQNTrainer pid=7004)     return method(self, *_args, **_kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 1186, in par_iter_next
(DQNTrainer pid=7004)     return next(self.local_it)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 404, in gen_rollouts
(DQNTrainer pid=7004)     yield self.sample()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462, in _resume_span
(DQNTrainer pid=7004)     return method(self, *_args, **_kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 815, in sample
(DQNTrainer pid=7004)     batches = [self.input_reader.next()]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 116, in next
(DQNTrainer pid=7004)     batches = [self.get_data()]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 289, in get_data
(DQNTrainer pid=7004)     item = next(self._env_runner)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\evaluation\sampler.py", line 668, in _env_runner
(DQNTrainer pid=7004)     unfiltered_obs, rewards, dones, infos, off_policy_actions = base_env.poll()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 291, in poll
(DQNTrainer pid=7004)     self.new_obs = self.vector_env.vector_reset()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 227, in vector_reset
(DQNTrainer pid=7004)     return [e.reset() for e in self.envs]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\env\vector_env.py", line 227, in <listcomp>
(DQNTrainer pid=7004)     return [e.reset() for e in self.envs]
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 56, in reset
(DQNTrainer pid=7004)     self._after_reset(observation)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 241, in _after_reset
(DQNTrainer pid=7004)     self.reset_video_recorder()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitor.py", line 267, in reset_video_recorder
(DQNTrainer pid=7004)     self.video_recorder.capture_frame()
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\wrappers\monitoring\video_recorder.py", line 132, in capture_frame
(DQNTrainer pid=7004)     frame = self.env.render(mode=render_mode)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\core.py", line 295, in render
(RolloutWorker pid=23604) 2022-05-19 10:49:52,649   INFO rollout_worker.py:809 -- Generating sample batch of size 4
(RolloutWorker pid=23604) 2022-05-19 10:49:52,650   DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=18268) 2022-05-19 10:49:52,650   DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=15504) 2022-05-19 10:49:52,650   DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(DQNTrainer pid=7004)     return self.env.render(mode, **kwargs)
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\envs\classic_control\cartpole.py", line 179, in render
(DQNTrainer pid=7004)     from gym.envs.classic_control import rendering
(DQNTrainer pid=7004)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\gym\envs\classic_control\rendering.py", line 17, in <module>
(DQNTrainer pid=7004)     raise ImportError(
(DQNTrainer pid=7004) ImportError: 
(DQNTrainer pid=7004)     Cannot import pyglet.
(DQNTrainer pid=7004)     HINT: you can install pyglet directly via 'pip install pyglet'.
(DQNTrainer pid=7004)     But if you really just want to install all Gym dependencies and not have to think about it,
(DQNTrainer pid=7004)     'pip install -e .[all]' or 'pip install gym[all]' will do it.
(pid=14788) 
(DQNTrainer pid=11332) 2022-05-19 10:49:57,976  INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=11332) 2022-05-19 10:49:57,976  INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(pid=11748) 
(pid=13188) 
(pid=16516) 
(pid=) [2022-05-19 10:50:04,680 E 9068 23852] (raylet.exe) agent_manager.cc:107: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
(bundle_reservation_check_func pid=21264) 
(bundle_reservation_check_func pid=10124) 
(bundle_reservation_check_func pid=15300) 
(pid=2700) 
(RolloutWorker pid=23604) 
(RolloutWorker pid=18268) 
(RolloutWorker pid=15504) 
(DQNTrainer pid=7004) 
(RolloutWorker pid=13824) Stack (most recent call first):
(RolloutWorker pid=13824)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(RolloutWorker pid=13824)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=13824)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=600) Stack (most recent call first):
(RolloutWorker pid=600)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(RolloutWorker pid=600)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=600)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=12964) Stack (most recent call first):
(RolloutWorker pid=12964)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(RolloutWorker pid=12964)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=12964)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-19 10:50:06,397  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:06,397  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:06,412  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:07,241  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=14 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:07,272  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=15 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:07,303  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=13 --runtime-env-hash=213246870
(pid=) 2022-05-19 10:50:07,881  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=12 --runtime-env-hash=213246870
2022-05-19 10:50:33,288 WARNING worker.py:1382 -- The node with node id: d581586b7c7e0633fb90264635b2f193775bd304c5a049fce7f81e2a and ip: 127.0.0.1 has been marked dead because the detector has missed too many heartbeats from it. This can happen when a raylet crashes unexpectedly or has lagging heartbeats.
2022-05-19 10:50:33,303 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #2...
== Status ==
Current time: 2022-05-19 10:50:33 (running for 00:00:56.53)
Memory usage on this node: 12.3/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 8.0/12 CPUs, 0/1 GPUs, 0.0/2.27 GiB heap, 0.0/1.14 GiB objects
Result logdir: D:\ML\test_RLlib\test\results\DQNTrainer_2022-05-19_10-49-36
Number of trials: 3/3 (1 PENDING, 2 RUNNING)
+------------------------------------+----------+----------------+----------+-------------+
| Trial name                         | status   | loc            |    gamma |          lr |
|------------------------------------+----------+----------------+----------+-------------|
| DQNTrainer_CartPole-v0_9caae_00000 | RUNNING  | 127.0.0.1:7004 | 0.934952 | 0.000708551 |
| DQNTrainer_CartPole-v0_9caae_00001 | RUNNING  |                | 0.976634 | 0.000561509 |
| DQNTrainer_CartPole-v0_9caae_00002 | PENDING  |                | 0.940114 | 0.000492675 |
+------------------------------------+----------+----------------+----------+-------------+

(DQNTrainer pid=11332) Stack (most recent call first):
(DQNTrainer pid=11332)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(DQNTrainer pid=11332)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(DQNTrainer pid=11332)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-19 10:50:33,366  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=58524 --object-store-name=tcp://127.0.0.1:64691 --raylet-name=tcp://127.0.0.1:63689 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=64921 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:61713 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=213246870
2022-05-19 10:50:33,803 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #3...
2022-05-19 10:50:34,303 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #4...
2022-05-19 10:50:34,819 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #5...
2022-05-19 10:50:35,319 WARNING resource_updater.py:64 -- Cluster resources cannot be detected or are 0. You can resume this experiment by passing in `resume=True` to `run`.
2022-05-19 10:50:35,319 WARNING util.py:171 -- The `on_step_begin` operation took 2.016 s, which may be a performance bottleneck.
2022-05-19 10:50:35,319 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_9caae_00002
Windows fatal exception: access violation

Process finished with exit code -1073741819 (0xC0000005)

Versions / Dependencies

ray, version 1.12.0
Python 3.9.12
gym 0.21.0

pip install ray
pip install "ray[rllib]" tensorflow torch
pip install "ray[default]"
pip install "ray[tune]"
pip install gym
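
For reference, a minimal sketch (assuming the packages above import cleanly in the active virtual environment) to print the versions this report depends on:

# Minimal version check for the environment described above.
import ray
import gym
import tensorflow as tf

print("ray:", ray.__version__)  # 1.12.0 in this report
print("gym:", gym.__version__)  # 0.21.0 in this report
print("tensorflow:", tf.__version__)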

Reproduction script

import ray
from ray import tune
from ray.rllib.agents.dqn import DQNTrainer
from ray.tune.schedulers import PopulationBasedTraining
import gym
import random

config = {
    "env": "CartPole-v0",
    "num_workers": 3,
    "record_env": True,
    "num_gpus": 0,
    "framework": "tf",
}

if __name__ == "__main__":

    pbt = PopulationBasedTraining(
        time_attr="time_total_s",
        perturbation_interval=7200,
        resample_probability=0.25,
        hyperparam_mutations={
            "lr": lambda: random.uniform(1e-3, 5e-5),
            "gamma": lambda: random.uniform(0.90, 0.99),
        },
    )
    import tensorflow as tf

    ray.init()

    tune.run(DQNTrainer, scheduler=pbt,
             config=config,
             num_samples=3,
             metric="episode_reward_mean",
             mode="max",
             local_dir="./results",
             sync_config=tune.SyncConfig(syncer=None),
             checkpoint_freq=500,
             keep_checkpoints_num=20)

    ray.shutdown()
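
For context, record_env=True wraps each environment in gym's Monitor, which calls env.render() on reset to capture video frames (this is the path visible in the traceback above). Below is a minimal sketch to exercise that recording path standalone, outside Ray, assuming gym 0.21.0 with pyglet installed; the ./monitor_test directory name is arbitrary:

# Hedged sketch: exercise the gym Monitor/recording path that record_env=True uses,
# independent of Ray. Requires pyglet for classic-control rendering.
import gym
from gym.wrappers import Monitor

env = Monitor(gym.make("CartPole-v0"), "./monitor_test", force=True)
obs = env.reset()  # Monitor triggers env.render() here to record a frame
for _ in range(100):
    obs, reward, done, info = env.step(env.action_space.sample())
    if done:
        obs = env.reset()
env.close()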

Issue Severity

High: It blocks me from completing my task.

czgdp1807 commented 2 years ago

Let me look into this.

czgdp1807 commented 2 years ago

I ran this script and it keeps running without any issues. It opens a bunch of windows with some animations. See the attached screenshot.

My hardware information: 8 CPUs and 16 GB RAM on an Azure Windows VM.

[Screenshot 2022-05-20 at 1.12.56 PM]
Peter-P779 commented 2 years ago

So the error can't be reproduced on your machine. On my machine the windows with the carts also open, and then I get the error. Is there some log or anything else I can send for further analysis? The same program runs perfectly on WSL with Ubuntu, though.

Laptop: Dell G3 15

Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz (12 logical processors), 16.0 GB RAM, NVIDIA GeForce RTX 2060

OS: Windows 11 Home, version 21H2

czgdp1807 commented 2 years ago

I see. Note that I don't have any GPU on my Azure Windows VM. It's Windows 10 Pro, version 20H2.

Peter-P779 commented 2 years ago

So there might be a serious problem once the cluster gets updated?

gjoliver commented 2 years ago

A random observation: the error message seems to say:

(DQNTrainer pid=7004) ModuleNotFoundError: No module named 'pyglet'

Peter-P779 commented 2 years ago

Yeah, but that wasn't the reason for the error. I originally got the error on VizDoom; the CartPole setup is just a simpler reproduction for this report, which is why I didn't notice the missing package.

Here is the updated console output after installing the package.

D:\ML\test_RLlib>call TF_Env/Scripts/activate
2022-05-20 22:50:02,463 INFO services.py:1456 -- View the Ray dashboard at http://127.0.0.1:8265
D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\tune.py:455: UserWarning: Consider boosting PBT performance by enabling `reuse_actors` as well as implementing `reset_config` for Trainable.
  warnings.warn(
2022-05-20 22:50:06,949 WARNING trial_runner.py:1489 -- You are trying to access _search_alg interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
2022-05-20 22:50:07,066 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00000
(DQNTrainer pid=19664) 2022-05-20 22:50:13,392  INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=19664) 2022-05-20 22:50:13,393  INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(RolloutWorker pid=16968) 2022-05-20 22:50:19,353       WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=16968) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=9892) 2022-05-20 22:50:19,416        WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=9892) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=6604) 2022-05-20 22:50:19,420        WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=6604) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00000_0_2022-05-20_22-50-07\
(RolloutWorker pid=16968) 2022-05-20 22:50:21,101       DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=16968) 2022-05-20 22:50:21,103       DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000016C16DC81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=16968) 2022-05-20 22:50:21,103       DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=9892) 2022-05-20 22:50:21,101        DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=9892) 2022-05-20 22:50:21,103        DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000010B135F81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=9892) 2022-05-20 22:50:21,103        DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=6604) 2022-05-20 22:50:21,101        DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=6604) 2022-05-20 22:50:21,103        DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x0000019D8A1081C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=6604) 2022-05-20 22:50:21,103        DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=16968) 2022-05-20 22:50:21,696       INFO tf_policy.py:166 -- TFPolicy (worker=2) running on CPU.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,704        INFO tf_policy.py:166 -- TFPolicy (worker=1) running on CPU.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,704        INFO tf_policy.py:166 -- TFPolicy (worker=3) running on CPU.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,772       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,772       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=16968) 2022-05-20 22:50:21,773       INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,777        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,777        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,778        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,779        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,779        INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:21,780        DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892)   'agent_index': <tf.Tensor 'default_policy_wk1/agent_index:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'eps_id': <tf.Tensor 'default_policy_wk1/eps_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892)   'prev_actions': <tf.Tensor 'default_policy_wk1/prev_actions:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892)   'prev_rewards': <tf.Tensor 'default_policy_wk1/prev_rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   't': <tf.Tensor 'default_policy_wk1/t:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'unroll_id': <tf.Tensor 'default_policy_wk1/unroll_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=9892)
(RolloutWorker pid=6604) 2022-05-20 22:50:21,776        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,777        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,777        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,778        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=6604) 2022-05-20 22:50:21,778        INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=9892) 2022-05-20 22:50:22,197        DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(RolloutWorker pid=9892) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=9892)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=9892)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=9892)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=9892)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=9892)
(DQNTrainer pid=19664) 2022-05-20 22:50:22,455  INFO worker_set.py:154 -- Inferred observation/action spaces from remote worker (local worker has no env): {'default_policy': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2)), '__env__': (Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), Discrete(2))}
(RolloutWorker pid=16968) 2022-05-20 22:50:22,440       DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000016C1E6D9400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=9892) 2022-05-20 22:50:22,440        DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000010B1B1A9400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=6604) 2022-05-20 22:50:22,424        DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000019D91B5A400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518  DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518  DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001D96D6951F0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(DQNTrainer pid=19664) 2022-05-20 22:50:22,518  DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(DQNTrainer pid=19664) 2022-05-20 22:50:22,996  INFO tf_policy.py:166 -- TFPolicy (worker=local) running on CPU.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,034  INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,050  DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664)   'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664)   'agent_index': <tf.Tensor 'default_policy/agent_index:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'eps_id': <tf.Tensor 'default_policy/eps_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664)   'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664)   'prev_actions': <tf.Tensor 'default_policy/prev_actions:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664)   'prev_rewards': <tf.Tensor 'default_policy/prev_rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664)   'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   't': <tf.Tensor 'default_policy/t:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'unroll_id': <tf.Tensor 'default_policy/unroll_id:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) 2022-05-20 22:50:23,416  DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(DQNTrainer pid=19664) { 'action_dist_inputs': <tf.Tensor 'default_policy/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664)   'action_logp': <tf.Tensor 'default_policy/action_logp:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'action_prob': <tf.Tensor 'default_policy/action_prob:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'actions': <tf.Tensor 'default_policy/action:0' shape=(?,) dtype=int64>,
(DQNTrainer pid=19664)   'dones': <tf.Tensor 'default_policy/dones:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'new_obs': <tf.Tensor 'default_policy/new_obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664)   'obs': <tf.Tensor 'default_policy/obs:0' shape=(?, 4) dtype=float32>,
(DQNTrainer pid=19664)   'q_values': <tf.Tensor 'default_policy/q_values:0' shape=(?, 2) dtype=float32>,
(DQNTrainer pid=19664)   'rewards': <tf.Tensor 'default_policy/rewards:0' shape=(?,) dtype=float32>,
(DQNTrainer pid=19664)   'weights': <tf.Tensor 'default_policy/weights:0' shape=(?,) dtype=float32>}
(DQNTrainer pid=19664)
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619  INFO rollout_worker.py:1727 -- Built policy map: {}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619  INFO rollout_worker.py:1728 -- Built preprocessor map: {'default_policy': <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001D96D6951F0>}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619  INFO rollout_worker.py:666 -- Built filter map: {'default_policy': <ray.rllib.utils.filter.NoFilter object at 0x000001D974F91310>}
(DQNTrainer pid=19664) 2022-05-20 22:50:23,619  DEBUG rollout_worker.py:779 -- Created rollout worker with env None (None), policies {}
== Status ==
Current time: 2022-05-20 22:50:23 (running for 00:00:16.72)
Memory usage on this node: 10.6/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 4.0/12 CPUs, 0/1 GPUs, 0.0/4.55 GiB heap, 0.0/2.28 GiB objects
Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06
Number of trials: 3/3 (2 PENDING, 1 RUNNING)
+------------------------------------+----------+-----------------+----------+-------------+
| Trial name                         | status   | loc             |    gamma |          lr |
|------------------------------------+----------+-----------------+----------+-------------|
| DQNTrainer_CartPole-v0_6e434_00000 | RUNNING  | 127.0.0.1:19664 | 0.901065 | 0.000687763 |
| DQNTrainer_CartPole-v0_6e434_00001 | PENDING  |                 | 0.952011 | 0.000508342 |
| DQNTrainer_CartPole-v0_6e434_00002 | PENDING  |                 | 0.922938 | 0.00096638  |
+------------------------------------+----------+-----------------+----------+-------------+

2022-05-20 22:50:23,650 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00001
(DQNTrainer pid=19664) 2022-05-20 22:50:23,634  INFO trainable.py:152 -- Trainable.setup took 10.243 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
(DQNTrainer pid=19664) 2022-05-20 22:50:23,634  WARNING util.py:60 -- Install gputil for GPU system monitoring.
(RolloutWorker pid=16968) 2022-05-20 22:50:23,681       DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=9892) 2022-05-20 22:50:23,681        INFO rollout_worker.py:809 -- Generating sample batch of size 4
(RolloutWorker pid=9892) 2022-05-20 22:50:23,681        DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=6604) 2022-05-20 22:50:23,681        DEBUG sampler.py:609 -- No episode horizon specified, setting it to Env's limit (200).
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892        INFO sampler.py:672 -- Raw obs from env: { 0: { 'agent0': np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)}}
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892        INFO sampler.py:673 -- Info return from env: {0: {'agent0': None}}
(RolloutWorker pid=9892) 2022-05-20 22:50:24,892        INFO sampler.py:908 -- Preprocessed obs: np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,893        INFO sampler.py:913 -- Filtered obs: np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,894        INFO sampler.py:1143 -- Inputs to compute_actions():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'default_policy': [ { 'data': { 'agent_id': 'agent0',
(RolloutWorker pid=9892)                                   'env_id': 0,
(RolloutWorker pid=9892)                                   'info': None,
(RolloutWorker pid=9892)                                   'obs': np.ndarray((4,), dtype=float32, min=-0.039, max=-0.006, mean=-0.029),
(RolloutWorker pid=9892)                                   'prev_action': None,
(RolloutWorker pid=9892)                                   'prev_reward': None,
(RolloutWorker pid=9892)                                   'rnn_state': None},
(RolloutWorker pid=9892)                         'type': 'PolicyEvalData'}]}
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) 2022-05-20 22:50:24,895        INFO tf_run_builder.py:98 -- Executing TF run without tracing. To dump TF timeline traces to disk, set the TF_TIMELINE_DIR environment variable.
(RolloutWorker pid=9892) 2022-05-20 22:50:24,982        INFO sampler.py:1169 -- Outputs of compute_actions():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'default_policy': ( np.ndarray((1,), dtype=int64, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892)                       [],
(RolloutWorker pid=9892)                       { 'action_dist_inputs': np.ndarray((1, 2), dtype=float32, min=-0.038, max=0.044, mean=0.003),
(RolloutWorker pid=9892)                         'action_logp': np.ndarray((1,), dtype=float32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)                         'action_prob': np.ndarray((1,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892)                         'q_values': np.ndarray((1, 2), dtype=float32, min=-0.038, max=0.044, mean=0.003)})}
(RolloutWorker pid=9892)
(DQNTrainer pid=19664) 2022-05-20 22:50:25,352  INFO replay_buffer.py:47 -- Estimated max memory usage for replay buffer is 0.00305 GB (50000.0 batches of size 1, 61 bytes each), available system memory is 16.929984512 GB
(RolloutWorker pid=9892) 2022-05-20 22:50:25,340        INFO simple_list_collector.py:904 -- Trajectory fragment after postprocess_trajectory():
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'agent0': { 'actions': np.ndarray((4,), dtype=int64, min=0.0, max=1.0, mean=0.5),
(RolloutWorker pid=9892)               'agent_index': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)               'dones': np.ndarray((4,), dtype=bool, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)               'eps_id': np.ndarray((4,), dtype=int32, min=1734707724.0, max=1734707724.0, mean=1734707724.0),
(RolloutWorker pid=9892)               'infos': np.ndarray((4,), dtype=object, head={}),
(RolloutWorker pid=9892)               'new_obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.063),
(RolloutWorker pid=9892)               'obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.059),
(RolloutWorker pid=9892)               'rewards': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892)               'unroll_id': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)               'weights': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0)}}
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) 2022-05-20 22:50:25,341        INFO rollout_worker.py:854 -- Completed sample batch:
(RolloutWorker pid=9892)
(RolloutWorker pid=9892) { 'actions': np.ndarray((4,), dtype=int64, min=0.0, max=1.0, mean=0.5),
(RolloutWorker pid=9892)   'agent_index': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)   'dones': np.ndarray((4,), dtype=bool, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)   'eps_id': np.ndarray((4,), dtype=int32, min=1734707724.0, max=1734707724.0, mean=1734707724.0),
(RolloutWorker pid=9892)   'new_obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.063),
(RolloutWorker pid=9892)   'obs': np.ndarray((4, 4), dtype=float32, min=-0.615, max=0.353, mean=-0.059),
(RolloutWorker pid=9892)   'rewards': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0),
(RolloutWorker pid=9892)   'unroll_id': np.ndarray((4,), dtype=int32, min=0.0, max=0.0, mean=0.0),
(RolloutWorker pid=9892)   'weights': np.ndarray((4,), dtype=float32, min=1.0, max=1.0, mean=1.0)}
(RolloutWorker pid=9892)
(DQNTrainer pid=13672) 2022-05-20 22:50:31,174  INFO trainer.py:2295 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQNTrainer pid=13672) 2022-05-20 22:50:31,174  INFO simple_q.py:161 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(pid=) [2022-05-20 22:50:34,744 E 16452 19288] (raylet.exe) agent_manager.cc:107: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. See `dashboard_agent.log` for the root cause.
(DQNTrainer pid=19664) Stack (most recent call first):
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 362 in get_objects
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 1803 in get
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\client_mode_hook.py", line 105 in wrapper
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 492 in base_iterator
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 779 in __next__
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 1108 in build_union
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 869 in apply_filter
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 807 in apply_foreach
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\iter.py", line 779 in __next__
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 2174 in _exec_plan_or_training_iteration_fn
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 1155 in step_attempt
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\rllib\agents\trainer.py", line 1074 in step
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\trainable.py", line 349 in train
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\util\tracing\tracing_helper.py", line 462 in _resume_span
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\function_manager.py", line 701 in actor_method_executor
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(DQNTrainer pid=19664)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:50:34,963  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=12 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,277  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=13 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,371  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=14 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:35,434  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=15 --runtime-env-hash=2135802228
(RolloutWorker pid=16900) 2022-05-20 22:50:39,099       WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=16900) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=13860) 2022-05-20 22:50:39,108       WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=13860) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=8672) 2022-05-20 22:50:39,072        WARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).
(RolloutWorker pid=8672) Setting the path for recording to D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06\DQNTrainer_CartPole-v0_6e434_00001_1_2022-05-20_22-50-23\
(RolloutWorker pid=16900) 2022-05-20 22:50:39,934       DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=16900) 2022-05-20 22:50:39,934       DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001B2406E91C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=16900) 2022-05-20 22:50:39,942       DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934       DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934       DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000002451DAD81C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=13860) 2022-05-20 22:50:39,934       DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934        DEBUG rollout_worker.py:1704 -- Creating policy for default_policy
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934        DEBUG catalog.py:805 -- Created preprocessor <ray.rllib.models.preprocessors.NoPreprocessor object at 0x000001752E4491C0>: Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32) -> (4,)
(RolloutWorker pid=8672) 2022-05-20 22:50:39,934        DEBUG worker_set.py:457 -- Creating TF session {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}
(RolloutWorker pid=16900) 2022-05-20 22:50:40,593       INFO tf_policy.py:166 -- TFPolicy (worker=1) running on CPU.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=16900) 2022-05-20 22:50:40,672       DEBUG dynamic_tf_policy.py:752 -- Initializing loss function with dummy input:
(RolloutWorker pid=16900)
(RolloutWorker pid=16900) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900)   'agent_index': <tf.Tensor 'default_policy_wk1/agent_index:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'eps_id': <tf.Tensor 'default_policy_wk1/eps_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900)   'prev_actions': <tf.Tensor 'default_policy_wk1/prev_actions:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900)   'prev_rewards': <tf.Tensor 'default_policy_wk1/prev_rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   't': <tf.Tensor 'default_policy_wk1/t:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'unroll_id': <tf.Tensor 'default_policy_wk1/unroll_id:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=16900)
(RolloutWorker pid=13860) 2022-05-20 22:50:40,593       INFO tf_policy.py:166 -- TFPolicy (worker=3) running on CPU.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=13860) 2022-05-20 22:50:40,672       INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,609        INFO tf_policy.py:166 -- TFPolicy (worker=2) running on CPU.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,672        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `q_values` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_dist_inputs` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_logp` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687        INFO dynamic_tf_policy.py:709 -- Adding extra-action-fetch `action_prob` to view-reqs.
(RolloutWorker pid=8672) 2022-05-20 22:50:40,687        INFO dynamic_tf_policy.py:718 -- Testing `postprocess_trajectory` w/ dummy batch.
(RolloutWorker pid=16900) 2022-05-20 22:50:41,144       DEBUG tf_policy.py:742 -- These tensors were used in the loss functions:
(RolloutWorker pid=16900) { 'action_dist_inputs': <tf.Tensor 'default_policy_wk1/action_dist_inputs:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900)   'action_logp': <tf.Tensor 'default_policy_wk1/action_logp:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'action_prob': <tf.Tensor 'default_policy_wk1/action_prob:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'actions': <tf.Tensor 'default_policy_wk1/action:0' shape=(?,) dtype=int64>,
(RolloutWorker pid=16900)   'dones': <tf.Tensor 'default_policy_wk1/dones:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'new_obs': <tf.Tensor 'default_policy_wk1/new_obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900)   'obs': <tf.Tensor 'default_policy_wk1/obs:0' shape=(?, 4) dtype=float32>,
(RolloutWorker pid=16900)   'q_values': <tf.Tensor 'default_policy_wk1/q_values:0' shape=(?, 2) dtype=float32>,
(RolloutWorker pid=16900)   'rewards': <tf.Tensor 'default_policy_wk1/rewards:0' shape=(?,) dtype=float32>,
(RolloutWorker pid=16900)   'weights': <tf.Tensor 'default_policy_wk1/weights:0' shape=(?,) dtype=float32>}
(RolloutWorker pid=16900)
(RolloutWorker pid=16900) 2022-05-20 22:50:41,393       DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000001B24812A400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=16900) Stack (most recent call first):
(RolloutWorker pid=16900)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=16900)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=13860) 2022-05-20 22:50:41,408       DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x0000024525549400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=13860) Stack (most recent call first):
(RolloutWorker pid=13860)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=13860)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(RolloutWorker pid=8672) 2022-05-20 22:50:41,408        DEBUG rollout_worker.py:779 -- Created rollout worker with env <ray.rllib.env.vector_env.VectorEnvWrapper object at 0x000001753E589400> (<Monitor<TimeLimit<CartPoleEnv<CartPole-v0>>>>), policies {}
(RolloutWorker pid=8672) Stack (most recent call first):
(RolloutWorker pid=8672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(RolloutWorker pid=8672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:50:41,503  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=17 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:41,534  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=19 --runtime-env-hash=2135802228
(pid=) 2022-05-20 22:50:41,566  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=18 --runtime-env-hash=2135802228
2022-05-20 22:51:03,817 WARNING worker.py:1382 -- The node with node id: 208e7e234a5d9af609995e90f0035f9db3b57f2130560403fe34704d and ip: 127.0.0.1 has been marked dead because the detector has missed too many heartbeats from it. This can happen when a raylet crashes unexpectedly or has lagging heartbeats.
== Status ==
Current time: 2022-05-20 22:51:03 (running for 00:00:56.88)
Memory usage on this node: 8.5/15.8 GiB
PopulationBasedTraining: 0 checkpoints, 0 perturbs
Resources requested: 8.0/12 CPUs, 0/1 GPUs, 0.0/4.55 GiB heap, 0.0/2.28 GiB objects
Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-05-20_22-50-06
Number of trials: 3/3 (1 PENDING, 2 RUNNING)
+------------------------------------+----------+-----------------+----------+-------------+
| Trial name                         | status   | loc             |    gamma |          lr |
|------------------------------------+----------+-----------------+----------+-------------|
| DQNTrainer_CartPole-v0_6e434_00000 | RUNNING  | 127.0.0.1:19664 | 0.901065 | 0.000687763 |
| DQNTrainer_CartPole-v0_6e434_00001 | RUNNING  |                 | 0.952011 | 0.000508342 |
| DQNTrainer_CartPole-v0_6e434_00002 | PENDING  |                 | 0.922938 | 0.00096638  |
+------------------------------------+----------+-----------------+----------+-------------+

2022-05-20 22:51:03,824 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #2...
(DQNTrainer pid=13672) Stack (most recent call first):
(DQNTrainer pid=13672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver
(DQNTrainer pid=13672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 449 in main_loop
(DQNTrainer pid=13672)   File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 235 in <module>
(pid=) 2022-05-20 22:51:03,915  INFO context.py:67 -- Exec'ing worker with command: "D:\ML\test_RLlib\TF_Env\Scripts\python.exe" D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers/default_worker.py --node-ip-address=127.0.0.1 --node-manager-port=49340 --object-store-name=tcp://127.0.0.1:51555 --raylet-name=tcp://127.0.0.1:56782 --redis-address=None --storage=None --temp-dir=C:\Users\peter\AppData\Local\Temp\ray --metrics-agent-port=63069 --logging-rotate-bytes=536870912 --logging-rotate-backup-count=5 --gcs-address=127.0.0.1:58986 --redis-password=5241590000000000 --startup-token=16 --runtime-env-hash=2135802228
2022-05-20 22:51:04,338 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #3...
2022-05-20 22:51:04,848 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #4...
2022-05-20 22:51:05,351 WARNING resource_updater.py:51 -- Cluster resources not detected or are 0. Attempt #5...
2022-05-20 22:51:05,855 WARNING resource_updater.py:64 -- Cluster resources cannot be detected or are 0. You can resume this experiment by passing in `resume=True` to `run`.
2022-05-20 22:51:05,855 WARNING util.py:171 -- The `on_step_begin` operation took 2.033 s, which may be a performance bottleneck.
2022-05-20 22:51:05,855 INFO trial_runner.py:803 -- starting DQNTrainer_CartPole-v0_6e434_00002
Windows fatal exception: access violation
mattip commented 2 years ago

I could not reproduce this with the latest Ray HEAD. I did need to remove the `"record_env": True` parameter, since it has been removed. Could you try again with the latest nightly?
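For anyone following along, here is a minimal sketch of what a re-run against a nightly build could look like with the removed key dropped. This is illustrative only, not the reporter's actual `main.py`: the search space, `num_samples`, and `local_dir` below are assumptions loosely based on the logs above.

```python
# Minimal sketch (assumptions, not the original repro script): CartPole DQN via
# Tune on a nightly Ray build, with the removed `record_env` key omitted.
# First install a nightly, e.g.:  pip install -U "ray[rllib]"  (or a nightly wheel)
import ray
from ray import tune

ray.init()

tune.run(
    "DQN",
    config={
        "env": "CartPole-v0",
        "framework": "tf",
        "num_workers": 3,
        "gamma": tune.uniform(0.9, 0.99),   # assumed search space
        "lr": tune.uniform(1e-4, 1e-3),     # assumed search space
        # "record_env": True,  # removed upstream -- omit on nightly builds
    },
    num_samples=3,
    local_dir="D:/ML/test_RLlib/results",  # path seen in the logs; adjust as needed
)
```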

Peter-P779 commented 2 years ago

With the nightly version, all 3 parallel Tune runs start. The access violation does not occur, but another, unspecific error does: an actor died unexpectedly.

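The `step_attempt()` warning in the console output below points at RLlib's worker fault-tolerance flags as a possible workaround. A hedged sketch of how they could be added to this kind of config follows; the key names come from that warning, but whether they actually mask this particular crash is not verified here.

```python
# Hedged sketch: opting into RLlib worker fault tolerance, per the warning in
# the console output below. The effect on this specific crash is untested here.
config = {
    "env": "CartPole-v0",
    "framework": "tf",
    "num_workers": 3,
    "ignore_worker_failures": True,    # keep the trial alive without failed workers
    "recreate_failed_workers": True,   # try to restart failed rollout workers
}
```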
console output ``` (DQNTrainer pid=12200) 2022-06-10 14:14:38,704 WARNING trainer.py:546 -- Worker crashed during call to `step_attempt()`. To try to continue training without failed worker(s), set `ignore_worker_failures=True`. To try to recover the failed worker(s), set `recreate_failed_workers=True`. (DQNTrainer pid=11888) Stack (most recent call first): (DQNTrainer pid=11888) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver (DQNTrainer pid=11888) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 451 in main_loop (DQNTrainer pid=11888) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 238 in (RolloutWorker pid=19976) Stack (most recent call first): (RolloutWorker pid=19976) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 451 in main_loop (RolloutWorker pid=19976) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 238 in (DQNTrainer pid=12200) Stack (most recent call first): (DQNTrainer pid=12200) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver (DQNTrainer pid=12200) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 451 in main_loop (DQNTrainer pid=12200) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 238 in 2022-06-10 14:14:40,376 ERROR trial_runner.py:886 -- Trial DQNTrainer_CartPole-v0_abcc0_00000: Error processing event. NoneType: None Result for DQNTrainer_CartPole-v0_abcc0_00000: agent_timesteps_total: 20160 counters: last_target_update_ts: 20160 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 num_env_steps_sampled: 20160 num_env_steps_trained: 51104 num_target_updates: 39 custom_metrics: {} date: 2022-06-10_14-14-36 done: false episode_len_mean: 120.44 episode_media: {} episode_reward_max: 198.0 episode_reward_mean: 120.44 episode_reward_min: 17.0 episodes_this_iter: 6 episodes_total: 545 experiment_id: 6cc0969c65b54734be7464e33b9e7b11 experiment_tag: '0' hostname: DESKTOP-IH6PS6N info: last_target_update_ts: 20160 learner: default_policy: custom_metrics: {} learner_stats: cur_lr: 0.00015024440654087812 max_q: 20.960540771484375 mean_q: 17.269336700439453 mean_td_error: 1.2344892024993896 min_q: -1.7945809364318848 model: {} num_agent_steps_trained: 32.0 td_error: - 0.1963062286376953 - 13.694271087646484 - -0.1513042449951172 - -0.4782733917236328 - 17.277982711791992 - -0.45970726013183594 - -1.119187355041504 - 0.2422313690185547 - -0.3445110321044922 - -0.04111480712890625 - -0.02532196044921875 - -1.5336990356445312 - -0.1737194061279297 - -0.32820892333984375 - 0.20307254791259766 - -2.7945809364318848 - 0.06833648681640625 - 0.0260009765625 - 0.1797008514404297 - -0.18329429626464844 - -0.07255172729492188 - -1.2686529159545898 - 0.1986217498779297 - 0.15063190460205078 - 16.994535446166992 - -0.0013408660888671875 - 0.14139938354492188 - -0.039340972900390625 - -0.1665172576904297 - 0.0407257080078125 - -0.04569053649902344 - -0.6831436157226562 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 num_env_steps_sampled: 20160 num_env_steps_trained: 51104 num_target_updates: 39 iterations_since_restore: 20 node_ip: 127.0.0.1 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 num_env_steps_sampled: 20160 num_env_steps_sampled_this_iter: 1008 num_env_steps_trained: 51104 num_env_steps_trained_this_iter: 2688 num_healthy_workers: 3 off_policy_estimator: {} perf: 
cpu_util_percent: 70.95 ram_util_percent: 94.5 pid: 11888 policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.10525665388206647 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.07798196456066409 mean_inference_ms: 1.1943969157090608 mean_raw_obs_processing_ms: 0.22432358224892415 sampler_results: custom_metrics: {} episode_len_mean: 120.44 episode_media: {} episode_reward_max: 198.0 episode_reward_mean: 120.44 episode_reward_min: 17.0 episodes_this_iter: 6 hist_stats: episode_lengths: - 103 - 31 - 51 - 41 - 58 - 94 - 72 - 53 - 50 - 34 - 56 - 102 - 48 - 17 - 38 - 58 - 44 - 84 - 140 - 87 - 94 - 174 - 112 - 110 - 178 - 135 - 129 - 137 - 101 - 158 - 127 - 109 - 138 - 153 - 115 - 170 - 124 - 127 - 124 - 136 - 153 - 168 - 97 - 142 - 133 - 165 - 148 - 136 - 120 - 117 - 96 - 93 - 129 - 113 - 124 - 123 - 86 - 129 - 105 - 115 - 138 - 106 - 127 - 113 - 144 - 128 - 168 - 107 - 118 - 143 - 153 - 112 - 159 - 148 - 187 - 173 - 175 - 166 - 198 - 148 - 154 - 131 - 135 - 132 - 114 - 125 - 166 - 99 - 121 - 110 - 100 - 119 - 132 - 145 - 137 - 163 - 134 - 164 - 172 - 176 episode_reward: - 103.0 - 31.0 - 51.0 - 41.0 - 58.0 - 94.0 - 72.0 - 53.0 - 50.0 - 34.0 - 56.0 - 102.0 - 48.0 - 17.0 - 38.0 - 58.0 - 44.0 - 84.0 - 140.0 - 87.0 - 94.0 - 174.0 - 112.0 - 110.0 - 178.0 - 135.0 - 129.0 - 137.0 - 101.0 - 158.0 - 127.0 - 109.0 - 138.0 - 153.0 - 115.0 - 170.0 - 124.0 - 127.0 - 124.0 - 136.0 - 153.0 - 168.0 - 97.0 - 142.0 - 133.0 - 165.0 - 148.0 - 136.0 - 120.0 - 117.0 - 96.0 - 93.0 - 129.0 - 113.0 - 124.0 - 123.0 - 86.0 - 129.0 - 105.0 - 115.0 - 138.0 - 106.0 - 127.0 - 113.0 - 144.0 - 128.0 - 168.0 - 107.0 - 118.0 - 143.0 - 153.0 - 112.0 - 159.0 - 148.0 - 187.0 - 173.0 - 175.0 - 166.0 - 198.0 - 148.0 - 154.0 - 131.0 - 135.0 - 132.0 - 114.0 - 125.0 - 166.0 - 99.0 - 121.0 - 110.0 - 100.0 - 119.0 - 132.0 - 145.0 - 137.0 - 163.0 - 134.0 - 164.0 - 172.0 - 176.0 off_policy_estimator: {} policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.10525665388206647 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.07798196456066409 mean_inference_ms: 1.1943969157090608 mean_raw_obs_processing_ms: 0.22432358224892415 time_since_restore: 51.58517932891846 time_this_iter_s: 2.7187509536743164 time_total_s: 51.58517932891846 timers: learn_throughput: 20485.939 learn_time_ms: 1.562 load_throughput: 20482.188 load_time_ms: 1.562 synch_weights_time_ms: 0.0 training_iteration_time_ms: 31.25 timestamp: 1654863276 timesteps_since_restore: 0 timesteps_total: 20160 training_iteration: 20 trial_id: abcc0_00000 warmup_time: 9.709909439086914 == Status == Current time: 2022-06-10 14:14:40 (running for 00:01:39.80) Memory usage on this node: 10.6/15.8 GiB PopulationBasedTraining: 0 checkpoints, 0 perturbs Resources requested: 8.0/12 CPUs, 0/1 GPUs, 0.0/4.04 GiB heap, 0.0/2.02 GiB objects Current best trial: abcc0_00002 with episode_reward_mean=145.05 and parameters={'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 
'inter_op_parallelism_threads': 8}, 'env': 'CartPole-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'record_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 3, 'num_envs_per_worker': 1, 'sample_collector': , 'sample_async': False, 'rollout_fragment_length': 4, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'gamma': 0.9579890222308893, 'lr': 0.0003559115980235469, 'train_batch_size': 32, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 10000}, 'input_config': {}, 'actions_in_input_normalized': False, 'input_evaluation': [, ], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_parallel_to_training': False, 'evaluation_config': {'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'CartPole-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'record_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 3, 'num_envs_per_worker': 1, 'sample_collector': , 'sample_async': False, 'rollout_fragment_length': 4, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 
'ignore_worker_failures': False, 'recreate_failed_workers': False, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'gamma': 0.9579890222308893, 'lr': 0.0003559115980235469, 'train_batch_size': 32, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': False, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 10000}, 'input_config': {}, 'actions_in_input_normalized': False, 'input_evaluation': [, ], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_parallel_to_training': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 180, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_reporting': 1, 'min_train_timesteps_per_reporting': 0, 'min_sample_timesteps_per_reporting': 1000, 'logger_creator': None, 'logger_config': None, 'log_level': 'DEBUG', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'target_network_update_freq': 500, 'replay_buffer_config': {'type': , 'prioritized_replay': -1, 'capacity': 50000, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'replay_sequence_length': 1, 'worker_side_prioritization': False, 'replay_mode': 'independent', 'replay_batch_size': 32}, 'store_buffer_in_checkpoints': False, 'lr_schedule': None, 'adam_epsilon': 1e-08, 'grad_clip': 40, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 
'dueling': True, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'before_learn_on_batch': None, 'training_intensity': None, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': PolicySpec(policy_class=, observation_space=Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), action_space=Discrete(2), config={})}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': , 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 180, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_reporting': 1, 'min_train_timesteps_per_reporting': 0, 'min_sample_timesteps_per_reporting': 1000, 'logger_creator': None, 'logger_config': None, 'log_level': 'DEBUG', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'target_network_update_freq': 500, 'replay_buffer_config': {'type': , 'prioritized_replay': -1, 'capacity': 50000, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'replay_sequence_length': 1, 'worker_side_prioritization': False, 'replay_mode': 'independent', 'replay_batch_size': 32}, 'store_buffer_in_checkpoints': False, 'lr_schedule': None, 'adam_epsilon': 1e-08, 'grad_clip': 40, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': True, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'before_learn_on_batch': None, 'training_intensity': None, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': PolicySpec(policy_class=, observation_space=Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), action_space=Discrete(2), config={})}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': , 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1} Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00 Number of trials: 3/3 (1 ERROR, 2 RUNNING) +------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | gamma | lr | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | 
|------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | DQNTrainer_CartPole-v0_abcc0_00001 | RUNNING | 127.0.0.1:22368 | 0.921279 | 0.000754346 | 20 | 51.5196 | 20160 | 140.57 | 200 | 24 | 140.57 | | DQNTrainer_CartPole-v0_abcc0_00002 | RUNNING | 127.0.0.1:12200 | 0.957989 | 0.000355912 | 20 | 51.8039 | 20160 | 145.05 | 200 | 9 | 145.05 | | DQNTrainer_CartPole-v0_abcc0_00000 | ERROR | 127.0.0.1:11888 | 0.948385 | 0.000150244 | 20 | 51.5852 | 20160 | 120.44 | 198 | 17 | 120.44 | +------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ Number of errored trials: 1 +------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------+ | Trial name | # failures | error file | |------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------| | DQNTrainer_CartPole-v0_abcc0_00000 | 1 | D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00\DQNTrainer_CartPole-v0_abcc0_00000_0_2022-06-10_14-13-00\error.txt | +------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------+ 2022-06-10 14:14:40,391 ERROR trial_runner.py:886 -- Trial DQNTrainer_CartPole-v0_abcc0_00002: Error processing event. NoneType: None Result for DQNTrainer_CartPole-v0_abcc0_00002: agent_timesteps_total: 20160 counters: last_target_update_ts: 20160 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 num_env_steps_sampled: 20160 num_env_steps_trained: 51104 num_target_updates: 39 custom_metrics: {} date: 2022-06-10_14-14-37 done: false episode_len_mean: 145.05 episode_media: {} episode_reward_max: 200.0 episode_reward_mean: 145.05 episode_reward_min: 9.0 episodes_this_iter: 5 episodes_total: 314 experiment_id: e41e93a18aeb40159aac256448355baf experiment_tag: '2' hostname: DESKTOP-IH6PS6N info: last_target_update_ts: 20160 learner: default_policy: custom_metrics: {} learner_stats: cur_lr: 0.0003559115866664797 max_q: 23.171512603759766 mean_q: 21.19548797607422 mean_td_error: 0.6148348450660706 min_q: 13.149581909179688 model: {} num_agent_steps_trained: 32.0 td_error: - -0.05719184875488281 - -1.0752925872802734 - 0.172882080078125 - 0.6037330627441406 - 0.6031627655029297 - 0.41985321044921875 - -0.424652099609375 - -0.17944717407226562 - 0.47750282287597656 - 14.227084159851074 - 0.08045196533203125 - 0.4776134490966797 - 0.36383819580078125 - 0.32493019104003906 - 0.5207614898681641 - -0.18519973754882812 - 0.48108482360839844 - 0.5255012512207031 - 0.18582534790039062 - -0.45144081115722656 - 0.4896736145019531 - -0.18197059631347656 - -0.5692501068115234 - 1.8208446502685547 - -1.270355224609375 - -0.2646903991699219 - 0.10660362243652344 - 0.5325069427490234 - 1.428915023803711 - 0.564300537109375 - -0.07600593566894531 - 0.003143310546875 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 num_env_steps_sampled: 20160 num_env_steps_trained: 51104 num_target_updates: 39 iterations_since_restore: 20 node_ip: 127.0.0.1 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 
num_env_steps_sampled: 20160 num_env_steps_sampled_this_iter: 1008 num_env_steps_trained: 51104 num_env_steps_trained_this_iter: 2688 num_healthy_workers: 3 off_policy_estimator: {} perf: cpu_util_percent: 73.325 ram_util_percent: 94.55000000000001 pid: 12200 policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.10668920705483345 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.09492990508066096 mean_inference_ms: 1.2676988860450573 mean_raw_obs_processing_ms: 0.22440369556604942 sampler_results: custom_metrics: {} episode_len_mean: 145.05 episode_media: {} episode_reward_max: 200.0 episode_reward_mean: 145.05 episode_reward_min: 9.0 episodes_this_iter: 5 hist_stats: episode_lengths: - 32 - 9 - 15 - 200 - 55 - 72 - 120 - 42 - 22 - 16 - 32 - 64 - 53 - 106 - 24 - 102 - 183 - 160 - 27 - 200 - 145 - 200 - 200 - 200 - 81 - 200 - 200 - 200 - 200 - 200 - 200 - 200 - 199 - 147 - 179 - 200 - 200 - 146 - 200 - 150 - 200 - 200 - 200 - 191 - 200 - 200 - 200 - 171 - 200 - 200 - 200 - 162 - 177 - 105 - 71 - 47 - 101 - 90 - 49 - 87 - 100 - 54 - 85 - 108 - 153 - 132 - 179 - 176 - 138 - 144 - 148 - 152 - 155 - 145 - 140 - 152 - 183 - 182 - 118 - 139 - 136 - 138 - 135 - 139 - 139 - 200 - 184 - 183 - 123 - 136 - 200 - 200 - 200 - 200 - 177 - 200 - 200 - 200 - 200 - 200 episode_reward: - 32.0 - 9.0 - 15.0 - 200.0 - 55.0 - 72.0 - 120.0 - 42.0 - 22.0 - 16.0 - 32.0 - 64.0 - 53.0 - 106.0 - 24.0 - 102.0 - 183.0 - 160.0 - 27.0 - 200.0 - 145.0 - 200.0 - 200.0 - 200.0 - 81.0 - 200.0 - 200.0 - 200.0 - 200.0 - 200.0 - 200.0 - 200.0 - 199.0 - 147.0 - 179.0 - 200.0 - 200.0 - 146.0 - 200.0 - 150.0 - 200.0 - 200.0 - 200.0 - 191.0 - 200.0 - 200.0 - 200.0 - 171.0 - 200.0 - 200.0 - 200.0 - 162.0 - 177.0 - 105.0 - 71.0 - 47.0 - 101.0 - 90.0 - 49.0 - 87.0 - 100.0 - 54.0 - 85.0 - 108.0 - 153.0 - 132.0 - 179.0 - 176.0 - 138.0 - 144.0 - 148.0 - 152.0 - 155.0 - 145.0 - 140.0 - 152.0 - 183.0 - 182.0 - 118.0 - 139.0 - 136.0 - 138.0 - 135.0 - 139.0 - 139.0 - 200.0 - 184.0 - 183.0 - 123.0 - 136.0 - 200.0 - 200.0 - 200.0 - 200.0 - 177.0 - 200.0 - 200.0 - 200.0 - 200.0 - 200.0 off_policy_estimator: {} policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.10668920705483345 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.09492990508066096 mean_inference_ms: 1.2676988860450573 mean_raw_obs_processing_ms: 0.22440369556604942 time_since_restore: 51.803911447525024 time_this_iter_s: 2.546875476837158 time_total_s: 51.803911447525024 timers: learn_throughput: 0.0 learn_time_ms: 0.0 load_throughput: 0.0 load_time_ms: 0.0 synch_weights_time_ms: 6.25 training_iteration_time_ms: 31.25 timestamp: 1654863277 timesteps_since_restore: 0 timesteps_total: 20160 training_iteration: 20 trial_id: abcc0_00002 warmup_time: 8.938966274261475 (DQNTrainer pid=22368) 2022-06-10 14:14:40,797 WARNING trainer.py:546 -- Worker crashed during call to `step_attempt()`. To try to continue training without failed worker(s), set `ignore_worker_failures=True`. To try to recover the failed worker(s), set `recreate_failed_workers=True`. 
(DQNTrainer pid=22368) Stack (most recent call first): (DQNTrainer pid=22368) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\utils.py", line 116 in push_error_to_driver (DQNTrainer pid=22368) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 451 in main_loop (DQNTrainer pid=22368) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\workers\default_worker.py", line 238 in (pid=) [2022-06-10 14:14:40,799 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details: (pid=) [2022-06-10 14:14:40,799 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details: (pid=) [2022-06-10 14:14:40,923 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details: (pid=) [2022-06-10 14:14:41,933 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details: 2022-06-10 14:14:42,386 ERROR trial_runner.py:886 -- Trial DQNTrainer_CartPole-v0_abcc0_00001: Error processing event. NoneType: None Result for DQNTrainer_CartPole-v0_abcc0_00001: agent_timesteps_total: 20160 counters: last_target_update_ts: 20160 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 num_env_steps_sampled: 20160 num_env_steps_trained: 51104 num_target_updates: 39 custom_metrics: {} date: 2022-06-10_14-14-36 done: false episode_len_mean: 140.57 episode_media: {} episode_reward_max: 200.0 episode_reward_mean: 140.57 episode_reward_min: 24.0 episodes_this_iter: 9 episodes_total: 331 experiment_id: 654971ccf0804e15806b20a8999dfda9 experiment_tag: '1' hostname: DESKTOP-IH6PS6N info: last_target_update_ts: 20160 learner: default_policy: custom_metrics: {} learner_stats: cur_lr: 0.0007543457322753966 max_q: 13.48817253112793 mean_q: 11.894736289978027 mean_td_error: 0.01804572343826294 min_q: 5.995281219482422 model: {} num_agent_steps_trained: 32.0 td_error: - -0.1672649383544922 - -0.21295738220214844 - 0.2727012634277344 - -0.26985740661621094 - -0.10014915466308594 - 0.18021106719970703 - 0.34715843200683594 - -0.5478029251098633 - 0.2668333053588867 - -0.15447235107421875 - -0.3227243423461914 - -0.45029163360595703 - -0.30243873596191406 - 0.016755104064941406 - 0.1908550262451172 - -0.5886564254760742 - -0.06766986846923828 - 0.34069156646728516 - 0.33641910552978516 - -0.7434234619140625 - -0.23517131805419922 - 0.060619354248046875 - -0.6940898895263672 - 4.995281219482422 - -0.7546262741088867 - -0.08367347717285156 - 0.02113056182861328 - 0.0528717041015625 - 0.2747077941894531 - -0.10827159881591797 - -0.8170671463012695 - -0.15816402435302734 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 num_env_steps_sampled: 20160 num_env_steps_trained: 51104 num_target_updates: 39 iterations_since_restore: 20 node_ip: 127.0.0.1 num_agent_steps_sampled: 20160 num_agent_steps_trained: 51104 num_env_steps_sampled: 20160 num_env_steps_sampled_this_iter: 1008 num_env_steps_trained: 51104 num_env_steps_trained_this_iter: 2688 num_healthy_workers: 3 off_policy_estimator: {} perf: cpu_util_percent: 71.125 ram_util_percent: 94.5 pid: 22368 policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 
0.12795337718083277 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.08253337092390917 mean_inference_ms: 1.1938984049080479 mean_raw_obs_processing_ms: 0.21474479833075882 sampler_results: custom_metrics: {} episode_len_mean: 140.57 episode_media: {} episode_reward_max: 200.0 episode_reward_mean: 140.57 episode_reward_min: 24.0 episodes_this_iter: 9 hist_stats: episode_lengths: - 31 - 35 - 51 - 90 - 53 - 170 - 28 - 27 - 92 - 63 - 156 - 79 - 60 - 27 - 99 - 24 - 158 - 97 - 91 - 115 - 200 - 200 - 65 - 200 - 137 - 165 - 200 - 155 - 111 - 153 - 77 - 129 - 180 - 117 - 128 - 130 - 87 - 101 - 121 - 200 - 153 - 105 - 147 - 158 - 189 - 132 - 127 - 139 - 112 - 123 - 189 - 163 - 200 - 169 - 200 - 178 - 200 - 200 - 178 - 200 - 178 - 200 - 112 - 200 - 168 - 167 - 143 - 100 - 200 - 193 - 200 - 200 - 186 - 164 - 177 - 200 - 159 - 195 - 200 - 172 - 200 - 200 - 143 - 200 - 154 - 200 - 135 - 138 - 135 - 156 - 139 - 97 - 125 - 121 - 200 - 108 - 113 - 125 - 83 - 107 episode_reward: - 31.0 - 35.0 - 51.0 - 90.0 - 53.0 - 170.0 - 28.0 - 27.0 - 92.0 - 63.0 - 156.0 - 79.0 - 60.0 - 27.0 - 99.0 - 24.0 - 158.0 - 97.0 - 91.0 - 115.0 - 200.0 - 200.0 - 65.0 - 200.0 - 137.0 - 165.0 - 200.0 - 155.0 - 111.0 - 153.0 - 77.0 - 129.0 - 180.0 - 117.0 - 128.0 - 130.0 - 87.0 - 101.0 - 121.0 - 200.0 - 153.0 - 105.0 - 147.0 - 158.0 - 189.0 - 132.0 - 127.0 - 139.0 - 112.0 - 123.0 - 189.0 - 163.0 - 200.0 - 169.0 - 200.0 - 178.0 - 200.0 - 200.0 - 178.0 - 200.0 - 178.0 - 200.0 - 112.0 - 200.0 - 168.0 - 167.0 - 143.0 - 100.0 - 200.0 - 193.0 - 200.0 - 200.0 - 186.0 - 164.0 - 177.0 - 200.0 - 159.0 - 195.0 - 200.0 - 172.0 - 200.0 - 200.0 - 143.0 - 200.0 - 154.0 - 200.0 - 135.0 - 138.0 - 135.0 - 156.0 - 139.0 - 97.0 - 125.0 - 121.0 - 200.0 - 108.0 - 113.0 - 125.0 - 83.0 - 107.0 off_policy_estimator: {} policy_reward_max: {} policy_reward_mean: {} policy_reward_min: {} sampler_perf: mean_action_processing_ms: 0.12795337718083277 mean_env_render_ms: 0.0 mean_env_wait_ms: 0.08253337092390917 mean_inference_ms: 1.1938984049080479 mean_raw_obs_processing_ms: 0.21474479833075882 time_since_restore: 51.519564390182495 time_this_iter_s: 2.7031259536743164 time_total_s: 51.519564390182495 timers: learn_throughput: 10240.078 learn_time_ms: 3.125 load_throughput: 0.0 load_time_ms: 0.0 synch_weights_time_ms: 4.688 training_iteration_time_ms: 29.687 timestamp: 1654863276 timesteps_since_restore: 0 timesteps_total: 20160 training_iteration: 20 trial_id: abcc0_00001 warmup_time: 8.743526935577393 == Status == Current time: 2022-06-10 14:14:42 (running for 00:01:41.87) Memory usage on this node: 9.1/15.8 GiB PopulationBasedTraining: 0 checkpoints, 0 perturbs Resources requested: 0/12 CPUs, 0/1 GPUs, 0.0/4.04 GiB heap, 0.0/2.02 GiB objects Current best trial: abcc0_00002 with episode_reward_mean=145.05 and parameters={'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'CartPole-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'record_env': False, 
'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 3, 'num_envs_per_worker': 1, 'sample_collector': , 'sample_async': False, 'rollout_fragment_length': 4, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'gamma': 0.9579890222308893, 'lr': 0.0003559115980235469, 'train_batch_size': 32, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': True, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 10000}, 'input_config': {}, 'actions_in_input_normalized': False, 'input_evaluation': [, ], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_parallel_to_training': False, 'evaluation_config': {'extra_python_environs_for_driver': {}, 'extra_python_environs_for_worker': {}, 'num_gpus': 0, 'num_cpus_per_worker': 1, 'num_gpus_per_worker': 0, '_fake_gpus': False, 'custom_resources_per_worker': {}, 'placement_strategy': 'PACK', 'eager_tracing': False, 'eager_max_retraces': 20, 'tf_session_args': {'intra_op_parallelism_threads': 2, 'inter_op_parallelism_threads': 2, 'gpu_options': {'allow_growth': True}, 'log_device_placement': False, 'device_count': {'CPU': 1}, 'allow_soft_placement': True}, 'local_tf_session_args': {'intra_op_parallelism_threads': 8, 'inter_op_parallelism_threads': 8}, 'env': 'CartPole-v0', 'env_config': {}, 'observation_space': None, 'action_space': None, 'env_task_fn': None, 'render_env': False, 'record_env': False, 'clip_rewards': None, 'normalize_actions': True, 'clip_actions': False, 'disable_env_checking': False, 'num_workers': 3, 'num_envs_per_worker': 1, 'sample_collector': , 'sample_async': False, 'rollout_fragment_length': 4, 'batch_mode': 'truncate_episodes', 'remote_worker_envs': False, 'remote_env_batch_wait_ms': 0, 'ignore_worker_failures': False, 'recreate_failed_workers': False, 'horizon': None, 'soft_horizon': False, 'no_done_at_end': False, 'preprocessor_pref': 'deepmind', 'observation_filter': 
'NoFilter', 'synchronize_filters': True, 'compress_observations': False, 'gamma': 0.9579890222308893, 'lr': 0.0003559115980235469, 'train_batch_size': 32, 'model': {'_use_default_native_models': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, 'fcnet_hiddens': [256, 256], 'fcnet_activation': 'tanh', 'conv_filters': None, 'conv_activation': 'relu', 'post_fcnet_hiddens': [], 'post_fcnet_activation': 'relu', 'free_log_std': False, 'no_final_linear': False, 'vf_share_layers': True, 'use_lstm': False, 'max_seq_len': 20, 'lstm_cell_size': 256, 'lstm_use_prev_action': False, 'lstm_use_prev_reward': False, '_time_major': False, 'use_attention': False, 'attention_num_transformer_units': 1, 'attention_dim': 64, 'attention_num_heads': 1, 'attention_head_dim': 32, 'attention_memory_inference': 50, 'attention_memory_training': 50, 'attention_position_wise_mlp_dim': 32, 'attention_init_gru_gate_bias': 2.0, 'attention_use_n_prev_actions': 0, 'attention_use_n_prev_rewards': 0, 'framestack': True, 'dim': 84, 'grayscale': False, 'zero_mean': True, 'custom_model': None, 'custom_model_config': {}, 'custom_action_dist': None, 'custom_preprocessor': None, 'lstm_use_prev_action_reward': -1}, 'optimizer': {}, 'explore': False, 'exploration_config': {'type': 'EpsilonGreedy', 'initial_epsilon': 1.0, 'final_epsilon': 0.02, 'epsilon_timesteps': 10000}, 'input_config': {}, 'actions_in_input_normalized': False, 'input_evaluation': [, ], 'postprocess_inputs': False, 'shuffle_buffer_size': 0, 'output': None, 'output_config': {}, 'output_compress_columns': ['obs', 'new_obs'], 'output_max_file_size': 67108864, 'evaluation_interval': None, 'evaluation_duration': 10, 'evaluation_duration_unit': 'episodes', 'evaluation_parallel_to_training': False, 'evaluation_config': {'explore': False}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 180, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_reporting': 1, 'min_train_timesteps_per_reporting': 0, 'min_sample_timesteps_per_reporting': 1000, 'logger_creator': None, 'logger_config': None, 'log_level': 'DEBUG', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'target_network_update_freq': 500, 'replay_buffer_config': {'type': , 'prioritized_replay': -1, 'capacity': 50000, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'replay_sequence_length': 1, 'worker_side_prioritization': False, 'replay_mode': 'independent', 'replay_batch_size': 32}, 'store_buffer_in_checkpoints': False, 'lr_schedule': None, 'adam_epsilon': 1e-08, 'grad_clip': 40, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': True, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'before_learn_on_batch': None, 'training_intensity': None, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': 
PolicySpec(policy_class=, observation_space=Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), action_space=Discrete(2), config={})}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': , 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1}, 'evaluation_num_workers': 0, 'always_attach_evaluation_results': False, 'in_evaluation': False, 'keep_per_episode_custom_metrics': False, 'metrics_episode_collection_timeout_s': 180, 'metrics_num_episodes_for_smoothing': 100, 'min_time_s_per_reporting': 1, 'min_train_timesteps_per_reporting': 0, 'min_sample_timesteps_per_reporting': 1000, 'logger_creator': None, 'logger_config': None, 'log_level': 'DEBUG', 'log_sys_usage': True, 'fake_sampler': False, 'seed': None, '_tf_policy_handles_more_than_one_loss': False, '_disable_preprocessor_api': False, '_disable_action_flattening': False, '_disable_execution_plan_api': True, 'simple_optimizer': False, 'monitor': -1, 'evaluation_num_episodes': -1, 'metrics_smoothing_episodes': -1, 'timesteps_per_iteration': -1, 'min_iter_time_s': -1, 'collect_metrics_timeout': -1, 'buffer_size': -1, 'prioritized_replay': -1, 'learning_starts': -1, 'replay_batch_size': -1, 'replay_sequence_length': None, 'prioritized_replay_alpha': -1, 'prioritized_replay_beta': -1, 'prioritized_replay_eps': -1, 'target_network_update_freq': 500, 'replay_buffer_config': {'type': , 'prioritized_replay': -1, 'capacity': 50000, 'prioritized_replay_alpha': 0.6, 'prioritized_replay_beta': 0.4, 'prioritized_replay_eps': 1e-06, 'replay_sequence_length': 1, 'worker_side_prioritization': False, 'replay_mode': 'independent', 'replay_batch_size': 32}, 'store_buffer_in_checkpoints': False, 'lr_schedule': None, 'adam_epsilon': 1e-08, 'grad_clip': 40, 'num_atoms': 1, 'v_min': -10.0, 'v_max': 10.0, 'noisy': False, 'sigma0': 0.5, 'dueling': True, 'hiddens': [256], 'double_q': True, 'n_step': 1, 'before_learn_on_batch': None, 'training_intensity': None, 'input': 'sampler', 'multiagent': {'policies': {'default_policy': PolicySpec(policy_class=, observation_space=Box([-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38], [4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38], (4,), float32), action_space=Discrete(2), config={})}, 'policy_map_capacity': 100, 'policy_map_cache': None, 'policy_mapping_fn': None, 'policies_to_train': None, 'observation_fn': None, 'replay_mode': 'independent', 'count_steps_by': 'env_steps'}, 'callbacks': , 'create_env_on_driver': False, 'custom_eval_function': None, 'framework': 'tf', 'num_cpus_for_driver': 1} Result logdir: D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00 Number of trials: 3/3 (3 ERROR) +------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ | Trial name | status | loc | gamma | lr | iter | total time (s) | ts | reward | episode_reward_max | episode_reward_min | episode_len_mean | |------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------| | DQNTrainer_CartPole-v0_abcc0_00000 | ERROR | 127.0.0.1:11888 | 0.948385 
| 0.000150244 | 20 | 51.5852 | 20160 | 120.44 | 198 | 17 | 120.44 | | DQNTrainer_CartPole-v0_abcc0_00001 | ERROR | 127.0.0.1:22368 | 0.921279 | 0.000754346 | 20 | 51.5196 | 20160 | 140.57 | 200 | 24 | 140.57 | | DQNTrainer_CartPole-v0_abcc0_00002 | ERROR | 127.0.0.1:12200 | 0.957989 | 0.000355912 | 20 | 51.8039 | 20160 | 145.05 | 200 | 9 | 145.05 | +------------------------------------+----------+-----------------+----------+-------------+--------+------------------+-------+----------+----------------------+----------------------+--------------------+ Number of errored trials: 3 +------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------+ | Trial name | # failures | error file | |------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------| | DQNTrainer_CartPole-v0_abcc0_00000 | 1 | D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00\DQNTrainer_CartPole-v0_abcc0_00000_0_2022-06-10_14-13-00\error.txt | | DQNTrainer_CartPole-v0_abcc0_00001 | 1 | D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00\DQNTrainer_CartPole-v0_abcc0_00001_1_2022-06-10_14-13-16\error.txt | | DQNTrainer_CartPole-v0_abcc0_00002 | 1 | D:\ML\test_RLlib\results\DQNTrainer_2022-06-10_14-13-00\DQNTrainer_CartPole-v0_abcc0_00002_2_2022-06-10_14-13-30\error.txt | +------------------------------------+--------------+----------------------------------------------------------------------------------------------------------------------------+ 2022-06-10 14:14:42,464 ERROR ray_trial_executor.py:107 -- An exception occurred when trying to stop the Ray actor:Traceback (most recent call last): File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\ray_trial_executor.py", line 98, in post_stop_cleanup ray.get(future, timeout=0) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper return func(*args, **kwargs) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 1845, in get raise value ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task. class_name: DQNTrainer actor_id: 597648098ad48add1a4d5fd001000000 pid: 11888 namespace: 5da0e462-e686-4c27-bc17-342ca89eed52 ip: 127.0.0.1 The actor is dead because because all references to the actor were removed. 
(pid=) [2022-06-10 14:14:42,933 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details: (pid=) [2022-06-10 14:14:43,949 E 9364 20076] (gcs_server.exe) gcs_server.cc:294: Failed to get the resource load: GrpcUnavailable: RPC Error message: failed to connect to all addresses; RPC Error details: Traceback (most recent call last): File "D:\ML\test_RLlib\test\main.py", line 38, in tune.run(DQNTrainer, scheduler=pbt, File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\tune.py", line 746, in run raise TuneError("Trials did not complete", incomplete_trials) ray.tune.error.TuneError: ('Trials did not complete', [DQNTrainer_CartPole-v0_abcc0_00000, DQNTrainer_CartPole-v0_abcc0_00001, DQNTrainer_CartPole-v0_abcc0_00002]) ``` Error file of each worker are the same: ``` Failure # 1 (occurred at 2022-06-10_14-14-40) Traceback (most recent call last): File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\tune\ray_trial_executor.py", line 934, in get_next_executor_event future_result = ray.get(ready_future) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\_private\client_mode_hook.py", line 105, in wrapper return func(*args, **kwargs) File "D:\ML\test_RLlib\TF_Env\lib\site-packages\ray\worker.py", line 1845, in get raise value ray.exceptions.RayActorError: The actor died unexpectedly before finishing this task. ```

Edit: (mattip) put the error log into a <details> block to hide it
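The worker-crash warning in the log points at two RLlib trainer config flags for tolerating failed rollout workers. A minimal sketch of enabling them, assuming the same `config` dict as in the reproduction script further down in this thread (this only tolerates the lost worker; it has not been verified against the access violation itself):

```
config = {
    "env": "CartPole-v0",
    "num_workers": 3,
    "num_gpus": 0,
    "framework": "tf",
    # Continue the trial even if a RolloutWorker dies mid-training.
    "ignore_worker_failures": True,
    # Ask RLlib to restart crashed workers instead of erroring out the trial.
    "recreate_failed_workers": True,
}
```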

mattip commented 2 years ago

@Peter-P779 did you change anything in the script or install instructions? Which nightly did you use?

Peter-P779 commented 2 years ago

I didn't change anything in the script except deleting `"record_env": True`. I loaded the environment and executed the following commands:

pip uninstall -y ray
pip install -U https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp39-cp39-win_amd64.whl

The version is the Windows Python 3.9 nightly:
D:\ML\test_RLlib>ray --version
ray, version 2.0.0.dev0

adlerjan commented 2 years ago

Hello all,

I can reproduce the crash on my Windows desktop on both the current nightly and the PyPI release.

I stumbled over this issue while investigating an unexpected crash using only Ray Core, which occurs exclusively on my home desktop. On other systems (work notebook, high-performance cluster, Linux notebook) Ray works like a charm. It is exactly the same thing: after ~90 s of runtime the system crashes with the identical message, sometimes with an access violation error appended at the end.

mattip commented 1 year ago

TL;DR: I could not reproduce. If someone can still reproduce this, please report what you did using the comment below as a template, starting from a vanilla Python installation.

And in too much detail:

Here is the script I used

Modified script

```
import ray
from ray import tune
from ray.rllib.algorithms.dqn.dqn import DQN as DQNTrainer
from ray.tune.schedulers import PopulationBasedTraining
import gym
import random

config = {
    "env": "CartPole-v0",
    "num_workers": 3,
    "num_gpus": 0,
    "framework": "tf",
}

if __name__ == "__main__":
    pbt = PopulationBasedTraining(
        time_attr="time_total_s",
        perturbation_interval=7200,
        resample_probability=0.25,
        hyperparam_mutations={
            "lr": lambda: random.uniform(1e-3, 5e-5),
            "gamma": lambda: random.uniform(0.90, 0.99),
        },
    )
    import tensorflow as tf
    ray.init()
    tune.run(
        DQNTrainer,
        scheduler=pbt,
        config=config,
        num_samples=3,
        metric="episode_reward_mean",
        mode="max",
        local_dir="./results",
        sync_config=tune.SyncConfig(syncer=None),
        checkpoint_freq=500,
        keep_checkpoints_num=20,
    )
    ray.shutdown()
```

Here is what I did

CPython39\python.exe -m venv d:\temp\issue24955
d:\temp\issue24955\Scripts\activate
>python -c "import sys; print(sys.version)"
3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]
>pip install "ray==2.1,0" "ray[rllib]==2.1.0" "ray[default]==2.1.0" 
>pip install "ray[tune]==2.1.0" "gym==0.23.1" "tensorflow==2.10.0"
>pip install pygame gpuutils pywin32
REM copy script to d:\temp\issue24955.py
> python d:\temp\issue24955.py

I then get a number of diagnostic messages on startup with hints to improve the script

Startup messages

```
2022-11-15 15:32:12,844 INFO worker.py:1519 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
d:\temp\issue24955\lib\site-packages\ray\tune\tune.py:523: UserWarning: Consider boosting PBT performance by enabling `reuse_actors` as well as implementing `reset_config` for Trainable.
  warnings.warn(
2022-11-15 15:32:14,520 WARNING trial_runner.py:1604 -- You are trying to access _search_alg interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
(DQN pid=10472) 2022-11-15 15:32:19,748 INFO algorithm.py:2303 -- Your framework setting is 'tf', meaning you are using static-graph mode. Set framework='tf2' to enable eager execution with tf2.x. You may also then want to set eager_tracing=True in order to reach similar execution speed as with static-graph mode.
(DQN pid=10472) 2022-11-15 15:32:19,748 INFO simple_q.py:307 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting `simple_optimizer=True` if this doesn't work for you.
(DQN pid=10472) 2022-11-15 15:32:19,748 INFO algorithm.py:457 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.
(RolloutWorker pid=1228) d:\temp\issue24955\lib\site-packages\gym\envs\registration.py:505: UserWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1` with the environment ID `CartPole-v1`.
(RolloutWorker pid=1228) logger.warn(
(RolloutWorker pid=10248) d:\temp\issue24955\lib\site-packages\gym\envs\registration.py:505: UserWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1` with the environment ID `CartPole-v1`.
(RolloutWorker pid=10248) logger.warn(
(RolloutWorker pid=6740) d:\temp\issue24955\lib\site-packages\gym\envs\registration.py:505: UserWarning: WARN: The environment CartPole-v0 is out of date. You should consider upgrading to version `v1` with the environment ID `CartPole-v1`.
(RolloutWorker pid=6740) logger.warn(
(RolloutWorker pid=1228) 2022-11-15 15:32:24,547 WARNING env.py:159 -- Your env reset() method appears to take 'seed' or 'return_info' arguments. Note that these are not yet supported in RLlib. Seeding will take place using 'env.seed()' and the info dict will not be returned from reset.
== Status ==
```

The script runs, and I can see the resource usage on the dashboard. There are 8 RolloutWorker actors and 2 DQN actors, and the processes seem to take up to 14.3 GB of RAM. The run lasts much longer than 90 seconds: I stopped it after ~10 minutes by pressing CTRL-C, and it shut down cleanly:

2022-11-15 15:42:37,640 ERROR tune.py:773 -- Trials did not complete: [DQN_CartPole-v0_eaa22_00000, DQN_CartPole-v0_eaa22_00001, DQN_CartPole-v0_eaa22_00002]
2022-11-15 15:42:37,640 INFO tune.py:777 -- Total run time: 623.14 seconds (622.83 seconds for the tuning loop).
2022-11-15 15:42:37,640 WARNING tune.py:783 -- Experiment has been interrupted, but the most recent state was saved. You can continue running this experiment by passing `resume=True` to `tune.run()`
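
As the final log line notes, an interrupted Tune run can be continued by re-running the same experiment with `resume=True`. A minimal sketch, assuming the `DQNTrainer`, `pbt`, and `config` objects from the script above and the same `local_dir`:

```
from ray import tune

# DQNTrainer, pbt and config are the same objects defined in the script above;
# Tune restores the saved trial state from ./results instead of starting over.
tune.run(
    DQNTrainer,
    scheduler=pbt,
    config=config,
    num_samples=3,
    metric="episode_reward_mean",
    mode="max",
    local_dir="./results",
    resume=True,
)
```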
adlerjan commented 1 year ago

Hey, sorry for the somewhat unspecific response. It has been a while, but after cross-examining my working and non-working systems I remember that the issue only occurred with a specific Python 3.9 patch version. Switching to a previous patch release resolved my problems completely.

mattip commented 1 year ago

Perhaps your machine has 16 GB of RAM, which is enough on Linux but not sufficient on Windows to run this experiment.
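
If memory pressure is the culprit, shrinking the experiment footprint is a quick way to test that hypothesis. A minimal sketch, assuming the same reproduction script; the object-store cap and worker counts below are illustrative values, not tuned recommendations:

```
import ray
from ray import tune
from ray.rllib.algorithms.dqn.dqn import DQN as DQNTrainer

# Cap Ray's shared-memory object store (in bytes) so more RAM is left for
# TensorFlow and the OS on a 16 GB Windows machine.
ray.init(object_store_memory=1 * 1024**3)

config = {
    "env": "CartPole-v0",
    "num_workers": 1,   # fewer RolloutWorker processes -> lower resident memory
    "num_gpus": 0,
    "framework": "tf",
}

# A single sample keeps only one DQN trainer (plus its workers) alive at a time.
tune.run(DQNTrainer, config=config, num_samples=1, stop={"training_iteration": 20})
```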

richardliaw commented 1 year ago

Closing this, as we seem to lack a reproduction and it may be related to Python versioning.