lhorus closed this issue 3 years ago
Hm, is it possible to reproduce this in a self-contained script, or without the custom env?
Also, the checkpoint was created with the same version of Ray, right?
I will try to reproduce it. At the time, it was generated via AWS SageMaker's wrapper (which answers the question: it is indeed the same Ray version), so I'll re-run it via direct execution.
I'll let you know as soon as I have progress.
@ericl, I re-ran the test on an EC2 instance (no SageMaker wrappers this time), and the error seems to have changed:
AttributeError Traceback (most recent call last)
~/thesis/basel_on_ray/rollout.py in <module>()
160 parser = create_parser()
161 args = parser.parse_args()
--> 162 run(args, parser)
~/thesis/basel_on_ray/rollout.py in run(args, parser)
104
105 cls = get_agent_class(args.run)
--> 106 agent = cls(env=args.env, config=config)
107 print(args.checkpoint)
108 agent.restore(args.checkpoint)
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/agent.py in __init__(self, config, env, logger_creator)
246 logger_creator = default_logger_creator
247
--> 248 Trainable.__init__(self, config, logger_creator)
249
250 @classmethod
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trainable.py in __init__(self, config, logger_creator)
86 self._iterations_since_restore = 0
87 self._restored = False
---> 88 self._setup(copy.deepcopy(self.config))
89 self._local_ip = ray.services.get_node_ip_address()
90
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/agent.py in _setup(self, config)
316 # TODO(ekl) setting the graph is unnecessary for PyTorch agents
317 with tf.Graph().as_default():
--> 318 self._init()
319
320 @override(Trainable)
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/ppo/ppo.py in _init(self)
73 self._validate_config()
74 self.local_evaluator = self.make_local_evaluator(
---> 75 self.env_creator, self._policy_graph)
76 self.remote_evaluators = self.make_remote_evaluators(
77 self.env_creator, self._policy_graph, self.config["num_workers"])
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/agent.py in make_local_evaluator(self, env_creator, policy_graph)
436 merge_dicts(self.config, {
437 "tf_session_args": self.
--> 438 config["local_evaluator_tf_session_args"]
439 }))
440
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/agent.py in _make_evaluator(self, cls, env_creator, policy_graph, worker_index, config)
576 input_creator=input_creator,
577 input_evaluation_method=config["input_evaluation"],
--> 578 output_creator=output_creator)
579
580 def __getstate__(self):
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py in __init__(self, env_creator, policy_graph, policy_mapping_fn, policies_to_train, tf_session_creator, batch_steps, batch_mode, episode_horizon, preprocessor_pref, sample_async, compress_observations, num_envs, observation_filter, clip_rewards, clip_actions, env_config, model_config, policy_config, worker_index, monitor_path, log_dir, log_level, callbacks, input_creator, input_evaluation_method, output_creator)
255 if _has_tensorflow_graph(policy_dict):
256 if (ray.worker._mode() != ray.worker.LOCAL_MODE
--> 257 and not ray.get_gpu_ids()):
258 logger.info("Creating policy evaluation worker {}".format(
259 worker_index) +
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/worker.py in get_gpu_ids()
1000 "MODE.")
1001
-> 1002 all_resource_ids = global_worker.raylet_client.resource_ids()
1003 assigned_ids = [
1004 resource_id for resource_id, _ in all_resource_ids.get("GPU", [])
AttributeError: 'Worker' object has no attribute 'raylet_client'
That means your ray install is out of sync with the repo you have checked out.
Make sure your code is in sync with master, and pip install the latest wheels as well: https://ray.readthedocs.io/en/latest/installation.html
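One way to check which install the interpreter actually picks up is to print the package version next to the path it would be imported from. The snippet below is a minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper name `describe_install` is made up for illustration and is not part of Ray.

```python
import importlib.util
from importlib import metadata


def describe_install(package_name):
    """Report the installed version and import location of a package.

    Hypothetical helper for illustration; not a Ray API.
    """
    try:
        # Version recorded by pip for the installed distribution.
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        version = None
    try:
        # Where `import package_name` would load from, without importing it.
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        spec = None
    location = spec.origin if spec is not None else None
    return {"package": package_name, "version": version, "location": location}


if __name__ == "__main__":
    print(describe_install("ray"))
```

If the reported location points into a source checkout while the version comes from a pip-installed wheel (or the other way around), the install and the checked-out code are likely out of sync.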
I have done as recommended, which makes sense. Still, updating both the main repo and the wheels via:
!pip install gym
!pip install -U ray
!pip install -U ray[debug]
!pip install https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.6.2-cp36-cp36m-manylinux1_x86_64.whl
!pip install opencv-python
yields the same result. In the previous post I forgot to include the cell magic output, which is as follows:
2019-02-05 21:48:08,712 WARNING ppo.py:137 -- By default, observations will be normalized with MeanStdFilter
Exception ignored in: <bound method PolicyEvaluator.__del__ of <ray.rllib.evaluation.policy_evaluator.PolicyEvaluator object at 0x7f8fa53a8b38>>
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 598, in __del__
    if isinstance(self.sampler, AsyncSampler):
AttributeError: 'PolicyEvaluator' object has no attribute 'sampler'
Again, thanks for your patience, @ericl.
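The "no attribute 'sampler'" message in the log above looks like a side effect rather than a separate bug: if a constructor raises before it assigns an attribute, Python still runs the finalizer on the half-built object, and the finalizer then fails on the missing attribute. The toy class below is a sketch of that pattern and of one defensive fix; `Evaluator` and `try_build` are hypothetical names, not RLlib code.

```python
class Evaluator:
    """Toy stand-in for an object whose setup can fail partway through."""

    def __init__(self, fail_early=False):
        if fail_early:
            # Mimics __init__ raising before `sampler` is ever assigned.
            raise RuntimeError("setup failed before sampler was created")
        self.sampler = "sampler"

    def __del__(self):
        # getattr with a default avoids a second AttributeError when
        # __init__ never got as far as setting `sampler`.
        sampler = getattr(self, "sampler", None)
        if sampler is not None:
            pass  # real cleanup of the sampler would go here


def try_build(fail_early):
    """Return True if construction succeeded, False if setup raised."""
    try:
        Evaluator(fail_early=fail_early)
    except RuntimeError:
        return False
    return True
```

Under that reading, the error worth chasing is the first one raised during construction; the "Exception ignored in" message goes away once setup succeeds.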
Any updates on this? I'm facing the same issue on an AWS EC2 instance.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
I am facing the error below in SageMaker. Can someone help me with this?
predictions = []
for item in np.array(vectors.todense()):
    np.shape(item)
    results = ntm_predictor.predict(item)
    predictions.append(np.array([prediction['topic_weights'] for prediction in results['predictions']]))
predictions = np.array([np.ndarray.flatten(x) for x in predictions])
topicvec = train_labels[newidx]
topicnames = [categories[x] for x in topicvec]
TypeError Traceback (most recent call last)
Hi, I'm a bot from the Ray team :)
To help human contributors to focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
If there is no further activity in the next 14 days, the issue will be closed!
You can always ask for help on our discussion forum or Ray's public Slack channel.
Running rollout.py yields a strange error: attempting to restore the agent via
agent.restore(args.checkpoint)
raises an error. Specifically, at
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trainable.py
the error is thrown:
The rollout.py script was copied from the main repository, with one small change: registering the custom environment.