ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io

Error using Rollout metadata "TypeError: list indices must be integers or slices, not str" #3940

Closed lhorus closed 3 years ago

lhorus commented 5 years ago

Running rollout.py yields a strange error: attempting to restore the agent via agent.restore(args.checkpoint) raises

TypeError: list indices must be integers or slices, not str

Specifically, the error is thrown in ~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trainable.py:

~/test/basel_on_ray/rollout.py in run(args, parser)
    105     agent = cls(env=args.env, config=config)
    106     print(args.checkpoint)
--> 107     agent.restore(args.checkpoint)
    108     num_steps = int(args.steps)
    109     rollout(agent, args.env, num_steps, args.out, args.no_render)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trainable.py in restore(self, checkpoint_path)
    299 
    300         metadata = pickle.load(open(checkpoint_path + ".tune_metadata", "rb"))
--> 301         self._experiment_id = metadata["experiment_id"]
    302         self._iteration = metadata["iteration"]
    303         self._timesteps_total = metadata["timesteps_total"]

TypeError: list indices must be integers or slices, not str

rollout.py has been copied from the main repository with one small change: registering the custom environment.
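
A minimal diagnostic sketch (the checkpoint path is a placeholder) for checking what the .tune_metadata file actually deserializes to, since the traceback above suggests pickle.load is returning a list rather than the dict that Trainable.restore expects:

import pickle

# Hypothetical path; substitute the checkpoint actually passed to agent.restore()
checkpoint_path = "/path/to/checkpoint-100"

# restore() indexes the loaded object with string keys, so a dict is expected here
with open(checkpoint_path + ".tune_metadata", "rb") as f:
    metadata = pickle.load(f)

print(type(metadata))
if isinstance(metadata, dict):
    print(sorted(metadata.keys()))  # should include 'experiment_id', 'iteration', 'timesteps_total'
else:
    print(metadata)  # a list here would explain the TypeError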

ericl commented 5 years ago

Hm, is it possible to reproduce this in a self-contained script, or without the custom env?

Also, the checkpoint was created with the same version of Ray, right?
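
A quick way to answer the version question (hedged sketch; run it both in the environment that produced the checkpoint and in the one running rollout.py):

import ray

print(ray.__version__)  # the two environments should report the same version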

lhorus commented 5 years ago

I will try to reproduce it. At the time it was generated via AWS SageMaker's wrapper (which answers the question: it is indeed the same Ray version), and I'll re-run it via direct execution.

I'll let you know as soon as I have progress.

lhorus commented 5 years ago

@ericl, I re-ran the test on an EC2 instance (no SageMaker wrappers this time), and the error seems to have changed:

AttributeError                            Traceback (most recent call last)
~/thesis/basel_on_ray/rollout.py in <module>()
    160     parser = create_parser()
    161     args = parser.parse_args()
--> 162     run(args, parser)

~/thesis/basel_on_ray/rollout.py in run(args, parser)
    104 
    105     cls = get_agent_class(args.run)
--> 106     agent = cls(env=args.env, config=config)
    107     print(args.checkpoint)
    108     agent.restore(args.checkpoint)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/agent.py in __init__(self, config, env, logger_creator)
    246             logger_creator = default_logger_creator
    247 
--> 248         Trainable.__init__(self, config, logger_creator)
    249 
    250     @classmethod

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/tune/trainable.py in __init__(self, config, logger_creator)
     86         self._iterations_since_restore = 0
     87         self._restored = False
---> 88         self._setup(copy.deepcopy(self.config))
     89         self._local_ip = ray.services.get_node_ip_address()
     90 

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/agent.py in _setup(self, config)
    316         # TODO(ekl) setting the graph is unnecessary for PyTorch agents
    317         with tf.Graph().as_default():
--> 318             self._init()
    319 
    320     @override(Trainable)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/ppo/ppo.py in _init(self)
     73         self._validate_config()
     74         self.local_evaluator = self.make_local_evaluator(
---> 75             self.env_creator, self._policy_graph)
     76         self.remote_evaluators = self.make_remote_evaluators(
     77             self.env_creator, self._policy_graph, self.config["num_workers"])

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/agent.py in make_local_evaluator(self, env_creator, policy_graph)
    436             merge_dicts(self.config, {
    437                 "tf_session_args": self.
--> 438                 config["local_evaluator_tf_session_args"]
    439             }))
    440 

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/agents/agent.py in _make_evaluator(self, cls, env_creator, policy_graph, worker_index, config)
    576             input_creator=input_creator,
    577             input_evaluation_method=config["input_evaluation"],
--> 578             output_creator=output_creator)
    579 
    580     def __getstate__(self):

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py in __init__(self, env_creator, policy_graph, policy_mapping_fn, policies_to_train, tf_session_creator, batch_steps, batch_mode, episode_horizon, preprocessor_pref, sample_async, compress_observations, num_envs, observation_filter, clip_rewards, clip_actions, env_config, model_config, policy_config, worker_index, monitor_path, log_dir, log_level, callbacks, input_creator, input_evaluation_method, output_creator)
    255         if _has_tensorflow_graph(policy_dict):
    256             if (ray.worker._mode() != ray.worker.LOCAL_MODE
--> 257                     and not ray.get_gpu_ids()):
    258                 logger.info("Creating policy evaluation worker {}".format(
    259                     worker_index) +

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/worker.py in get_gpu_ids()
   1000                         "MODE.")
   1001 
-> 1002     all_resource_ids = global_worker.raylet_client.resource_ids()
   1003     assigned_ids = [
   1004         resource_id for resource_id, _ in all_resource_ids.get("GPU", [])

AttributeError: 'Worker' object has no attribute 'raylet_client'

ericl commented 5 years ago

That means your ray install is out of sync with the repo you have checked out.

Make sure your code is in sync with master, and pip install the latest wheels as well: https://ray.readthedocs.io/en/latest/installation.html
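
A hedged sanity check after reinstalling: confirm which Ray installation Python actually imports, since a stale install shadowing the new wheel would produce exactly this kind of API mismatch:

import ray

print(ray.__version__)  # version of the ray package that actually gets imported
print(ray.__file__)     # its location on disk, which shows whether the new wheel or an older install is being picked up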

lhorus commented 5 years ago

I have done as recommended, which makes sense. However, updating both the main repo and the wheels via:

!pip install gym
!pip install -U ray
!pip install -U ray[debug]
!pip install https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-0.6.2-cp36-cp36m-manylinux1_x86_64.whl
!pip install opencv-python

still yields the same result. In the previous post I forgot to include the cell output, which is as follows:

2019-02-05 21:48:08,712 WARNING ppo.py:137 -- By default, observations will be normalized with MeanStdFilter
Exception ignored in: <bound method PolicyEvaluator.__del__ of <ray.rllib.evaluation.policy_evaluator.PolicyEvaluator object at 0x7f8fa53a8b38>>
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/ray/rllib/evaluation/policy_evaluator.py", line 598, in __del__
    if isinstance(self.sampler, AsyncSampler):
AttributeError: 'PolicyEvaluator' object has no attribute 'sampler'

Again, thanks for the patience @ericl .

Austrie commented 5 years ago

Any updates on this? I'm facing the same issue on an AWS EC2 instance.

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in <module>()
      8 # combat this, I make the blocksize much large (500MBs), so there's less chance of the CSV being split improperly, since this is
      9 # a 5GB file, it will only have around 10 CSVs, compared to 100 CSVs if we use a small number like "50e6" (50MBs)
---> 10 rated_anime_df = pd.read_csv(directory + '/UserAnimeList.csv')

/home/ubuntu/anaconda2/lib/python2.7/site-packages/modin/pandas/io.pyc in parser_func(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
     94         ):
     95             _, _, _, kwargs = inspect.getargvalues(inspect.currentframe())
---> 96             return _read(**kwargs)
     97 
     98     return parser_func

/home/ubuntu/anaconda2/lib/python2.7/site-packages/modin/pandas/io.pyc in _read(**kwargs)
    107         kwargs: Keyword arguments in pandas.read_csv
    108     """
--> 109     pd_obj = BaseFactory.read_csv(**kwargs)
    110     # This happens when `read_csv` returns a TextFileReader object for iterating through
    111     if isinstance(pd_obj, pandas.io.parsers.TextFileReader):

/home/ubuntu/anaconda2/lib/python2.7/site-packages/modin/data_management/factories.pyc in read_csv(cls, **kwargs)
     53     @classmethod
     54     def read_csv(cls, **kwargs):
---> 55         return cls._determine_engine()._read_csv(**kwargs)
     56 
     57     @classmethod

/home/ubuntu/anaconda2/lib/python2.7/site-packages/modin/data_management/factories.pyc in _read_csv(cls, **kwargs)
     57     @classmethod
     58     def _read_csv(cls, **kwargs):
---> 59         return cls.io_cls.read_csv(**kwargs)
     60 
     61     @classmethod

/home/ubuntu/anaconda2/lib/python2.7/site-packages/modin/engines/ray/pandas_on_ray/io.pyc in read_csv(cls, filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    361             "float_precision": float_precision,
    362         }
--> 363         return cls._read(**kwargs)
    364 
    365     @classmethod

/home/ubuntu/anaconda2/lib/python2.7/site-packages/modin/engines/ray/pandas_on_ray/io.pyc in _read(cls, filepath_or_buffer, **kwargs)
    431         else:
    432             return cls._read_csv_from_file_pandas_on_ray(
--> 433                 filepath_or_buffer, filtered_kwargs
    434             )
    435 

/home/ubuntu/anaconda2/lib/python2.7/site-packages/modin/engines/ray/pandas_on_ray/io.pyc in _read_csv_from_file_pandas_on_ray(cls, filepath, kwargs)
    172             f.seek(0, os.SEEK_SET)  # Return to beginning of file
    173 
--> 174         prefix_id = ray.put(prefix)
    175         partition_kwargs_id = ray.put(partition_kwargs)
    176         # Skip the header since we already have the header information and skip the

/home/ubuntu/anaconda2/lib/python2.7/site-packages/ray/worker.pyc in put(value, worker)
   2227         # In LOCAL_MODE, ray.put is the identity operation.
   2228         return value
-> 2229     object_id = worker.raylet_client.compute_put_id(
   2230         worker.current_task_id,
   2231         worker.task_context.put_index,

AttributeError: 'Worker' object has no attribute 'raylet_client'

PratimaGupta commented 4 years ago

I am facing the error below in SageMaker. Can someone help me with this?

predictions = []
for item in np.array(vectors.todense()):
    np.shape(item)
    results = ntm_predictor.predict(item)
    predictions.append(np.array([prediction['topic_weights'] for prediction in results['predictions']]))

predictions = np.array([np.ndarray.flatten(x) for x in predictions])
topicvec = train_labels[newidx]
topicnames = [categories[x] for x in topicvec]

Error:

TypeError                                 Traceback (most recent call last)
in <module>()
      3     np.shape(item)
      4     results = ntm_predictor.predict(item)
----> 5     predictions.append(np.array([prediction['topic_weights'] for prediction in results['predictions']]))
      6 
      7 predictions = np.array([np.ndarray.flatten(x) for x in predictions])

TypeError: list indices must be integers or slices, not str

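A hedged debugging sketch for this last (non-Ray) error, reusing the ntm_predictor and item names from the snippet above: inspect what predict() actually returns before indexing it with a string key:

results = ntm_predictor.predict(item)

print(type(results))  # a list here would explain the TypeError in the traceback above
if isinstance(results, dict):
    print(list(results.keys()))  # expect a 'predictions' key
else:
    print(results[:1])  # peek at the first element to see the actual structure
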
stale[bot] commented 3 years ago

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity in the next 14 days, the issue will be closed!

You can always ask for help on our discussion forum or Ray's public slack channel.