Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
What happened + What you expected to happen
Training with MBMPO using multiple workers gives the following error:
(MBMPOTrainer pid=98221)     next_obs_batch = self.model.predict_model_batches(
(MBMPOTrainer pid=98221)   File "/home/jones/anaconda3/envs/ray_latest/lib/python3.8/site-packages/ray/rllib/agents/mbmpo/model_ensemble.py", line 350, in predict_model_batches
(MBMPOTrainer pid=98221)     delta = self.forward(x).detach().cpu().numpy()
(MBMPOTrainer pid=98221)   File "/home/jones/anaconda3/envs/ray_latest/lib/python3.8/site-packages/ray/rllib/agents/mbmpo/model_ensemble.py", line 187, in forward
(MBMPOTrainer pid=98221)     return self.dynamics_ensemble[self.sample_index](x)
(MBMPOTrainer pid=98221) IndexError: list index out of range
From debugging, it seems that the model to use from the ensemble is chosen with:
self.sample_index = int((worker_index - 1) / self.num_models)
It is then used in forward with:
self.dynamics_ensemble[self.sample_index](x)
So if there is only one model (self.num_models = 1) but multiple workers (worker_index > 1, say 10), the sample index would be int((10 - 1) / 1) = 9, so self.dynamics_ensemble[9] raises an IndexError on a one-element list. In fact, every worker_index of 2 or more fails in that configuration.
A quick fix would be:
self.sample_index = (worker_index - 1) % self.num_models
However, the comment on that line says
# For each worker, choose a random model to choose trajectories from
so choosing a valid model from the ensemble for every worker seems to be the intent.
Versions / Dependencies
Version 1.12, but the bug is still present in the current version as well.
Reproduction script
Try an existing example, but change the spec to have multiple workers and multiple models.
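The index arithmetic can also be checked in isolation. The following is only a minimal standalone sketch: it does not import RLlib, the helper names (pick_current, pick_modulo, out_of_range_workers) are hypothetical, and it assumes remote workers are numbered from 1, as the `worker_index - 1` term suggests. A plain list stands in for dynamics_ensemble.

```python
# Standalone sketch of the index arithmetic; RLlib is not imported.
# All function names here are hypothetical, chosen for illustration only.


def pick_current(worker_index: int, num_models: int) -> int:
    """Index as computed by the line quoted in this issue."""
    return int((worker_index - 1) / num_models)


def pick_modulo(worker_index: int, num_models: int) -> int:
    """Index with the proposed modulo fix."""
    return (worker_index - 1) % num_models


def out_of_range_workers(pick, num_workers: int, num_models: int) -> list:
    """Return the 1-based worker indices whose chosen index is invalid."""
    ensemble = list(range(num_models))  # stand-in for dynamics_ensemble
    bad = []
    for worker_index in range(1, num_workers + 1):
        try:
            ensemble[pick(worker_index, num_models)]
        except IndexError:
            bad.append(worker_index)
    return bad


if __name__ == "__main__":
    # One model, ten workers: the configuration described above.
    print("current:", out_of_range_workers(pick_current, 10, 1))
    print("modulo: ", out_of_range_workers(pick_modulo, 10, 1))
```

With the division-based formula, an index is out of range whenever worker_index - 1 is at least num_models squared, so larger ensembles only hit the error with correspondingly more workers. The modulo version always stays within the ensemble, although it assigns models round-robin rather than randomly.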
Issue Severity
High: It blocks me from completing my task.