[rllib] IndexError when sampling from MBMPO's model ensemble with multiple workers

What happened + What you expected to happen

Training with MBMPO using multiple workers gives the following error:

(MBMPOTrainer pid=98221)     next_obs_batch = self.model.predict_model_batches(
(MBMPOTrainer pid=98221)   File "/home/jones/anaconda3/envs/ray_latest/lib/python3.8/site-packages/ray/rllib/agents/mbmpo/model_ensemble.py", line 350, in predict_model_batches
(MBMPOTrainer pid=98221)     delta = self.forward(x).detach().cpu().numpy()
(MBMPOTrainer pid=98221)   File "/home/jones/anaconda3/envs/ray_latest/lib/python3.8/site-packages/ray/rllib/agents/mbmpo/model_ensemble.py", line 187, in forward
(MBMPOTrainer pid=98221)     return self.dynamics_ensemble[self.sample_index](x)
(MBMPOTrainer pid=98221) IndexError: list index out of range

After some debugging, it seems that the model to use from the ensemble is chosen with:

self.sample_index = int((worker_index - 1) / self.num_models)

It is then used in forward as:

self.dynamics_ensemble[self.sample_index](x)

So if there is only one model (self.num_models = 1) but multiple workers (worker_index > 1, say 10), the sample index would be int((10 - 1) / 1) = 9. Since dynamics_ensemble then holds a single model, self.dynamics_ensemble[9] raises the IndexError above.
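The arithmetic can be checked in isolation. This is only a minimal sketch: a plain Python list stands in for dynamics_ensemble, and the helper name buggy_sample_index is hypothetical, not part of RLlib.

```python
def buggy_sample_index(worker_index: int, num_models: int) -> int:
    # Mirrors the expression quoted in this report from model_ensemble.py.
    return int((worker_index - 1) / num_models)


# Stand-in for self.dynamics_ensemble with a single model (assumption).
dynamics_ensemble = ["model_0"]
num_models = len(dynamics_ensemble)

idx = buggy_sample_index(worker_index=10, num_models=num_models)
print(idx)  # 9

try:
    dynamics_ensemble[idx]
except IndexError as err:
    print(f"IndexError: {err}")  # IndexError: list index out of range
```

More generally, the division only stays in range while worker_index - 1 is smaller than num_models squared, so larger ensembles can hide the problem until enough workers are added.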

A quick fix would be to change it to:

self.sample_index = (worker_index - 1) % self.num_models
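As a sanity check on the modulo variant, the sketch below confirms that the index stays within the ensemble for a few ensemble sizes. The helper name modulo_sample_index and the chosen sizes are illustrative assumptions, not RLlib code.

```python
def modulo_sample_index(worker_index: int, num_models: int) -> int:
    # Proposed quick fix: wrap around the ensemble instead of dividing.
    return (worker_index - 1) % num_models


for num_models in (1, 3, 5):
    # Worker indices 1..10, as in the 10-worker scenario from this report.
    indices = [modulo_sample_index(w, num_models) for w in range(1, 11)]
    assert all(0 <= i < num_models for i in indices)
    print(num_models, indices)
```

Note that Python's % returns a non-negative result for a positive divisor, so a worker_index of 0 would map to num_models - 1 and still be a valid index. The assignment is deterministic per worker, though, not random.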

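However, there is a comment in the code saying "# For each worker, choose a random model to choose trajectories from", so the selection is presumably meant to be random rather than a fixed mapping from worker index to model.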

Versions / Dependencies

Ray 1.12, but the problem is still present in the current version as well.

Reproduction script

Run one of the MBMPO examples, but change the spec to use multiple workers and multiple models.

Issue Severity

High: It blocks me from completing my task.
