rstrivedi / Melting-Pot-Contest-2023

Apache License 2.0

Shape error #5

Closed: richielo closed this issue 1 year ago

richielo commented 1 year ago

Hello, I am encountering a shape error when running the training script:

    CUDA_VISIBLE_DEVICES=0 python baselines/train/run_ray_train.py --framework torch --exp al_harvest

Any help is greatly appreciated.

    2023-09-04 21:31:18,216 ERROR tune.py:1144 -- Trials did not complete: [PPO_meltingpot_55a86_00000]
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/models/torch/recurrent_net.py", line 274, in forward_rnn
    (PPO pid=6862)     self._features, [h, c] = self.lstm(
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    (PPO pid=6862)     return forward_call(*args, **kwargs)
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 810, in forward
    (PPO pid=6862)     self.check_forward_args(input, hx, batch_sizes)
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 730, in check_forward_args
    (PPO pid=6862)     self.check_input(input, batch_sizes)
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/torch/nn/modules/rnn.py", line 218, in check_input
    (PPO pid=6862)     raise RuntimeError(
    (PPO pid=6862) RuntimeError: input.size(-1) must be equal to input_size. Expected 147, got 27
    (PPO pid=6862)
    (PPO pid=6862) During handling of the above exception, another exception occurred:
    (PPO pid=6862)
    (PPO pid=6862) ray::PPO.__init__() (pid=6862, ip=10.64.34.33, actor_id=afbc4db286cfab682041540a01000000, repr=PPO)
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 517, in __init__
    (PPO pid=6862)     super().__init__(
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/tune/trainable/trainable.py", line 169, in __init__
    (PPO pid=6862)     self.setup(copy.deepcopy(self.config))
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/algorithms/algorithm.py", line 639, in setup
    (PPO pid=6862)     self.workers = WorkerSet(
    (PPO pid=6862)   File "/home/paperspace/miniconda3/envs/mpc_main/lib/python3.10/site-packages/ray/rllib/evaluation/worker_set.py", line 179, in __init__
    (PPO pid=6862)     raise e.args[0].args[2]
    (PPO pid=6862) RuntimeError: input.size(-1) must be equal to input_size. Expected 147, got 27

rstrivedi commented 1 year ago

Thank you for your question.

Quick check: did you run sh ray_patch.sh during installation?

Thanks

richielo commented 1 year ago

Yes, I have! Thank you for your prompt response.

rstrivedi commented 1 year ago

OK, thanks. I assume you have not made any changes to the setup script. Could you also check whether the patch was actually applied, i.e. that line 181 of ray/rllib/models/torch/complex_input_net.py has been replaced in the copy of rllib installed in your environment? This works fine for me, so it seems something is going wrong with the setup (e.g. a permission problem when applying the patch).

richielo commented 1 year ago

You are right. The Python path was pointing somewhere else, so the patch had not been applied to the rllib copy that was actually being used. I appreciate the very quick help!
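For anyone hitting the same error, the check described above can be sketched as a small script. This is only an illustrative sketch: the helper names `locate_module_file` and `read_line` are hypothetical, and the script only shows which file the active interpreter resolves and what is on line 181. It does not know what the patched line should look like, so compare the output against ray_patch.sh yourself.

```python
import importlib.util
import sys
from pathlib import Path
from typing import Optional


def locate_module_file(module_name: str) -> Optional[Path]:
    """Return the source file the active interpreter resolves for module_name.

    Returns None if the module (or one of its parent packages) cannot be found.
    Note: find_spec imports the parent packages of a dotted name.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.has_location or spec.origin is None:
        return None
    return Path(spec.origin)


def read_line(path: Path, line_number: int) -> Optional[str]:
    """Return the 1-indexed line of a text file, or None if out of range."""
    lines = path.read_text(encoding="utf-8").splitlines()
    if 1 <= line_number <= len(lines):
        return lines[line_number - 1]
    return None


if __name__ == "__main__":
    # Which interpreter is running? It should be the one from the contest env.
    print("interpreter:", sys.executable)
    target = locate_module_file("ray.rllib.models.torch.complex_input_net")
    if target is None:
        print("complex_input_net.py not found for this interpreter")
    else:
        # This path should be inside the environment that ray_patch.sh patched.
        print("file:", target)
        print("line 181:", read_line(target, 181))
```

Run it with the same `python` used to launch run_ray_train.py. If the printed path is outside the environment where the patch was applied, the interpreter is importing a different, unpatched rllib.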