jhejna / few-shot-preference-rl

MIT License

No such file or directory: 'sbatch' #2

Closed by NagisaZj 1 year ago

NagisaZj commented 1 year ago

Hi, I am trying to reproduce your results. I set up the conda env as described in the README and tried the data collection command provided there:

    python tools/run_slurm.py --partition 1 --cpus 12 --mem 32G --job-name ml10-dataset --entry-point scripts/metaworld/collect_policy_dataset.py --arguments benchmark=ml10 tasks-per-env=25 cross-env-ep=10 within-env-ep=25 expert-ep=15 random-ep=2 epsilon=0.1 num-workers=10 noise-type=gaussian path=datasets/mw

It gives the following error:

    Launching job with slurm configuration: /tmp/job_po5r2xu8.sh
    Traceback (most recent call last):
      File "tools/run_slurm.py", line 121, in <module>
        proc = subprocess.Popen(["sbatch", slurm_file])
      File "/home/zj/anaconda3/envs/few-shot-pref-rl/lib/python3.8/subprocess.py", line 858, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/home/zj/anaconda3/envs/few-shot-pref-rl/lib/python3.8/subprocess.py", line 1704, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: 'sbatch'

How can I fix this? Am I missing any packages? Thanks.

jhejna commented 1 year ago

Hi!

Thanks for your interest in our work. The command listed is designed for use with a SLURM compute cluster. It seems like you aren't using SLURM.

Please try replacing tools/run_slurm.py with tools/run_local.py and updating the command arguments accordingly. This repo inherits from an older version of research lightning, whose documentation covers the SLURM launcher in more detail.
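The FileNotFoundError in the first traceback is raised because subprocess.Popen cannot find an sbatch executable on PATH. A minimal sketch of checking for that before choosing a launcher is shown here. The helper name pick_launcher is hypothetical and not part of this repo; only the two script paths come from the thread.

```python
import shutil


def pick_launcher(sbatch_name: str = "sbatch") -> str:
    """Return the launcher script to use (hypothetical helper, not in the repo).

    SLURM's submit command is only usable when it can be found on PATH;
    otherwise fall back to the local launcher suggested in the reply.
    """
    if shutil.which(sbatch_name) is not None:
        return "tools/run_slurm.py"
    return "tools/run_local.py"


if __name__ == "__main__":
    # Prints whichever launcher matches the current machine.
    print(pick_launcher())
```

The two launchers take different options (the SLURM command above passes --partition, --mem and --job-name, while the local command later in the thread does not), so the arguments still need to be adjusted by hand.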

NagisaZj commented 1 year ago

Thanks for your quick response! run_local seems to work, but another error occurs when I run the following command:

    python tools/run_local.py --cpus 1 --entry-point scripts/metaworld/collect_policy_dataset.py --arguments benchmark=ml10 tasks-per-env=25 cross-env-ep=10 within-env-ep=25 expert-ep=15 random-ep=2 epsilon=0.1 num-workers=10 noise-type=gaussian path=datasets/mw

It gives:

    Traceback (most recent call last):
      File "scripts/metaworld/collect_policy_dataset.py", line 254, in <module>
        collect_dataset(
      File "scripts/metaworld/collect_policy_dataset.py", line 141, in collect_dataset
        src_env, policy = sample_policy(benchmark, train=train, env_name=None)
      File "scripts/metaworld/collect_policy_dataset.py", line 88, in sample_policy
        env.set_task(Task(env_name=env_name, data=pickle.dumps(data)))
      File "/data3/zj/few-shot-preference-rl/src/metaworld/metaworld/envs/mujoco/sawyer_xyz/sawyer_xyz_env.py", line 162, in set_task
        assert isinstance(self, data['env_cls'])
    AssertionError

I suspect it might be related to the version of Meta-World, as it gets updated frequently. Could you tell me which Meta-World commit your code uses?

jhejna commented 1 year ago

Hi!

You can fix the error by following the metaworld setup instructions in the README. Specifically, the code doesn't work because ML10 has a mismatch between the environment names used and the actual environment classes.
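The check that fails in the traceback is assert isinstance(self, data['env_cls']) inside set_task. A toy sketch of one way a name/class mismatch could trip that check follows. The classes are empty stand-ins that only borrow the Meta-World class names, the task data is passed as a plain dict instead of a pickled payload, and task_matches_env is a hypothetical helper.

```python
class SawyerXYZEnv:
    """Stand-in base class; only reproduces the check seen in the traceback."""

    def set_task(self, task_data: dict) -> None:
        # The task records which environment class it was created for.
        assert isinstance(self, task_data["env_cls"])


class SawyerButtonPressEnvV2(SawyerXYZEnv):
    """Stand-in for the plain button-press environment."""


class SawyerButtonPressTopdownEnvV2(SawyerXYZEnv):
    """Stand-in for the top-down button-press environment."""


def task_matches_env(env: SawyerXYZEnv, task_data: dict) -> bool:
    """Hypothetical helper: report whether set_task accepts the task."""
    try:
        env.set_task(task_data)
    except AssertionError:
        return False
    return True


# A task created for the top-down class is rejected by the plain class,
# which is the situation when a name maps to the wrong class.
topdown_task = {"env_cls": SawyerButtonPressTopdownEnvV2}
accepted = task_matches_env(SawyerButtonPressTopdownEnvV2(), topdown_task)
rejected = not task_matches_env(SawyerButtonPressEnvV2(), topdown_task)
```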

In lib/python3.11/site-packages/metaworld/envs/mujoco/env_dict.py (adjust the prefix to wherever Meta-World is installed), under ML10 at around line 391, change

    ('button-press-topdown-v2', SawyerButtonPressEnvV2)

to

    ('button-press-topdown-v2', SawyerButtonPressTopdownEnvV2)

This makes sure ML10 uses the top-down button-press environment, which matches the name, and also ensures that we do not train on a test environment.

I've tested the code after making this change myself and it works.
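For anyone who prefers to script the one-line edit described above rather than make it by hand, a minimal sketch follows. The function name patch_env_dict is hypothetical, and the regular expression assumes the pair is written on one line as quoted in the reply; it has not been checked against any particular Meta-World commit.

```python
import re
from pathlib import Path

# Matches the mismatched pair quoted in the reply, tolerating extra
# whitespace and either quote style (an assumption about the file's layout).
_WRONG_PAIR = re.compile(
    r"""\(\s*(['"])button-press-topdown-v2\1\s*,\s*SawyerButtonPressEnvV2\s*\)"""
)
_FIXED_PAIR = "('button-press-topdown-v2', SawyerButtonPressTopdownEnvV2)"


def patch_env_dict(path) -> int:
    """Rewrite the mismatched pair in the file at `path`.

    Returns the number of replacements made. Every occurrence in the file is
    rewritten, not only the one under ML10, so review the count afterwards.
    """
    file = Path(path)
    source = file.read_text()
    patched, count = _WRONG_PAIR.subn(_FIXED_PAIR, source)
    if count:
        file.write_text(patched)
    return count
```

Calling patch_env_dict with the path to env_dict.py in the active environment returns 0 when there is nothing to change, so it is safe to run more than once.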

NagisaZj commented 1 year ago

Thanks a lot. Can't believe Meta-World makes such a mistake :).