nickgkan / 3d_diffuser_actor

Code for the paper "3D Diffuser Actor: Policy Diffusion with 3D Scene Representations"
https://3d-diffuser-actor.github.io/
MIT License

evaluation on calvin dataset #13

Closed. weiheng-liu closed this issue 6 months ago.

weiheng-liu commented 6 months ago

Hi, thanks for your great work! I ran into a problem when running the script test_trajectory_calvin.sh, even though I changed "ngpu" to 1 before running it. It seems like something is wrong with the data processing. This is the error:

Exception: Unable to add DataPipe function name sharding_filter as it is already taken
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 54756) of binary: /home/jaylonw42/.conda/envs/3d_diffuser_actor/bin/python

Traceback (most recent call last):
  File "/home/jaylonw42/.conda/envs/3d_diffuser_actor/bin/torchrun", line 33, in <module>
    sys.exit(load_entry_point('torch==1.13.1', 'console_scripts', 'torchrun')())
  File "/home/jaylonw42/.conda/envs/3d_diffuser_actor/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/jaylonw42/.conda/envs/3d_diffuser_actor/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
    run(args)
  File "/home/jaylonw42/.conda/envs/3d_diffuser_actor/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
    elastic_launch(
  File "/home/jaylonw42/.conda/envs/3d_diffuser_actor/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/jaylonw42/.conda/envs/3d_diffuser_actor/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

online_evaluation_calvin/evaluate_policy.py FAILED

Failures:

------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-03-18_21:12:08
  host      : ubuntu
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 54756)
  error_file:
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
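The message says the functional DataPipe name sharding_filter was already taken when something tried to register it again, so a useful first step is to record which torch and torchdata versions are installed in this environment. The sketch below is a minimal diagnostic, assuming (this report does not confirm it) that the clash comes from a torchdata release that does not match torch==1.13.1. The helper name installed_versions is hypothetical. It uses only the standard-library importlib.metadata module, which is available from Python 3.8, the version used here.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(
    packages: Iterable[str] = ("torch", "torchdata"),
) -> Dict[str, Optional[str]]:
    """Map each package name to its installed version, or None if absent.

    Assumption: the duplicate sharding_filter registration involves torch
    and torchdata; pass other package names if you suspect something else.
    """
    found: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            found[name] = version(name)  # reads installed distribution metadata
        except PackageNotFoundError:
            found[name] = None  # package is not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Running it inside the 3d_diffuser_actor environment prints one line per package; pasting that output into the issue makes it easier to check the pair against the version compatibility table in the torchdata README.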