zcswdt opened this issue 1 year ago
Hi, thanks for trying out our codebase. I have a few suggestions:
- Have you set the `CUDA_VISIBLE_DEVICES` environment variable? e.g. `CUDA_VISIBLE_DEVICES=1,2 python run_sim.py ...`
- I vaguely remember running into an issue like this. You may have to reinstall protobuf (or install a different version), or reinstall ray completely.
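To make the first suggestion concrete, the variable can also be set from inside Python, as long as it happens before torch/ray initialize CUDA. This is a minimal sketch, not code from the cloth-funnels repository:

```python
import os

# Select GPUs before importing torch or ray; CUDA reads this variable once,
# at initialization. "1,2" exposes physical GPUs 1 and 2 to the process
# (they appear as cuda:0 and cuda:1 inside it).
os.environ["CUDA_VISIBLE_DEVICES"] = "1,2"

# Later code (such as a setup routine that logs the value) can then
# read it without risking a KeyError:
visible = os.environ["CUDA_VISIBLE_DEVICES"]
print(f"CUDA_VISIBLE_DEVICES: {visible}")
```

Setting it on the shell command line (`CUDA_VISIBLE_DEVICES=1,2 python run_sim.py ...`) is equivalent and usually cleaner, since it leaves the script untouched.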
Thank you very much. It runs successfully now, but it prints 'Out of tasks' at the end. May I ask what the reason for this is?
(cloth-funnels) zcs@zcs:~/work/github/cloth-funnels$ python cloth_funnels/run_sim.py name="demo-single" load=../models/longsleeve_canonicalized_alignment.pth eval_tasks=../assets/tasks/longsleeve-single.hdf5 eval=True num_processes=1 episode_length=10 wandb=disabled fold_finish=True dump_visualizations=True
2023-08-17 10:36:06,182 INFO worker.py:1544 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
SEEDING WITH 0
[Network] Initializing with inputs: rgb_pos
[Network] Initializing factorized network
[Network] Giving deformable network positional encoding
[Network Setup] Load checkpoint specified ../models/longsleeve_canonicalized_alignment.pth
[Network Setup] Action Exploration Probability: 1.0141e-03
[Network Setup] Value Exploration Probability: 1.0284e-06
[Network Setup] Train Steps: 6216
Replay Buffer path: ../experiments/08-17-1036-demo-single/replay_buffer.hdf5
CUDA_VISIBLE_DEVICES: 0
(TaskLoader pid=9565) [TaskLoader] Loading eval tasks
(SimEnv pid=9566) /home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/pytorch_lightning/utilities/migration/migration.py:201: PossibleUserWarning: You have multiple ModelCheckpoint callback states in this checkpoint, but we found state keys that would end up colliding with each other after an upgrade, which means we can't differentiate which of your checkpoint callbacks needs which states. At least one of your ModelCheckpoint callbacks will not be able to reload the state.
(SimEnv pid=9566) rank_zero_warn(
(SimEnv pid=9566) Lightning automatically upgraded your loaded checkpoint from v1.4.0 to v2.0.1. To apply the upgrade to your files permanently, run python -m pytorch_lightning.utilities.upgrade_checkpoint --file ../models/keypoint_model.ckpt
(SimEnv pid=9566) /home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
(SimEnv pid=9566) warnings.warn(
(SimEnv pid=9566) /home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or None for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing weights=None.
(SimEnv pid=9566) warnings.warn(msg)
(TaskLoader pid=9565) [TaskLoader] 1/2
(SimEnv pid=9566) /home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/torchvision/transforms/functional.py:1603: UserWarning: The default value of the antialias parameter of all the resizing transforms (Resize(), RandomResizedCrop(), etc.) will change from None to True in v0.17, in order to be consistent across the PIL and Tensor backends. To suppress this warning, directly pass antialias=True (recommended, future default), antialias=None (current default, which means False for Tensors and True for PIL), or antialias=False (only works on Tensors - PIL will still use antialiasing). This also applies if you are using the inference transforms from the models weights: update the call to weights.transforms(antialias=True).
(SimEnv pid=9566) warnings.warn(
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 1.6523663997650146 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.31461071968078613 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.3144204616546631 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.31615495681762695 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.3159749507904053 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.3170597553253174 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.316359281539917 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.3157174587249756 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.31556272506713867 with #obs: 1
[RunSim] Points per hour: 0.0
[RunSim] Stepping env
Starting policy.act()
[Policy] Forward took: 0.31675195693969727 with #obs: 1
(SimEnv pid=9566) [SimEnv] Folding cloth
(SimEnv pid=9566) [SimEnv] Folding done
(SimEnv pid=9566) [Episode video dumped]
(SimEnv pid=9566) [Memory] Dumping memory to ../experiments/08-17-1036-demo-single/replay_buffer.hdf5
(TaskLoader pid=9565) [TaskLoader] 2/2
(TaskLoader pid=9565) [TaskLoader] Out of tasks
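For what it's worth, the log prints `[TaskLoader] 2/2` immediately before the message, which suggests 'Out of tasks' simply means the loader has exhausted the eval set (the task file here appears to contain two tasks) rather than signaling an error. A hypothetical sketch of that pattern; the class and method names are illustrative, not the actual cloth-funnels API:

```python
class TaskLoader:
    """Illustrative loader that announces progress and exhaustion."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.index = 0

    def next_task(self):
        # Once every task has been handed out, report exhaustion and stop.
        if self.index >= len(self.tasks):
            print("[TaskLoader] Out of tasks")  # benign: eval set exhausted
            return None
        self.index += 1
        print(f"[TaskLoader] {self.index}/{len(self.tasks)}")
        return self.tasks[self.index - 1]


loader = TaskLoader(["task_a", "task_b"])
results = [loader.next_task() for _ in range(3)]
```

Under this reading, the message after the final episode is expected behavior when evaluation finishes.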
I installed the environment following the tutorial in the repository, and the installation succeeded. However, during evaluation I encountered the following error. My system is Ubuntu 18.04 with an RTX 2070 (8 GB) GPU.
(cloth-funnels) zcs@zcs:~/work/github/cloth-funnels$ python cloth_funnels/run_sim.py name="longsleeve-eval" load=../models/longsleeve_canonicalized_alignment.pth eval_tasks=../assets/tasks/multi-longsleeve-eval.hdf5 eval=True num_processes=1 episode_length=10 wandb=disabled fold_finish=True dump_visualizations=True
2023-08-16 16:08:33,304 ERROR services.py:1169 -- Failed to start the dashboard, return code 1
2023-08-16 16:08:33,304 ERROR services.py:1194 -- Error should be written to 'dashboard.log' or 'dashboard.err'. We are printing the last 20 lines for you. See 'https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure' to find where the log file is.
2023-08-16 16:08:33,304 ERROR services.py:1238 -- The last 20 lines of /tmp/ray/session_2023-08-16_16-08-31_639412_8491/logs/dashboard.log (it contains the error message from the dashboard):
    from opencensus.common.transports import sync
  File "/home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/opencensus/common/transports/sync.py", line 16, in <module>
    from opencensus.trace import execution_context
  File "/home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/opencensus/trace/__init__.py", line 15, in <module>
    from opencensus.trace.span import Span
  File "/home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/opencensus/trace/span.py", line 32, in <module>
    from opencensus.trace import status as status_module
  File "/home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/opencensus/trace/status.py", line 15, in <module>
    from google.rpc import code_pb2
  File "/home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/google/rpc/code_pb2.py", line 47, in <module>
    _descriptor.EnumValueDescriptor(
  File "/home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/site-packages/google/protobuf/descriptor.py", line 796, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot not be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
2023-08-16 16:08:33,363 INFO worker.py:1553 -- Started a local Ray instance.
SEEDING WITH 0
[Network] Initializing with inputs: rgb_pos
[Network] Initializing factorized network
[Network] Giving deformable network positional encoding
(raylet) [2023-08-16 16:08:34,138 E 8788 8900] (raylet) agent_manager.cc:135: The raylet exited immediately because the Ray agent failed. The raylet fate shares with the agent. This can happen because the Ray agent was unexpectedly killed or failed. Agent can fail when
(raylet) - The version of `grpcio` doesn't follow Ray's requirement. Agent can segfault with the incorrect `grpcio` version. Check the grpcio version `pip freeze | grep grpcio`.
(raylet) - The agent failed to start because of unexpected error or port conflict. Read the log `cat /tmp/ray/session_latest/dashboard_agent.log`. You can find the log file structure here https://docs.ray.io/en/master/ray-observability/ray-logging.html#logging-directory-structure.
(raylet) - The agent is killed by the OS (e.g., out of memory).
[Network Setup] Load checkpoint specified ../models/longsleeve_canonicalized_alignment.pth
[Network Setup] Action Exploration Probability: 1.0141e-03
[Network Setup] Value Exploration Probability: 1.0284e-06
[Network Setup] Train Steps: 6216
Replay Buffer path: ../experiments/08-16-1608-longsleeve-eval/replay_buffer.hdf5
Error executing job with overrides: ['name=longsleeve-eval', 'load=../models/longsleeve_canonicalized_alignment.pth', 'eval_tasks=../assets/tasks/multi-longsleeve-eval.hdf5', 'eval=True', 'num_processes=1', 'episode_length=10', 'wandb=disabled', 'fold_finish=True', 'dump_visualizations=True']
Traceback (most recent call last):
  File "/home/zcs/work/github/cloth-funnels/cloth_funnels/run_sim.py", line 114, in main
    envs, _ = setup_envs(dataset=dataset_path, **args)
  File "/home/zcs/work/github/cloth-funnels/cloth_funnels/utils/utils.py", line 133, in setup_envs
    print("CUDA_VISIBLE_DEVICES: {}".format(os.environ["CUDA_VISIBLE_DEVICES"]))
  File "/home/zcs/miniconda3/envs/cloth-funnels/lib/python3.9/os.py", line 679, in __getitem__
    raise KeyError(key) from None
KeyError: 'CUDA_VISIBLE_DEVICES'
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
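The final `KeyError` happens because `os.environ["CUDA_VISIBLE_DEVICES"]` is read while the variable was never set, so the immediate fix is simply to set it before launching (which matches the maintainer's suggestion above). A hedged sketch of the defensive lookup; the `"0"` fallback is an illustrative assumption, not what cloth-funnels actually does:

```python
import os

# Reproduce the failure mode: the variable is unset in the environment.
os.environ.pop("CUDA_VISIBLE_DEVICES", None)

# os.environ["CUDA_VISIBLE_DEVICES"] would raise KeyError here;
# .get() with a default avoids the crash ("0" = first GPU, illustrative only).
devices = os.environ.get("CUDA_VISIBLE_DEVICES", "0")
print(f"CUDA_VISIBLE_DEVICES: {devices}")
```

In practice, prefixing the command with `CUDA_VISIBLE_DEVICES=0` achieves the same result without modifying the code.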