xuanlinli17 / corl_22_frame_mining

[CoRL22] Frame Mining - a Free Lunch for Learning Robotic Manipulation from 3D Point Clouds
https://colin97.github.io/FrameMining/
Apache License 2.0
25 stars, 1 fork

Issue: AttributeError: 'VectorEnv' object has no attribute 'workers' #1

Closed. laolianlaile closed this issue 1 year ago.

laolianlaile commented 1 year ago

After creating the required environment, I ran sh ./script/ppo_FM_MA.sh but got the following error:

(frame_mining) lr@lr:~/corl_22_frame_mining-main/pyrl$ sh ./script/ppo_FM_MA.sh
MoveBucket-v0-train - (run_rl.py:269) - INFO - 2022-10-31,17:49:59 - Config:
agent_cfg = dict(
    type='PPO',
    gamma=0.95,
    lmbda=0.95,
    critic_coeff=1,
    entropy_coeff=0,
    critic_clip=False,
    obs_norm=False,
    rew_norm=True,
    adv_norm=True,
    recompute_value=True,
    num_epoch=2,
    critic_warmup_epoch=4,
    batch_size=330,
    detach_actor_feature=False,
    max_grad_norm=0.5,
    eps_clip=0.2,
    max_kl=0.2,
    dual_clip=None,
    shared_backbone=True,
    actor_cfg=dict(
        type='ContinuousActor',
        head_cfg=dict(
            type='GaussianHead',
            init_log_std=-1,
            clip_return=True,
            predict_std=False),
        nn_cfg=dict(
            type='VisuomotorTransformerFrame',
            visual_nn_cfg=dict(
                type='TransformerFrame',
                num_frames='nhand + 1',
                backbone_cfg=dict(
                    type='PointNet',
                    feat_dim='pcd_all_channel',
                    mlp_spec=[64, 128, 300]),
                transformer_cfg=dict(
                    type='TransformerEncoder',
                    block_cfg=dict(
                        attention_cfg=dict(
                            type='MultiHeadSelfAttention',
                            embed_dim=300,
                            num_heads=8,
                            latent_dim=32,
                            dropout=0.1),
                        mlp_cfg=dict(
                            type='LinearMLP',
                            norm_cfg=None,
                            mlp_spec=[300, 1024, 300],
                            bias='auto',
                            inactivated_output=True,
                            linear_init_cfg=dict(
                                type='xavier_init', gain=1, bias=0)),
                        dropout=0.1),
                    mlp_cfg=None,
                    num_blocks=3),
                mask_type='skip'),
            mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192, 128],
                inactivated_output=True,
                zero_init_output=True),
            is_value=False,
            mix_action=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192],
                inactivated_output=True,
                zero_init_output=True)),
        optim_cfg=dict(type='Adam', lr=0.0003)),
    critic_cfg=dict(
        type='ContinuousCritic',
        nn_cfg=dict(
            type='VisuomotorTransformerFrame',
            visual_nn_cfg=None,
            mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192, 128, 1],
                inactivated_output=True,
                zero_init_output=True),
            is_value=True,
            mix_action=True),
        optim_cfg=dict(type='Adam', lr=0.0003)))
env_cfg = dict(
    type='gym',
    env_name='MoveBucket-v0',
    unwrapped=False,
    obs_mode='pointcloud',
    with_ext_torque=True,
    no_early_stop=True,
    cos_sin_representation=True,
    reward_scale=0.3,
    ego_mode=True,
    with_mask=True,
    nhand_pose=2,
    process_mode='base',
    device='cuda:0')
rollout_cfg = dict(
    type='Rollout',
    num_procs=5,
    sync=True,
    shared_memory=True,
    with_info=False)
replay_cfg = dict(
    type='ReplayMemory',
    capacity=40000,
    sampling_cfg=dict(type='OneStepTransition', with_replacement=False))
train_rl_cfg = dict(
    on_policy=True,
    warm_steps=0,
    total_steps=15000000,
    n_steps=40000,
    n_eval=15000000,
    n_checkpoint=2000000)
eval_cfg = dict(
    type='Evaluation',
    num=100,
    num_procs=1,
    use_hidden_state=False,
    start_state=None,
    save_traj=True,
    save_video=True,
    use_log=True,
    debug_print=False,
    env_cfg=dict(no_early_stop=False))
work_dir = None
resume_from = None

MoveBucket-v0-train - (run_rl.py:270) - INFO - 2022-10-31,17:49:59 - Set random seed to 1252483472
MoveBucket-v0-train - (run_rl.py:273) - INFO - 2022-10-31,17:49:59 - Build replay buffer!
MoveBucket-v0-train - (run_rl.py:283) - INFO - 2022-10-31,17:49:59 - Build rollout!
Traceback (most recent call last):
  File "tools/run_rl.py", line 380, in <module>
    main()
  File "tools/run_rl.py", line 353, in main
    run_one_process(0, 1, args, cfg)
  File "tools/run_rl.py", line 286, in run_one_process
    rollout = build_rollout(rollout_cfg)
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/env/builder.py", line 15, in build_rollout
    return build_from_cfg(cfg, ROLLOUTS, default_args)
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/utils/meta/registry.py", line 136, in build_from_cfg
    return obj_cls(**args)
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/env/rollout.py", line 38, in __init__
    self.env = build_env(env_cfg, num_procs, single_procs=single_procs, **kwargs)
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/env/env_utils.py", line 187, in build_env
    return VectorEnv(cfgs, **vec_env_kwargs)
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/env/vec_env.py", line 32, in __init__
    example_env = build_single_env(env_cfgs[0])
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/env/env_utils.py", line 172, in build_single_env
    return build_from_cfg(cfg, ENVS)
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/utils/meta/registry.py", line 136, in build_from_cfg
    return obj_cls(**args)
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/env/env_utils.py", line 99, in make_gym_env
    env_type = get_gym_env_type(env_name)
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/env/env_utils.py", line 31, in get_gym_env_type
    raise ValueError("No such env")
ValueError: No such env
Exception ignored in: <function VectorEnv.__del__ at 0x7fe8adb8e4c0>
Traceback (most recent call last):
  File "/home/lr/corl_22_frame_mining-main/pyrl/pyrl/env/vec_env.py", line 213, in __del__
    for worker in self.workers:
AttributeError: 'VectorEnv' object has no attribute 'workers'

Could you tell me what to do next? Thank you. I get the same AttributeError when I try TG, base, and ee.

xuanlinli17 commented 1 year ago

The actual error is ValueError: No such env.

Did you cd into mani_skill and run pip install there?
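The AttributeError in the log is a side effect of that ValueError: VectorEnv.__init__ raised before it assigned self.workers, Python then ran __del__ on the partially built object, and the unconditional self.workers access failed (hence "Exception ignored in: ... VectorEnv.__del__"). The sketch below reproduces that pattern with hypothetical stand-ins (FragileVectorEnv, SafeVectorEnv and make_example_env are illustrations, not pyrl's code) and shows a defensive __del__ that reads the attribute with getattr.

```python
def make_example_env(env_name):
    """Hypothetical stand-in for building one env whose name is unregistered."""
    raise ValueError("No such env")


class FragileVectorEnv:
    """Hypothetical illustration of the failure mode (not pyrl's actual code)."""

    def __init__(self, env_name):
        # Fails here, so self.workers below is never assigned.
        self.example_env = make_example_env(env_name)
        self.workers = []

    def __del__(self):
        # Unconditional access: AttributeError on a half-built object.
        for worker in self.workers:
            worker.close()


class SafeVectorEnv(FragileVectorEnv):
    """Same constructor failure, but cleanup tolerates missing attributes."""

    def __del__(self):
        for worker in getattr(self, "workers", []):
            worker.close()


def constructor_error(cls, env_name):
    """Return the exception raised by the constructor: the error worth fixing."""
    try:
        cls(env_name)
    except ValueError as exc:
        return exc
    return None


if __name__ == "__main__":
    # Both print "No such env"; only FragileVectorEnv additionally makes Python
    # print "Exception ignored in: ... __del__ ... AttributeError" to stderr.
    print(constructor_error(FragileVectorEnv, "MoveBucket-v0"))
    print(constructor_error(SafeVectorEnv, "MoveBucket-v0"))
```

The takeaway for reading such tracebacks: fix the first exception; the "Exception ignored" block only reports that cleanup ran on an object whose constructor never finished.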

laolianlaile commented 1 year ago

Yes, I installed it. I also tried reinstalling it, but I got the same error:

(frame_mining) lr@lr:~/corl_22_frame_mining-main$ cd mani_skill/
(frame_mining) lr@lr:~/corl_22_frame_mining-main/mani_skill$ pip install e .
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing /home/lr/corl_22_frame_mining-main/mani_skill
  Preparing metadata (setup.py) ... done
Collecting e
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/79/19/8bdbb33a50c0a76eac690ecad9add56e1de1b08c657ac0faa862b7662be6/e-1.4.5.tar.gz (1.8 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: e, mani-skill
  Building wheel for e (setup.py) ... done
  Created wheel for e: filename=e-1.4.5-py3-none-any.whl size=2792 sha256=a41458e8336e786863a5fa2991f8820bf70e4ff3f3dd23eb92d0d62e4d1a6633
  Stored in directory: /home/lr/.cache/pip/wheels/1b/9b/8f/3dfcf147f389a3f4f07e993003113c9eef9182a6e970757ff1
  Building wheel for mani-skill (setup.py) ... done
  Created wheel for mani-skill: filename=mani_skill-0.1.0-py3-none-any.whl size=1168 sha256=963567d76299caea6bdd6e129f22a3824e3d389e00a59fbfcc9a512705aee707
  Stored in directory: /tmp/pip-ephem-wheel-cache-qpzv78_s/wheels/1a/55/fd/1510de5d3d887d0450f59ff0171fb1fe4a13dcc148ea0323dc
Successfully built e mani-skill
Installing collected packages: mani-skill, e
  Attempting uninstall: mani-skill
    Found existing installation: mani-skill 0.1.0
    Uninstalling mani-skill-0.1.0:
      Successfully uninstalled mani-skill-0.1.0
Successfully installed e-1.4.5 mani-skill-0.1.0
(frame_mining) lr@lr:~/corl_22_frame_mining-main/mani_skill$ cd ..
(frame_mining) lr@lr:~/corl_22_frame_mining-main$ cd pyrl/
(frame_mining) lr@lr:~/corl_22_frame_mining-main/pyrl$ sh ./script/ppo_FM_MA.sh 
MoveBucket-v0-train - (run_rl.py:269) - INFO - 2022-11-01,09:18:01 - Config:
agent_cfg = dict(
    type='PPO',
    gamma=0.95,
    lmbda=0.95,
    critic_coeff=1,
    entropy_coeff=0,
    critic_clip=False,
    obs_norm=False,
    rew_norm=True,
    adv_norm=True,
    recompute_value=True,
    num_epoch=2,
    critic_warmup_epoch=4,
    batch_size=330,
    detach_actor_feature=False,
    max_grad_norm=0.5,
    eps_clip=0.2,
    max_kl=0.2,
    dual_clip=None,
    shared_backbone=True,
    actor_cfg=dict(
        type='ContinuousActor',
        head_cfg=dict(
            type='GaussianHead',
            init_log_std=-1,
            clip_return=True,
            predict_std=False),
        nn_cfg=dict(
            type='VisuomotorTransformerFrame',
            visual_nn_cfg=dict(
                type='TransformerFrame',
                num_frames='nhand + 1',
                backbone_cfg=dict(
                    type='PointNet',
                    feat_dim='pcd_all_channel',
                    mlp_spec=[64, 128, 300]),
                transformer_cfg=dict(
                    type='TransformerEncoder',
                    block_cfg=dict(
                        attention_cfg=dict(
                            type='MultiHeadSelfAttention',
                            embed_dim=300,
                            num_heads=8,
                            latent_dim=32,
                            dropout=0.1),
                        mlp_cfg=dict(
                            type='LinearMLP',
                            norm_cfg=None,
                            mlp_spec=[300, 1024, 300],
                            bias='auto',
                            inactivated_output=True,
                            linear_init_cfg=dict(
                                type='xavier_init', gain=1, bias=0)),
                        dropout=0.1),
                    mlp_cfg=None,
                    num_blocks=3),
                mask_type='skip'),
            mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192, 128],
                inactivated_output=True,
                zero_init_output=True),
            is_value=False,
            mix_action=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192],
                inactivated_output=True,
                zero_init_output=True)),
        optim_cfg=dict(type='Adam', lr=0.0003)),
    critic_cfg=dict(
        type='ContinuousCritic',
        nn_cfg=dict(
            type='VisuomotorTransformerFrame',
            visual_nn_cfg=None,
            mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192, 128, 1],
                inactivated_output=True,
                zero_init_output=True),
            is_value=True,
            mix_action=True),
        optim_cfg=dict(type='Adam', lr=0.0003)))
env_cfg = dict(
    type='gym',
    env_name='MoveBucket-v0',
    unwrapped=False,
    obs_mode='pointcloud',
    with_ext_torque=True,
    no_early_stop=True,
    cos_sin_representation=True,
    reward_scale=0.3,
    ego_mode=True,
    with_mask=True,
    nhand_pose=2,
    process_mode='base',
    device='cuda:0')
rollout_cfg = dict(
    type='Rollout',
    num_procs=5,
    sync=True,
    shared_memory=True,
    with_info=False)
replay_cfg = dict(
    type='ReplayMemory',
    capacity=40000,
    sampling_cfg=dict(type='OneStepTransition', with_replacement=False))
train_rl_cfg = dict(
    on_policy=True,
    warm_steps=0,
    total_steps=15000000,
    n_steps=40000,
    n_eval=15000000,
    n_checkpoint=2000000)
eval_cfg = dict(
    type='Evaluation',
    num=100,
    num_procs=1,
    use_hidden_state=False,
    start_state=None,
    save_traj=True,
    save_video=True,
    use_log=True,
    debug_print=False,
    env_cfg=dict(no_early_stop=False))
work_dir = None
resume_from = None

MoveBucket-v0-train - (run_rl.py:270) - INFO - 2022-11-01,09:18:01 - Set random seed to 3723535518
MoveBucket-v0-train - (run_rl.py:273) - INFO - 2022-11-01,09:18:01 - Build replay buffer!
MoveBucket-v0-train - (run_rl.py:283) - INFO - 2022-11-01,09:18:01 - Build rollout!
Traceback (most recent call last):
  File "tools/run_rl.py", line 380, in <module>
    main()
  File "tools/run_rl.py", line 353, in main
    run_one_process(0, 1, args, cfg)
  File "tools/run_rl.py", line 286, in run_one_process
    rollout = build_rollout(rollout_cfg)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/builder.py", line 15, in build_rollout
    return build_from_cfg(cfg, ROLLOUTS, default_args)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/utils/meta/registry.py", line 136, in build_from_cfg
    return obj_cls(**args)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/rollout.py", line 38, in __init__
    self.env = build_env(env_cfg, num_procs, single_procs=single_procs, **kwargs)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/env_utils.py", line 187, in build_env
    return VectorEnv(cfgs, **vec_env_kwargs)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/vec_env.py", line 32, in __init__
    example_env = build_single_env(env_cfgs[0])
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/env_utils.py", line 172, in build_single_env
    return build_from_cfg(cfg, ENVS)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/utils/meta/registry.py", line 136, in build_from_cfg
    return obj_cls(**args)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/env_utils.py", line 99, in make_gym_env
    env_type = get_gym_env_type(env_name)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/env_utils.py", line 31, in get_gym_env_type
    raise ValueError("No such env")
ValueError: No such env
Exception ignored in: <function VectorEnv.__del__ at 0x7f3acfc4c160>
Traceback (most recent call last):
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/vec_env.py", line 213, in __del__
    for worker in self.workers:
AttributeError: 'VectorEnv' object has no attribute 'workers'
xuanlinli17 commented 1 year ago

pip install -e ., not pip install e .
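Without the -e flag, pip treated e as a requirement name: it downloaded an unrelated PyPI package called e (see "Collecting e" and "Successfully installed e-1.4.5" in the log) and built a regular, non-editable copy of the current directory. One way to see which copy Python actually imports is to ask importlib where a module resolves from; with an editable install the origin should typically point into the source checkout rather than site-packages. A small sketch, where module_origin and is_under are hypothetical helpers and the checkout path is an assumption taken from the paths in the log:

```python
import importlib.util
from pathlib import Path


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def is_under(path, root):
    """True if `path` lies inside directory `root`."""
    try:
        Path(path).resolve().relative_to(Path(root).resolve())
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    origin = module_origin("mani_skill")
    print("mani_skill imported from:", origin)
    # Assumed checkout location; adjust to wherever the repo was cloned.
    checkout = Path.home() / "corl_22_frame_mining-main" / "mani_skill"
    if origin is not None:
        print("imported from the checkout:", is_under(origin, checkout))
```

The same check applies to pyrl: in the second traceback it is loaded from a pyrl-1.0.0-py3.8.egg under site-packages, so edits in the checkout would not take effect until it is reinstalled.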

xuanlinli17 commented 1 year ago

Besides using pip install -e . (not e .), you can run the following to check whether mani_skill is installed correctly:

import gym
import mani_skill.env

env = gym.make('OpenCabinetDoor-v0')
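A slightly more verbose version of this check, which says which step failed instead of stopping at the first traceback, might look like the following. check_env is a hypothetical helper; it assumes, as the snippet above implies, that importing mani_skill.env is what registers the ManiSkill environments with gym.

```python
import importlib


def check_env(env_id, register_module="mani_skill.env"):
    """Try to create `env_id`; return (ok, message) instead of raising."""
    try:
        # Assumed to register the ManiSkill envs with gym as a side effect.
        importlib.import_module(register_module)
    except ImportError as exc:
        return False, f"cannot import {register_module}: {exc}"
    try:
        import gym
    except ImportError as exc:
        return False, f"cannot import gym: {exc}"
    try:
        env = gym.make(env_id)
    except Exception as exc:  # gym's error types differ across versions
        return False, f"gym.make({env_id!r}) failed: {exc!r}"
    env.close()
    return True, f"{env_id} created successfully"


if __name__ == "__main__":
    for name in ("OpenCabinetDoor-v0", "MoveBucket-v0"):
        print(check_env(name))
```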
laolianlaile commented 1 year ago

Oh gosh, sorry, that was my mistake. I rebuilt the environment and the error disappeared, but a new one appeared: AttributeError: torch._C._cuda_setDevice(device). I noticed that my PyTorch is CPU-only, although I installed it with conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch, as given in line 37 of your README.md. I then reinstalled it with conda install pytorch=1.12.1 torchvision=0.13.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch, but again got a CPU-only PyTorch. Could you tell me what to do next? I also tried:

import gym
import mani_skill.env

env = gym.make('OpenCabinetDoor-v0')

This time I got MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0. Is that okay?

xuanlinli17 commented 1 year ago

Do you have CUDA installed locally? Did you set export PATH=/usr/local/cuda-11.3/bin:$PATH and export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64:$LD_LIBRARY_PATH in your .bashrc (or the equivalent rc file if you use another shell)? Is nvidia-smi available? What does nvcc --version report?

The environment warning is OK; I suspect your system just doesn't provide OpenGL performance support.
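The questions above can be checked from Python in one go. A minimal sketch, where cuda_env_report is a hypothetical helper that looks up nvcc and nvidia-smi on PATH and lists the CUDA-looking entries of LD_LIBRARY_PATH; it only inspects environment variables and does not load CUDA. Keep in mind that prebuilt PyTorch packages generally ship their own CUDA runtime and mainly need a working NVIDIA driver, so a missing nvcc alone would not explain a CPU-only install.

```python
import os
import shutil


def cuda_env_report(environ=None):
    """Summarize CUDA-related settings without importing or loading CUDA.

    `environ` defaults to the current process environment; passing a dict
    makes it easy to check a hypothetical setup.
    """
    env = dict(os.environ) if environ is None else dict(environ)
    search_path = env.get("PATH", "")
    ld_dirs = [d for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if d]
    return {
        # Full path of each tool found on PATH, or None if missing.
        "nvcc": shutil.which("nvcc", path=search_path),
        "nvidia-smi": shutil.which("nvidia-smi", path=search_path),
        # LD_LIBRARY_PATH entries that look CUDA-related.
        "cuda_lib_dirs": [d for d in ld_dirs if "cuda" in d.lower()],
    }


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```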

laolianlaile commented 1 year ago

Here is the output of nvcc -V and nvidia-smi:

(frame_mining) lr@lr:~/corl_22_frame_mining-main/pyrl$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
(frame_mining) lr@lr:~/corl_22_frame_mining-main/pyrl$ nvidia-smi
Tue Nov  1 13:58:55 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 36%   35C    P8    12W / 125W |    533MiB /  6144MiB |     25%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1342      G   /usr/lib/xorg/Xorg                363MiB |
|    0   N/A  N/A      2122      G   /usr/bin/gnome-shell               24MiB |
|    0   N/A  N/A    208261      G   ...3/usr/lib/firefox/firefox      103MiB |
|    0   N/A  N/A    216668      G   ...nlogin/bin/sunloginclient        6MiB |
|    0   N/A  N/A    216715      G   ...nlogin/bin/sunloginclient       32MiB |
+-----------------------------------------------------------------------------+

I also have the following set:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CPUTI/lib64
export CUDA_HOME=/usr/local/cuda/bin
export PATH=$PATH:$LD_LIBRARY_PATH:$CUDA_HOME

What will happen if my system doesn't support OpenGL performance support?
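Two things in these exports look unusual, although they probably do not explain the CPU-only PyTorch build: CUDA_HOME conventionally names the toolkit root (for example /usr/local/cuda, which is where tools such as PyTorch's C++/CUDA extension builder look for bin/nvcc), and library directories such as lib64 belong on LD_LIBRARY_PATH rather than PATH. The NVIDIA profiling library directory is also normally spelled extras/CUPTI, not extras/CPUTI. A hypothetical lint helper for simple export lines (it does no shell expansion or quoting beyond stripping surrounding quotes):

```python
import re

# Matches simple `export NAME=value` lines; no shell expansion is performed.
EXPORT_RE = re.compile(r"^\s*export\s+([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_exports(text):
    """Map variable names to their raw (unexpanded) values."""
    exports = {}
    for line in text.splitlines():
        match = EXPORT_RE.match(line)
        if match:
            exports[match.group(1)] = match.group(2).strip().strip("\"'")
    return exports


def lint_cuda_exports(text):
    """Return human-readable warnings about common CUDA variable mix-ups."""
    exports = parse_exports(text)
    warnings = []
    cuda_home = exports.get("CUDA_HOME", "").rstrip("/")
    if cuda_home.endswith("/bin"):
        warnings.append("CUDA_HOME points at bin/; it usually names the toolkit "
                        "root, e.g. /usr/local/cuda")
    path_entries = exports.get("PATH", "").split(":")
    if any(entry in ("$LD_LIBRARY_PATH", "${LD_LIBRARY_PATH}")
           or entry.rstrip("/").endswith(("/lib", "/lib64"))
           for entry in path_entries):
        warnings.append("PATH contains library directories; those belong on "
                        "LD_LIBRARY_PATH, while PATH is for executables")
    return warnings


if __name__ == "__main__":
    posted = """export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CPUTI/lib64
export CUDA_HOME=/usr/local/cuda/bin
export PATH=$PATH:$LD_LIBRARY_PATH:$CUDA_HOME"""
    for warning in lint_cuda_exports(posted):
        print("warning:", warning)
```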

xuanlinli17 commented 1 year ago

Did it report anything when it installed the CPU-only build?

What about pip install torch==1.12.1 torchvision==0.13.1, and then checking import torch; torch.cuda.is_available()?
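To see what kind of PyTorch build is actually installed, the following sketch (describe_torch_build is a hypothetical helper) reports torch.version.cuda, which is None for CPU-only builds, alongside torch.cuda.is_available(). If built_with_cuda is None, the package itself has to be replaced; if it is set but cuda_available is False, the driver or GPU visibility is the more likely problem.

```python
def describe_torch_build():
    """Report what kind of PyTorch build is installed and whether it sees a GPU."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        # CUDA version the binary was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        # False for CPU-only builds, or if no usable driver/GPU is found.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    print(describe_torch_build())
```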

laolianlaile commented 1 year ago

Thank you, Simon. Now I can see MoveBucket-v0-train - (train_rl.py:196) - INFO - 2022-11-01,17:07:02 - Begin training! But after that it ended with RuntimeError: CUDA out of memory. My GPU is a 1660S with 6 GiB of memory. I have other GPUs, such as a 2070 (8 GiB), a 3060 Ti (8 GiB), and two Teslas with 12 GiB, but they are currently in use, so I can't immediately try again on another PC. Could you tell me how much GPU memory your code needs?

xuanlinli17 commented 1 year ago

You need a smaller batch size here, probably around 200:

https://github.com/xuanlinli17/corl_22_frame_mining/blob/main/pyrl/configs/mfrl/ppo/maniskill/maniskill_pn.py
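Finding a batch size that fits can also be automated by halving until an update no longer runs out of memory. A minimal sketch, assuming a hypothetical try_batch callable that runs one update at a given batch size. PyTorch reports CUDA OOM as a RuntimeError whose message contains "out of memory" (torch.cuda.OutOfMemoryError, added in later releases, is a subclass of RuntimeError), so any other RuntimeError is re-raised rather than masked.

```python
def find_max_batch_size(try_batch, start, minimum=1):
    """Halve `start` until `try_batch(batch_size)` runs without CUDA OOM.

    `try_batch` is a hypothetical callable that performs one update with the
    given batch size and raises RuntimeError on failure.
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            try_batch(batch_size)
            return batch_size
        except RuntimeError as exc:
            if "out of memory" not in str(exc):
                raise  # a different failure: don't hide it
            batch_size //= 2
    raise RuntimeError(f"out of memory even at batch_size={minimum}")


if __name__ == "__main__":
    def fake_update(batch_size):
        # Pretend anything above 180 exhausts a 6 GiB card.
        if batch_size > 180:
            raise RuntimeError("CUDA out of memory. Tried to allocate ...")

    print(find_max_batch_size(fake_update, 330))
```

Halving is coarse (330 goes straight to 165), so a value in between may also fit. In a real PyTorch loop you would typically also call torch.cuda.empty_cache() before retrying.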

laolianlaile commented 1 year ago

Sorry for the late reply. I edited maniskill_pn.py under /pyrl/configs/mfrl/ppo/maniskill, simply by running cd /pyrl/configs/mfrl/ppo/maniskill and sudo gedit maniskill_pn.py. But whatever value I tried, from batch_size = 200 down to batch_size = 1, I got the same error. I guess I must have done something wrong. What should I do now?

xuanlinli17 commented 1 year ago

Could you post the full logs here?

laolianlaile commented 1 year ago

Oops. I checked my logs and found batch_size = 330, which means I hadn't realized that batch_size is set by the .sh script. So I edited the .sh instead, and now it runs fine. It was just a silly mistake on my part. For the 1660S (6 GiB), batch_size = 180 finally works.
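This is also why the log was the right place to look: the Config block printed at startup shows the values actually in effect after the shell script's overrides. How command-line overrides typically win over a config file can be sketched with dotted keys; apply_overrides is a hypothetical helper, and this is a generic illustration, not pyrl's actual override mechanism or the .sh script's actual arguments.

```python
import copy


def apply_overrides(cfg, overrides):
    """Return a copy of nested dict `cfg` with dotted-key overrides applied.

    E.g. {"agent_cfg.batch_size": 180} sets cfg["agent_cfg"]["batch_size"].
    """
    merged = copy.deepcopy(cfg)  # leave the file's config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = merged
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return merged


if __name__ == "__main__":
    file_cfg = {"agent_cfg": {"type": "PPO", "batch_size": 200}}  # edited .py
    script_overrides = {"agent_cfg.batch_size": 330}             # from the .sh
    effective = apply_overrides(file_cfg, script_overrides)
    print(effective["agent_cfg"]["batch_size"])
```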

laolianlaile commented 1 year ago

After receiving the following output:

MoveBucket-v0-train - (evaluation.py:24) - INFO - 2022-11-06,19:36:38 - Num of trails: 100.00, Length: 92.46±50.46, Reward:-193.19±137.82, Success or Early Stop Rate: 0.83±0.38
MoveBucket-v0-train - (train_rl.py:352) - INFO - 2022-11-06,19:36:38 - Save checkpoint at final step 15000000. The model will be saved at ./FM-MA-Movebucket/models/model_final.ckpt.

I got the error shown in the screenshot IMG_20221107_111415. However, I can find model_final.ckpt, 100 mp4 files, statistics.csv, and trajectory.h5.
Is it a success or a failure?

xuanlinli17 commented 1 year ago

You can ignore the final error. Everything is completed successfully.
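On reading the evaluation line: if the ± figures are standard deviations over the 100 evaluation episodes (an assumption; the log does not say), the success entry is consistent with that. For a 0/1 outcome with mean p the standard deviation is sqrt(p(1 - p)), and sqrt(0.83 × 0.17) ≈ 0.376, which rounds to the reported 0.38. In that reading the ± describes episode-to-episode spread, not the uncertainty of the 83% rate itself. A quick check:

```python
import math
import statistics


def mean_std(values):
    """Mean and population standard deviation (dividing by n)."""
    return statistics.mean(values), statistics.pstdev(values)


if __name__ == "__main__":
    # 83 successes out of 100 evaluation episodes, encoded as 1/0.
    outcomes = [1] * 83 + [0] * 17
    mean, std = mean_std(outcomes)
    print(f"{mean:.2f}±{std:.2f}")  # 0.83±0.38
    # The sample std (dividing by n - 1) also rounds to 0.38.
    print(f"sample std: {statistics.stdev(outcomes):.2f}")
    # Standard error of the mean: a rough uncertainty on the rate itself.
    print(f"standard error: {std / math.sqrt(len(outcomes)):.3f}")
```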