laolianlaile closed this issue 1 year ago
The actual error is ValueError: No such env.
Did you enter mani_skill and run pip install there?
Yeah, I did install it. And I tried to re-install it, but the same error came up:
(frame_mining) lr@lr:~/corl_22_frame_mining-main$ cd mani_skill/
(frame_mining) lr@lr:~/corl_22_frame_mining-main/mani_skill$ pip install e .
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Processing /home/lr/corl_22_frame_mining-main/mani_skill
  Preparing metadata (setup.py) ... done
Collecting e
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/79/19/8bdbb33a50c0a76eac690ecad9add56e1de1b08c657ac0faa862b7662be6/e-1.4.5.tar.gz (1.8 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: e, mani-skill
  Building wheel for e (setup.py) ... done
  Created wheel for e: filename=e-1.4.5-py3-none-any.whl size=2792 sha256=a41458e8336e786863a5fa2991f8820bf70e4ff3f3dd23eb92d0d62e4d1a6633
  Stored in directory: /home/lr/.cache/pip/wheels/1b/9b/8f/3dfcf147f389a3f4f07e993003113c9eef9182a6e970757ff1
  Building wheel for mani-skill (setup.py) ... done
  Created wheel for mani-skill: filename=mani_skill-0.1.0-py3-none-any.whl size=1168 sha256=963567d76299caea6bdd6e129f22a3824e3d389e00a59fbfcc9a512705aee707
  Stored in directory: /tmp/pip-ephem-wheel-cache-qpzv78_s/wheels/1a/55/fd/1510de5d3d887d0450f59ff0171fb1fe4a13dcc148ea0323dc
Successfully built e mani-skill
Installing collected packages: mani-skill, e
  Attempting uninstall: mani-skill
    Found existing installation: mani-skill 0.1.0
    Uninstalling mani-skill-0.1.0:
      Successfully uninstalled mani-skill-0.1.0
Successfully installed e-1.4.5 mani-skill-0.1.0
(frame_mining) lr@lr:~/corl_22_frame_mining-main/mani_skill$ cd ..
(frame_mining) lr@lr:~/corl_22_frame_mining-main$ cd pyrl/
(frame_mining) lr@lr:~/corl_22_frame_mining-main/pyrl$ sh ./script/ppo_FM_MA.sh
MoveBucket-v0-train - (run_rl.py:269) - INFO - 2022-11-01,09:18:01 - Config:
agent_cfg = dict(
    type='PPO',
    gamma=0.95,
    lmbda=0.95,
    critic_coeff=1,
    entropy_coeff=0,
    critic_clip=False,
    obs_norm=False,
    rew_norm=True,
    adv_norm=True,
    recompute_value=True,
    num_epoch=2,
    critic_warmup_epoch=4,
    batch_size=330,
    detach_actor_feature=False,
    max_grad_norm=0.5,
    eps_clip=0.2,
    max_kl=0.2,
    dual_clip=None,
    shared_backbone=True,
    actor_cfg=dict(
        type='ContinuousActor',
        head_cfg=dict(
            type='GaussianHead',
            init_log_std=-1,
            clip_return=True,
            predict_std=False),
        nn_cfg=dict(
            type='VisuomotorTransformerFrame',
            visual_nn_cfg=dict(
                type='TransformerFrame',
                num_frames='nhand + 1',
                backbone_cfg=dict(
                    type='PointNet',
                    feat_dim='pcd_all_channel',
                    mlp_spec=[64, 128, 300]),
                transformer_cfg=dict(
                    type='TransformerEncoder',
                    block_cfg=dict(
                        attention_cfg=dict(
                            type='MultiHeadSelfAttention',
                            embed_dim=300,
                            num_heads=8,
                            latent_dim=32,
                            dropout=0.1),
                        mlp_cfg=dict(
                            type='LinearMLP',
                            norm_cfg=None,
                            mlp_spec=[300, 1024, 300],
                            bias='auto',
                            inactivated_output=True,
                            linear_init_cfg=dict(
                                type='xavier_init', gain=1, bias=0)),
                        dropout=0.1),
                    mlp_cfg=None,
                    num_blocks=3),
                mask_type='skip'),
            mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192, 128],
                inactivated_output=True,
                zero_init_output=True),
            is_value=False,
            mix_action=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192],
                inactivated_output=True,
                zero_init_output=True)),
        optim_cfg=dict(type='Adam', lr=0.0003)),
    critic_cfg=dict(
        type='ContinuousCritic',
        nn_cfg=dict(
            type='VisuomotorTransformerFrame',
            visual_nn_cfg=None,
            mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['300 + agent_shape', 192, 128, 1],
                inactivated_output=True,
                zero_init_output=True),
            is_value=True,
            mix_action=True),
        optim_cfg=dict(type='Adam', lr=0.0003)))
env_cfg = dict(
    type='gym',
    env_name='MoveBucket-v0',
    unwrapped=False,
    obs_mode='pointcloud',
    with_ext_torque=True,
    no_early_stop=True,
    cos_sin_representation=True,
    reward_scale=0.3,
    ego_mode=True,
    with_mask=True,
    nhand_pose=2,
    process_mode='base',
    device='cuda:0')
rollout_cfg = dict(
    type='Rollout',
    num_procs=5,
    sync=True,
    shared_memory=True,
    with_info=False)
replay_cfg = dict(
    type='ReplayMemory',
    capacity=40000,
    sampling_cfg=dict(type='OneStepTransition', with_replacement=False))
train_rl_cfg = dict(
    on_policy=True,
    warm_steps=0,
    total_steps=15000000,
    n_steps=40000,
    n_eval=15000000,
    n_checkpoint=2000000)
eval_cfg = dict(
    type='Evaluation',
    num=100,
    num_procs=1,
    use_hidden_state=False,
    start_state=None,
    save_traj=True,
    save_video=True,
    use_log=True,
    debug_print=False,
    env_cfg=dict(no_early_stop=False))
work_dir = None
resume_from = None
MoveBucket-v0-train - (run_rl.py:270) - INFO - 2022-11-01,09:18:01 - Set random seed to 3723535518
MoveBucket-v0-train - (run_rl.py:273) - INFO - 2022-11-01,09:18:01 - Build replay buffer!
MoveBucket-v0-train - (run_rl.py:283) - INFO - 2022-11-01,09:18:01 - Build rollout!
Traceback (most recent call last):
  File "tools/run_rl.py", line 380, in <module>
    main()
  File "tools/run_rl.py", line 353, in main
    run_one_process(0, 1, args, cfg)
  File "tools/run_rl.py", line 286, in run_one_process
    rollout = build_rollout(rollout_cfg)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/builder.py", line 15, in build_rollout
    return build_from_cfg(cfg, ROLLOUTS, default_args)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/utils/meta/registry.py", line 136, in build_from_cfg
    return obj_cls(**args)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/rollout.py", line 38, in __init__
    self.env = build_env(env_cfg, num_procs, single_procs=single_procs, **kwargs)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/env_utils.py", line 187, in build_env
    return VectorEnv(cfgs, **vec_env_kwargs)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/vec_env.py", line 32, in __init__
    example_env = build_single_env(env_cfgs[0])
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/env_utils.py", line 172, in build_single_env
    return build_from_cfg(cfg, ENVS)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/utils/meta/registry.py", line 136, in build_from_cfg
    return obj_cls(**args)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/env_utils.py", line 99, in make_gym_env
    env_type = get_gym_env_type(env_name)
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/env_utils.py", line 31, in get_gym_env_type
    raise ValueError("No such env")
ValueError: No such env
Exception ignored in: <function VectorEnv.__del__ at 0x7f3acfc4c160>
Traceback (most recent call last):
  File "/home/lr/anaconda3/envs/frame_mining/lib/python3.8/site-packages/pyrl-1.0.0-py3.8.egg/pyrl/env/vec_env.py", line 213, in __del__
    for worker in self.workers:
AttributeError: 'VectorEnv' object has no attribute 'workers'
pip install -e ., not pip install e .
Besides pip install -e . (not e .), you can also use this to check whether mani_skill is successfully installed:
import gym
import mani_skill.env
env = gym.make('OpenCabinetDoor-v0')
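If the import succeeds but gym.make still can't find the environment, you can also print what actually got registered. A minimal sketch, assuming the older gym registry API (gym < 0.22, which matches this Python 3.8 setup):

import gym
import mani_skill.env  # registers the ManiSkill environments as a side effect
from gym import envs

# List every registered env id that looks like a ManiSkill task
maniskill_ids = [spec.id for spec in envs.registry.all()
                 if 'Cabinet' in spec.id or 'Bucket' in spec.id or 'Chair' in spec.id]
print(maniskill_ids)

If MoveBucket-v0 is missing from that list, the mani_skill package being imported is not the one you installed.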
Oh gosh... sorry, I made a mistake...
I rebuilt the environment and that error disappeared, but a new one came up: AttributeError: torch._C._cuda_setDevice(device)
I noticed that my PyTorch is CPU-only, even though I installed it with conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch, which is line 37 of your README.md. I reinstalled PyTorch with conda install pytorch=1.12.1 torchvision=0.13.1 torchaudio=0.12.1 cudatoolkit=11.3 -c pytorch, but I got CPU-only PyTorch again... Could you tell me what to do next?
And I tried:
import gym
import mani_skill.env
env = gym.make('OpenCabinetDoor-v0')
This time I got MESA-INTEL: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
Is that okay?
Do you have local CUDA installed? Did you set export PATH=/usr/local/cuda-11.3/bin:$PATH and export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64:$LD_LIBRARY_PATH in your .bashrc (or the rc file of your shell, if you use another one)? Is nvidia-smi available? What does nvcc --version print?
The environment warning is OK, but I suspect your system doesn't have OpenGL performance support.
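For reference, the two lines in ~/.bashrc would look like this (a sketch assuming CUDA 11.3 lives under /usr/local/cuda-11.3; adjust the path to your installation):

# Put the CUDA 11.3 toolchain on the path (adjust version/path as needed)
export PATH=/usr/local/cuda-11.3/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.3/lib64:$LD_LIBRARY_PATH

Run source ~/.bashrc (or open a new terminal) afterwards so the change takes effect.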
nvcc -V and nvidia-smi:
(frame_mining) lr@lr:~/corl_22_frame_mining-main/pyrl$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
(frame_mining) lr@lr:~/corl_22_frame_mining-main/pyrl$ nvidia-smi
Tue Nov  1 13:58:55 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 36%   35C    P8    12W / 125W |    533MiB /  6144MiB |     25%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1342      G   /usr/lib/xorg/Xorg                363MiB |
|    0   N/A  N/A      2122      G   /usr/bin/gnome-shell               24MiB |
|    0   N/A  N/A    208261      G   ...3/usr/lib/firefox/firefox      103MiB |
|    0   N/A  N/A    216668      G   ...nlogin/bin/sunloginclient        6MiB |
|    0   N/A  N/A    216715      G   ...nlogin/bin/sunloginclient       32MiB |
+-----------------------------------------------------------------------------+
And I have:
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CPUTI/lib64
export CUDA_HOME=/usr/local/cuda/bin
export PATH=$PATH:$LD_LIBRARY_PATH:$CUDA_HOME
What will happen if my system doesn't have OpenGL performance support?
Did it report anything when it was CPU-only?
What about pip install torch==1.12.1 torchvision==0.13.1 (note the pip package is named torch, not pytorch), and then checking import torch; torch.cuda.is_available()?
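A quick way to see both which build you have and whether the runtime can reach the GPU, using standard PyTorch calls:

import torch

print(torch.__version__)          # pip CUDA wheels carry a suffix like '1.12.1+cu113'
print(torch.version.cuda)         # None on a CPU-only build, e.g. '11.3' on a CUDA build
print(torch.cuda.is_available())  # True only if a usable driver and GPU are visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the 1660 SUPER from the nvidia-smi output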
Thank you, Simon.
Now I can see MoveBucket-v0-train - (train_rl.py:196) - INFO - 2022-11-01,17:07:02 - Begin training!
But after that, it ended with RuntimeError: CUDA out of memory.
My GPU is a 1660S with 6 GiB of memory. I have other GPUs (a 2070 with 8 GiB, a 3060 Ti with 8 GiB, and two Teslas with 12 GiB), but they are all in use right now, so I can't try again immediately on another PC. Could you tell me how much memory is enough for your code?
You need a smaller batch size here, probably around 200.
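Note that the launch script may override the value in the config file. A hedged sketch of the kind of invocation to look for in script/ppo_FM_MA.sh, assuming pyrl follows ManiSkill-Learn's --cfg-options override convention (an assumption; check your actual script):

# Hypothetical excerpt of script/ppo_FM_MA.sh: if the script passes an override
# like this, it wins over batch_size in maniskill_pn.py, so edit it here.
python tools/run_rl.py configs/mfrl/ppo/maniskill/maniskill_pn.py \
    --cfg-options "agent_cfg.batch_size=200"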
Sorry for the late reply.
I edited maniskill_pn.py under pyrl/configs/mfrl/ppo/maniskill, simply by cd pyrl/configs/mfrl/ppo/maniskill and sudo gedit maniskill_pn.py. But no matter what I tried, from batch_size = 200 all the way down to batch_size = 1, I got the same error. I guess I must have done something wrong... What should I do now?
Could you post the full logs here?
Oops... I checked my logs and found batch_size = 330.
That means I hadn't realized that batch_size is controlled by the .sh script... So I edited the .sh, and now it's running okay.
It was just a silly mistake on my part...
For the 1660S with 6 GiB, batch_size = 180 finally works.
After receiving
MoveBucket-v0-train - (evaluation.py:24) - INFO - 2022-11-06,19:36:38 - Num of trails: 100.00, Length: 92.46±50.46, Reward: -193.19±137.82, Success or Early Stop Rate: 0.83±0.38
MoveBucket-v0-train - (train_rl.py:352) - INFO - 2022-11-06,19:36:38 - Save checkpoint at final step 15000000. The model will be saved at ./FM-MA-Movebucket/models/model_final.ckpt.
an error like the one below came up.
I can find model_final.ckpt, 100 mp4 files, statistics.csv, and trajectory.h5.
Is this a success or a failure?
You can ignore the final error; everything completed successfully.
After creating the required environment, I tried sh ./script/ppo_FM_MA.sh, but encountered the issue below. Could you tell me what to do next? Thank you. I get the same AttributeError when I try TG, base, and ee.