haosulab / ManiSkill-Learn

ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a physics-rich manipulation skill benchmark with large-scale demonstrations.
Apache License 2.0

I can run the program but the output video isn't visible #2

Closed QiuJunning closed 3 years ago

QiuJunning commented 3 years ago

I installed ManiSkill and ManiSkill-Learn according to the README and ran the example:

python -m tools.run_rl configs/bc/mani_skill_point_cloud_transformer.py \
    --gpu-ids=3 --cfg-options "env_cfg.env_name=OpenCabinetDrawer_1045_link_0-v0" \
    "eval_cfg.save_video=True" "eval_cfg.num=1" "eval_cfg.use_log=True" \
    --work-dir=./test/OpenCabinetDrawer_1045_link_0-v0_pcd \
    --resume-from=./example_mani_skill_data/OpenCabinetDrawer_1045_link_0-v0_PN_Transformer.ckpt --evaluation

The program runs, but the test video is completely black:

image

Hope to get your help, thank you! The program log is as follows:

INFO - 2021-08-30 09:39:12,600 - utils - Note: detected 72 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO - 2021-08-30 09:39:12,600 - utils - Note: NumExpr detected 72 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
Size of image in the rendered video (160, 400, 3)
/bin/sh: 1: /home/qjn/miniconda/envs/mani_skill/bin/nvcc: not found
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:23 - Environment info:

sys.platform: linux
Python: 3.8.10 (default, Jun 4 2021, 15:09:15) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2,5: Quadro RTX 8000
GPU 3,4: NVIDIA GeForce RTX 2080 Ti
CUDA_HOME: /home/qjn/miniconda/envs/mani_skill
NVCC:
Num of GPUs: 6
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.0+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.9.0+cu111
OpenCV: 4.5.3
mani_skill_learn: 1.0.0

OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:23 - Config:
log_level = 'INFO'
stack_frame = 1
num_heads = 4
agent = dict(
    type='BC',
    batch_size=128,
    policy_cfg=dict(
        type='ContinuousPolicy',
        policy_head_cfg=dict(type='DeterministicHead', noise_std=1e-05),
        nn_cfg=dict(
            type='PointNetWithInstanceInfoV0',
            stack_frame=1,
            num_objs='num_objs',
            pcd_pn_cfg=dict(
                type='PointNetV0',
                conv_cfg=dict(
                    type='ConvMLP',
                    norm_cfg=None,
                    mlp_spec=['agent_shape + pcd_xyz_rgb_channel', 256, 256],
                    bias='auto',
                    inactivated_output=True,
                    conv_init_cfg=dict(type='xavier_init', gain=1, bias=0)),
                mlp_cfg=dict(
                    type='LinearMLP',
                    norm_cfg=None,
                    mlp_spec=[256, 256, 256],
                    bias='auto',
                    inactivated_output=True,
                    linear_init_cfg=dict(type='xavier_init', gain=1, bias=0)),
                subtract_mean_coords=True,
                max_mean_mix_aggregation=True),
            state_mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=['agent_shape', 256, 256],
                bias='auto',
                inactivated_output=True,
                linear_init_cfg=dict(type='xavier_init', gain=1, bias=0)),
            transformer_cfg=dict(
                type='TransformerEncoder',
                block_cfg=dict(
                    attention_cfg=dict(
                        type='MultiHeadSelfAttention',
                        embed_dim=256,
                        num_heads=4,
                        latent_dim=32,
                        dropout=0.1),
                    mlp_cfg=dict(
                        type='LinearMLP',
                        norm_cfg=None,
                        mlp_spec=[256, 1024, 256],
                        bias='auto',
                        inactivated_output=True,
                        linear_init_cfg=dict(
                            type='xavier_init', gain=1, bias=0)),
                    dropout=0.1),
                pooling_cfg=dict(embed_dim=256, num_heads=4, latent_dim=32),
                mlp_cfg=None,
                num_blocks=6),
            final_mlp_cfg=dict(
                type='LinearMLP',
                norm_cfg=None,
                mlp_spec=[256, 256, 'action_shape'],
                bias='auto',
                inactivated_output=True,
                linear_init_cfg=dict(type='xavier_init', gain=1, bias=0))),
        optim_cfg=dict(type='Adam', lr=0.0003, weight_decay=5e-06)))
eval_cfg = dict(
    type='Evaluation',
    num=1,
    num_procs=1,
    use_hidden_state=False,
    start_state=None,
    save_traj=True,
    save_video=True,
    use_log=True,
    env_cfg=dict(
        type='gym',
        unwrapped=False,
        stack_frame=1,
        obs_mode='pointcloud',
        reward_type='dense',
        env_name='OpenCabinetDrawer_1045_link_0-v0'))
train_mfrl_cfg = dict(
    on_policy=False,
    total_steps=50000,
    warm_steps=0,
    n_steps=0,
    n_updates=500,
    n_eval=50000,
    n_checkpoint=50000,
    init_replay_buffers='./example_mani_skill_data/OpenCabinetDrawer_1045_link_0-v0_pcd.h5')
env_cfg = dict(
    type='gym',
    unwrapped=False,
    stack_frame=1,
    obs_mode='pointcloud',
    reward_type='dense',
    env_name='OpenCabinetDrawer_1045_link_0-v0')
replay_cfg = dict(type='ReplayMemory', capacity=1000000)
work_dir = './test/OpenCabinetDrawer_1045_link_0-v0_pcd/BC'
resume_from = './example_mani_skill_data/OpenCabinetDrawer_1045_link_0-v0_PN_Transformer.ckpt'

OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:23 - Set random seed to None
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:24 - State shape:{'pointcloud': {'rgb': (1200, 3), 'xyz': (1200, 3), 'seg': (1200, 3)}, 'state': 38}, action shape:Box(-1.0, 1.0, (13,), float32)
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:24 - We do not use distributed training, but we support data parallel in torch
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:24 - Save trajectory at ./test/OpenCabinetDrawer_1045_link_0-v0_pcd/BC/test/trajectory.h5.
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:24 - Begin to evaluate
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:39 - Episode 0: Length 200 Reward: -2865.0203219550845
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:40 - memory:5.53G gpu_mem_ratio:3.5% gpu_mem:1.65G gpu_mem_this:0.00G gpu_util:4%
OpenCabinetDrawer_1045_link_0-v0 - INFO - 2021-08-30 09:39:40 - Num of trails: 1.00, Length: 200.00+/-0.00, Reward: -2865.02+/-0.00, Success or Early Stop Rate: 0.00

fbxiang commented 3 years ago

I would first try switching to another video player with better codec support (e.g. VLC). Since the video is generated, the code is probably fine.

lz1oceani commented 3 years ago

I have double-checked: the generated video can be opened on Ubuntu 20.04. If you still cannot open the video with VLC, you can try ffmpeg -i input.mp4 output.xxx to convert the video to a format that is supported on your computer.

QiuJunning commented 3 years ago

> I would first try switching to another video player with better codec support (e.g. VLC). Since the video is generated, the code is probably fine.

Thank you for your reply. I tried to open the video with VLC, but it still didn't work.

fbxiang commented 3 years ago

One other thing to try is to replace the video writer with an image writer, to check whether the images themselves are generated correctly.
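A minimal sketch of such an image writer, using only the standard library and assuming each frame is a raw 8-bit RGB buffer of 160 × 400 × 3 bytes (the size reported in the log above). The names encode_ppm and dump_frames are hypothetical and not part of ManiSkill-Learn.

```python
from pathlib import Path


def encode_ppm(rgb, width: int, height: int) -> bytes:
    """Encode one raw 8-bit RGB frame as a binary PPM (P6) image."""
    data = bytes(rgb)  # accepts any bytes-like uint8 buffer
    if len(data) != width * height * 3:
        raise ValueError("frame size does not match width * height * 3")
    return f"P6\n{width} {height}\n255\n".encode("ascii") + data


def dump_frames(frames, out_dir, width: int = 400, height: int = 160) -> int:
    """Write each frame to out_dir/frame_<i>.ppm; return the number written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for index, frame in enumerate(frames):
        (out / f"frame_{index}.ppm").write_bytes(encode_ppm(frame, width, height))
        count += 1
    return count
```

PPM is chosen here only because it needs no codec at all, so a black result cannot be blamed on video encoding.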

QiuJunning commented 3 years ago

> I have double-checked: the generated video can be opened on Ubuntu 20.04. If you still cannot open the video with VLC, you can try ffmpeg -i input.mp4 output.xxx to convert the video to a format that is supported on your computer.

Thanks for your advice. I tried VLC, but it didn't work. My computer opens other MP4 files normally, and converting the video to other common formats (e.g. AVI) does not help either. I would like to know whether the output of my run of the "Evaluation on Simple Pretrained Models" example you provided is correct. Thanks!

lz1oceani commented 3 years ago

The output seems to be correct. You can use this code to check every image in the video.

import cv2
import os.path as osp

filename = "xxx.mp4"  # path to the generated video
video = cv2.VideoCapture(filename)
video_dir = osp.dirname(filename)

# Read frames until the video ends, saving each one as a JPEG next to the video.
count = 0
success = True
while success:
    success, image = video.read()
    if success:
        cv2.imwrite(osp.join(video_dir, f"frame_{count}.jpg"), image)
        count += 1
video.release()
print(count)  # number of frames extracted

QiuJunning commented 3 years ago

> The output seems to be correct. You can use this code to check every image in the video.
>
> import cv2
> import os.path as osp
> filename = "xxx.mp4"
> video = cv2.VideoCapture(filename)
> video_dir = osp.dirname(filename)
> # success, image = video.read()
> count = 0
> success = True
> while success:
>     success, image = video.read()
>     if success:
>         cv2.imwrite(osp.join(video_dir, f"frame_{count}.jpg"), image)
>         count += 1
> print(count)

Thanks for the code you provided. I checked all the videos: each one has 200 frames, and every frame is a black 400×160 image of 1.59 KB. One of the frames is shown below: frame_145
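The same inspection could be automated instead of opening frames by eye. This is a minimal sketch, assuming each frame is available as a bytes-like 8-bit buffer (for example, an image returned by video.read()); is_black and count_black are hypothetical helper names, and the threshold parameter is an assumption added to tolerate small compression noise.

```python
def is_black(frame, threshold: int = 0) -> bool:
    """True if no byte in the frame exceeds `threshold`."""
    return max(bytes(frame), default=0) <= threshold


def count_black(frames, threshold: int = 0) -> int:
    """Count the frames for which is_black() holds."""
    return sum(1 for frame in frames if is_black(frame, threshold))
```

If count_black returns the total number of frames, as it would for the 200 frames described here, the rendering itself is empty rather than the video encoding being at fault.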

lz1oceani commented 3 years ago

We found a bug when rendering with multiple GPUs. Can you update the ManiSkill repo, install a new sapien from https://ucsdcloud-my.sharepoint.com/:u:/g/personal/z6ling_ucsd_edu/EVWaOUz0Cw5MgHIY06H9PxEBcaD5cLUK1VvnhyTibMMGmQ?e=aeNWtm, and rerun the scripts? You do not need to set CUDA_VISIBLE_DEVICES=0 when running the script.

QiuJunning commented 3 years ago

> We found a bug when rendering with multiple GPUs. Can you update the ManiSkill repo, install a new sapien from https://ucsdcloud-my.sharepoint.com/:u:/g/personal/z6ling_ucsd_edu/EVWaOUz0Cw5MgHIY06H9PxEBcaD5cLUK1VvnhyTibMMGmQ?e=aeNWtm, and rerun the scripts? You do not need to set CUDA_VISIBLE_DEVICES=0 when running the script.

I have updated the ManiSkill repo and installed the new sapien. Whether or not I set CUDA_VISIBLE_DEVICES=0, it still doesn't work.

lz1oceani commented 3 years ago

OK. I think you can try the following code to see if the env can render images correctly.

import mani_skill.env, gym
env = gym.make('OpenCabinetDrawer-v0')
x = env.render('color_image')['world']['rgb']
print(x)

Or use the following code to view the env with UI.

import mani_skill.env, gym
env = gym.make('OpenCabinetDrawer-v0')
while True:
    env.render('human')

QiuJunning commented 3 years ago

> OK. I think you can try the following code to see if the env can render images correctly.
>
> import mani_skill.env, gym
> env = gym.make('OpenCabinetDrawer-v0')
> x = env.render('color_image')['world']['rgb']
> print(x)
>
> Or use the following code to view the env with UI.
>
> import mani_skill.env, gym
> env = gym.make('OpenCabinetDrawer-v0')
> while True:
>     env.render('human')

I have run the code. The output x is an all-zero array, so the env cannot render images correctly.

lz1oceani commented 3 years ago

OK. Can you open the ui and see what happens?

lz1oceani commented 3 years ago

Can you provide the version of your Nvidia driver?

QiuJunning commented 3 years ago

> Can you provide the version of your Nvidia driver?

OK, my Nvidia driver is 470.57.02. Probably because I am using a server, calling env.render('human') raises an error: RuntimeError: Create window failed: context is not created with present support.

lz1oceani commented 3 years ago

Can you first run python -m sapien.example.offscreen to double-check the sapien renderer? ManiSkill uses cupy to fetch the rendered image, which may cause bugs. The expected output image is a red box.

QiuJunning commented 3 years ago

> Can you first run python -m sapien.example.offscreen to double-check the sapien renderer? ManiSkill uses cupy to fetch the rendered image, which may cause bugs. The expected output image is a red box.

OK, the output image does not appear, and my results are as follows:

[2021-09-05 12:54:34.519] [svulkan2] [error] GLFW error: X11: The DISPLAY environment variable is missing
[2021-09-05 12:54:34.519] [svulkan2] [warning] Continue without GLFW.
[2021-09-05 12:54:35.200] [SAPIEN] [warning] Mass or inertia contains very small number, this is not allowed. Mass will be set to 1e-6 and inertia will be set to 1e-8 for stability. Actor:
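The GLFW error reports a missing DISPLAY variable, which is typical of a headless server and is consistent with the earlier window-creation failure. A minimal sketch of guarding against it, assuming only the two render modes that appear in this thread ('human' and 'color_image'); has_display and pick_render_mode are hypothetical helpers, and a set DISPLAY does not by itself guarantee that a window can be created.

```python
import os


def has_display(environ=None) -> bool:
    """True if an X11 display is advertised via the DISPLAY variable."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("DISPLAY"))


def pick_render_mode(environ=None) -> str:
    """Use the on-screen viewer only when a display seems to be available."""
    return "human" if has_display(environ) else "color_image"
```

On a server without X11, pick_render_mode() falls back to 'color_image', so the offscreen path is exercised instead of a window being requested.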

lz1oceani commented 3 years ago

Is there any image like output.png under the running path?

lz1oceani commented 3 years ago

By the way, is it possible to try mani skill on another available machine?

QiuJunning commented 3 years ago

> Is there any image like output.png under the running path?

Oh, sorry, I have found the output picture. This is my output: output

lz1oceani commented 3 years ago

Can you run pip freeze | grep sapien to check the sapien version again?

QiuJunning commented 3 years ago

> Can you run pip freeze | grep sapien to check the sapien version again?

This is my output: sapien @ file:///home/qjn/ManiSkill/ManiSkill-Learn/sapien-1.1.1-cp38-cp38-manylinux2014_x86_64.whl
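The installed version can also be queried from inside the same Python environment that runs the scripts, which rules out a mismatch between pip and the interpreter. A minimal sketch using the standard importlib.metadata module (Python 3.8+); installed_version is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("sapien:", installed_version("sapien"))
```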

lz1oceani commented 3 years ago

OK. I guess the problem may come from cupy, and I am asking my teammates to provide a new ManiSkill branch without cupy.

QiuJunning commented 3 years ago

> OK. I guess the problem may come from cupy, and I am asking my teammates to provide a new ManiSkill branch without cupy.

Thanks! Looking forward to your branch.

lz1oceani commented 3 years ago

You can pull the current ManiSkill branch and run the code with the environment variable NO_CUPY=1. The command looks like

NO_CUPY=1 python xxxxx.

Then you can check whether cupy is what fails to load the image.
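When the evaluation is launched from another Python script rather than from a shell, the same variable can be passed to the child process. A minimal sketch using only the standard library; env_without_cupy and run_with_no_cupy are hypothetical helper names, and only the NO_CUPY=1 variable itself comes from this thread.

```python
import os
import subprocess
import sys


def env_without_cupy(base=None) -> dict:
    """Copy the environment and set NO_CUPY=1 for the child process."""
    env = dict(os.environ if base is None else base)
    env["NO_CUPY"] = "1"
    return env


def run_with_no_cupy(args) -> int:
    """Run `python <args...>` with NO_CUPY=1 and return its exit code."""
    completed = subprocess.run([sys.executable, *args], env=env_without_cupy())
    return completed.returncode
```

For example, run_with_no_cupy(["-m", "tools.run_rl", ...]) would take the same arguments as the original command line; the parent environment is copied rather than modified, so other processes are unaffected.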

lz1oceani commented 3 years ago

You can also try https://github.com/haosulab/ManiSkill-Learn/blob/main/scripts/docker/build_docker.sh to build a Docker image and run the program in it.

QiuJunning commented 3 years ago

> You can pull the current ManiSkill branch and run the code with the environment variable NO_CUPY=1. The command looks like
>
> NO_CUPY=1 python xxxxx.
>
> Then you can check whether cupy is what fails to load the image.

Thank you very much. It worked.

lz1oceani commented 3 years ago

Hi, Qiu721, thanks for reporting this issue. I will close this issue.