hyperplane-lab / RLAfford

RLAfford: End-to-End Affordance Learning for Robotic Manipulation, ICRA 2023
https://sites.google.com/view/rlafford/

segmentation fault #6

Closed. Bailey-24 closed this issue 1 year ago.

Bailey-24 commented 1 year ago

When I ran the first command for the close-door task with the partial point cloud, it aborted:

python train.py --task=OneFrankaCabinetPCPartial --task_config=cfg/franka_cabinet_PC_partial_cloud_close.yaml --algo=ppo_pc_pure --algo_config=cfg/ppo_pc_pure/config.yaml --headless --rl_device=cuda:0 --sim_device=cuda:0 --seed=0

I then ran the state-based close-door command, and it crashed with a segmentation fault:

python train.py --task=OneFrankaCabinet --task_config=cfg/franka_cabinet_state_close.yaml --algo=ppo --algo_config=cfg/ppo/config.yaml --rl_device=cuda:0 --sim_device=cuda:0 --pipeline=cpu --seed=0

How can I solve this?

boshi-an commented 1 year ago

This may be caused by running out of memory. Check whether your GPU can handle the default number of environments. You can reduce the number of environments from the command line by passing --num_envs=N, with N smaller than the default.
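If the crash really is memory-related, one way to find a workable --num_envs is to halve it until a launch no longer segfaults. The sketch below assumes a POSIX system, where Python's subprocess reports a process killed by a signal as a negative return code (SIGSEGV appears as -11). The helper names, the starting value, and the injectable `run` callback are illustrative assumptions, not part of RLAfford; note that each attempt is a real launch, so in practice you would want `run` to wrap a short trial rather than a full training run.

```python
import signal
import subprocess
from typing import Callable, List, Optional

# Base command taken from the issue (partial point-cloud close-door task).
BASE_CMD: List[str] = [
    "python", "train.py",
    "--task=OneFrankaCabinetPCPartial",
    "--task_config=cfg/franka_cabinet_PC_partial_cloud_close.yaml",
    "--algo=ppo_pc_pure",
    "--algo_config=cfg/ppo_pc_pure/config.yaml",
    "--headless",
    "--rl_device=cuda:0",
    "--sim_device=cuda:0",
    "--seed=0",
]


def with_num_envs(cmd: List[str], num_envs: int) -> List[str]:
    """Return a copy of cmd with any existing --num_envs replaced by num_envs."""
    kept = [arg for arg in cmd if not arg.startswith("--num_envs=")]
    return kept + [f"--num_envs={num_envs}"]


def run_subprocess(cmd: List[str]) -> int:
    """Launch the command; a negative return code means it was killed by a signal."""
    return subprocess.run(cmd).returncode


def find_working_num_envs(
    start: int,
    run: Callable[[List[str]], int] = run_subprocess,
    cmd: List[str] = BASE_CMD,
) -> Optional[int]:
    """Halve --num_envs until a launch does not die with SIGSEGV.

    Returns the first value whose launch did not segfault (other failures are
    not treated as memory problems), or None if even a single env segfaults.
    """
    n = start
    while n >= 1:
        if run(with_num_envs(cmd, n)) != -signal.SIGSEGV:
            return n
        n //= 2
    return None
```

Keeping `run` injectable makes the search logic testable without a GPU; as a later reply in this thread points out, a segfault that still occurs at small env counts suggests something other than memory.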

Xingyu-Lin commented 1 year ago

Hi,

Thanks for the great code.

I hit the same segmentation fault after loading around 21 assets, so maybe it is not related to the number of envs?

I was running on a GPU with 24 GB of memory.

Setting num_objs and num_envs to around 16 avoids the crash, but then I cannot run RL training properly, since the cabinet task has more than 40 training assets.

A related issue is that loading assets seems relatively slow, taking around 5 seconds per asset.
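To put the reported loading time in perspective, a rough estimate assuming load time grows linearly with the number of assets: at about 5 s per asset, the 21 assets loaded before the crash take around 105 s, and 40 assets take at least about 200 s before training starts. The per-asset figure is the rough number reported above, not a measured constant, and the helper name is hypothetical.

```python
def estimated_load_seconds(num_assets: int, seconds_per_asset: float = 5.0) -> float:
    """Rough startup cost, assuming asset loading time is linear in the asset count."""
    return num_assets * seconds_per_asset


# 16 assets avoided the crash, ~21 were loaded before it, and the task has >40.
for n in (16, 21, 40):
    print(f"{n} assets -> ~{estimated_load_seconds(n):.0f} s")
```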

I wonder if you have any insights into this? @GengYiran @boshi-an

boshi-an commented 1 year ago

It seems to me that 24 GB of GPU memory is enough to run the simulation alone. If you have multiple GPUs, you could run the simulator and the networks on different GPUs. Isaac Gym currently does not support changing objects after launch, so you probably need 2 GPUs. We have tested our pipeline on a cluster of 4 RTX 3090 GPUs.

boshi-an commented 1 year ago

If you are running the RLAfford experiment code, you can adjust --rl_device, --cp_device and --sim_device to assign different GPUs to different models. Isaac Gym currently cannot run on multiple GPUs, as NVIDIA has not provided such an API.

On May 16, 2023, at 3:26 AM, Xingyu Lin wrote:
> I am not sure how to run Isaac Gym on multiple GPUs. I am also not running any RL networks right now. Is there a simple change to the code that allows multi-GPU training?
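A minimal sketch of the maintainer's suggestion: keep the simulator on one GPU and move the models to another by passing different devices to --sim_device, --rl_device and --cp_device. The flag names come from this thread; the helper `split_devices`, the choice of cuda:0 for simulation and cuda:1 for the models, and whether a given task actually reads --cp_device are assumptions to check against the repository's argument parser.

```python
import shlex
from typing import List

DEVICE_FLAGS = ("--sim_device=", "--rl_device=", "--cp_device=")


def split_devices(base_cmd: List[str], sim_gpu: int = 0, model_gpu: int = 1,
                  use_cp_device: bool = True) -> List[str]:
    """Rewrite device flags so simulation and models run on different GPUs.

    Hypothetical helper: existing device flags are dropped and re-added with
    the simulator on cuda:<sim_gpu> and the models on cuda:<model_gpu>.
    """
    cmd = [arg for arg in base_cmd if not arg.startswith(DEVICE_FLAGS)]
    cmd += [f"--sim_device=cuda:{sim_gpu}", f"--rl_device=cuda:{model_gpu}"]
    if use_cp_device:
        cmd.append(f"--cp_device=cuda:{model_gpu}")
    return cmd


base = [
    "python", "train.py",
    "--task=OneFrankaCabinetPCPartial",
    "--task_config=cfg/franka_cabinet_PC_partial_cloud_close.yaml",
    "--algo=ppo_pc_pure",
    "--algo_config=cfg/ppo_pc_pure/config.yaml",
    "--headless",
    "--rl_device=cuda:0",
    "--sim_device=cuda:0",
    "--seed=0",
]
# Print a shell-ready command line for a two-GPU machine.
print(shlex.join(split_devices(base)))
```

The simulation itself still runs on a single GPU here; only the learning side moves, which matches the maintainer's point that Isaac Gym cannot span multiple GPUs.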
