hyperplane-lab / RLAfford

RLAfford: End-to-End Affordance Learning for Robotic Manipulation, ICRA 2023
https://sites.google.com/view/rlafford/

torch.cuda.OutOfMemoryError #5

Closed Bailey-24 closed 1 year ago

Bailey-24 commented 1 year ago

When I ran the close-drawer task with the first command listed under Partial:

python train.py --task=OneFrankaCabinetPCPartial --task_config=cfg/franka_drawer_PC_partial_cloud_close.yaml --algo=ppo_pc_pure --algo_config=cfg/ppo_pc_pure/config.yaml --headless --rl_device=cuda:0 --sim_device=cuda:0 --seed=0

I got the following error:

Traceback (most recent call last):
  File "train.py", line 69, in <module>
    train()
  File "train.py", line 56, in train
    sarl.run(num_learning_iterations=iterations, log_interval=cfg_train["learn"]["save_interval"])
  File "/home/jiahui/Desktop/RL_learning/RLAfford/MARL_Module/envs/algorithms/ppo/ppo/ppo.py", line 327, in run
    mean_value_loss, mean_surrogate_loss = self.update(it)
  File "/home/jiahui/Desktop/RL_learning/RLAfford/MARL_Module/envs/algorithms/ppo/ppo/ppo.py", line 571, in update
    loss.backward()
  File "/home/jiahui/anaconda3/envs/afford/lib/python3.8/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/jiahui/anaconda3/envs/afford/lib/python3.8/site-packages/torch/autograd/__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.42 GiB (GPU 2; 11.78 GiB total capacity; 8.01 GiB already allocated; 1.32 GiB free; 9.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Printing Profile:
End of Profile
free(): invalid pointer
Aborted

The State command had run successfully before this.

I use Ubuntu 18.04.

boshi-an commented 1 year ago

12 GB of GPU memory isn't enough to run the point-cloud-based simulation with the default configuration. You can reduce the number of parallel environments by adding the --num_envs option to the training command.
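
As a rough illustration of the suggestion above, the sketch below rebuilds the command from the issue with an explicit --num_envs value and prints a few progressively smaller candidates to try. The helper names (with_num_envs, halving_schedule) and the starting value of 64 are hypothetical and chosen only for the example. The repository's default environment count is not stated in this thread, and the `--num_envs=N` spelling assumes train.py parses it the same way as the other `--flag=value` options in the original command.

```python
import shlex

# Command from the issue, unchanged.
BASE_CMD = (
    "python train.py --task=OneFrankaCabinetPCPartial "
    "--task_config=cfg/franka_drawer_PC_partial_cloud_close.yaml "
    "--algo=ppo_pc_pure --algo_config=cfg/ppo_pc_pure/config.yaml "
    "--headless --rl_device=cuda:0 --sim_device=cuda:0 --seed=0"
)


def with_num_envs(cmd, num_envs):
    """Return cmd as an argument list with --num_envs set to num_envs.

    Any --num_envs already present (either "--num_envs N" or
    "--num_envs=N") is dropped first so the flag appears only once.
    """
    args = []
    skip_next = False
    for tok in shlex.split(cmd):
        if skip_next:            # value that followed a bare "--num_envs"
            skip_next = False
            continue
        if tok == "--num_envs":
            skip_next = True
            continue
        if tok.startswith("--num_envs="):
            continue
        args.append(tok)
    args.append(f"--num_envs={num_envs}")
    return args


def halving_schedule(start, minimum=1):
    """List environment counts to try: start, start/2, ... down to minimum."""
    minimum = max(1, minimum)    # guard against a non-terminating loop at 0
    counts = []
    n = start
    while n >= minimum:
        counts.append(n)
        n //= 2
    return counts


if __name__ == "__main__":
    # 64 is an arbitrary starting point, not the project's default.
    for n in halving_schedule(64, minimum=8):
        print(shlex.join(with_num_envs(BASE_CMD, n)))
```

Each printed line is a complete command. Running them from the largest count downward until training gets past the first update without the out-of-memory error is one way to find a count that fits on a 12 GB card.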