OutOfMemoryError on new GPU with 15GB

tgrigat commented 2 months ago

Thanks for your work.

I tried to run the code but got a memory error.

I am working with a new NVIDIA GeForce RTX 4080 with more than 15 GB of VRAM. Is this expected behaviour? If so, how much VRAM is required? If not, do you know what I could change?

Thanks for your help.

The user command is "pick up bin".

assistant:
INITIAL PLANNING 1:

The task requires the robot arm to pick up a bin. The gripper should interact with the top part of the bin, as it is usually the most accessible part and allows for a secure grip. 

Let's start by detecting the bin in the environment.

```python
detect_object("bin")
```

After executing this code, we will receive the position, orientation, and dimensions of the bin. We will use this information to plan the trajectory of the robot arm.
finish_reason: stop
[INFO/MainProcess] Finished generating ChatGPT output!
[INFO/MainProcess] Capturing head and wrist camera images...
[INFO/MainProcess] Finished capturing head camera image!
[INFO/MainProcess] Segmenting head camera image...
/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/transformers/modeling_utils.py:907: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/utils/checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
[INFO/MainProcess] Generating ChatGPT output...
user:
Running code block 1 of your previous response resulted in the following error:
Traceback (most recent call last):
  File "/home/aidara/test/language-models-trajectory-generators/main.py", line 138, in <module>
    exec(code)
  File "<string>", line 2, in <module>
  File "/home/aidara/test/language-models-trajectory-generators/api.py", line 60, in detect_object
    model_predictions, boxes, segmentation_texts = models.get_langsam_output(rgb_image_head, self.langsam_model, segmentation_texts, self.segmentation_count)
  File "/home/aidara/test/language-models-trajectory-generators/models.py", line 20, in get_langsam_output
    masks, boxes, phrases, logits = model.predict(image, segmentation_texts)
  File "/home/aidara/test/lang-segment-anything/lang_sam/lang_sam.py", line 119, in predict
    masks = self.predict_sam(image_pil, boxes)
  File "/home/aidara/test/lang-segment-anything/lang_sam/lang_sam.py", line 105, in predict_sam
    self.sam.set_image(image_array)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/segment_anything/predictor.py", line 60, in set_image
    self.set_torch_image(input_image_torch, image.shape[:2])
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/segment_anything/predictor.py", line 89, in set_torch_image
    self.features = self.model.image_encoder(input_image)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/segment_anything/modeling/image_encoder.py", line 112, in forward
    x = blk(x)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/segment_anything/modeling/image_encoder.py", line 174, in forward
    x = self.attn(x)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/aidara/anaconda3/envs/llm_zero_shot2/lib/python3.9/site-packages/segment_anything/modeling/image_encoder.py", line 231, in forward
    attn = (q * self.scale) @ k.transpose(-2, -1)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1024.00 MiB. GPU 0 has a total capacity of 15.67 GiB of which 714.69 MiB is free. Process 291321 has 7.75 GiB memory in use. Including non-PyTorch memory, this process has 4.11 GiB memory in use. Of the allocated memory 3.70 GiB is allocated by PyTorch, and 121.07 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
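The 1024.00 MiB request in this traceback is consistent with the size of a single attention-score matrix in SAM's image encoder (the failing line is `attn = (q * self.scale) @ k.transpose(-2, -1)`). A back-of-the-envelope sketch, assuming SAM's 1024×1024 input, 16×16 patches (a 64×64 grid of 4096 tokens in the global-attention blocks), 16 heads (as in the ViT-H and ViT-L backbones), batch size 1 and float32 scores; `attention_matrix_mib` is a hypothetical helper:

```python
def attention_matrix_mib(image_size=1024, patch_size=16, num_heads=16,
                         batch_size=1, bytes_per_element=4):
    """Size in MiB of one global self-attention score matrix in a ViT encoder.

    Assumed defaults: SAM-style 1024x1024 input, 16x16 patches, 16 heads,
    batch size 1 and float32 (4-byte) scores.
    """
    tokens = (image_size // patch_size) ** 2              # 64 * 64 = 4096 tokens
    elements = batch_size * num_heads * tokens * tokens   # one tokens x tokens matrix per head
    return elements * bytes_per_element / 2**20           # bytes -> MiB


print(attention_matrix_mib())              # 1024.0
print(attention_matrix_mib(num_heads=12))  # 768.0 (12 heads, as in a ViT-B-sized encoder)
```

Because the matrix grows with the square of the token count, this one intermediate needs a full GiB on top of the model weights, more than the 714.69 MiB reported as free above.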

kwonathan commented 2 months ago

Hi, thanks for trying out the code, and apologies for the delayed response. The LangSAM model shouldn't take up too much memory. Could you try running

nvidia-smi

and make sure that no other process is using the GPU? The command shows a list of the processes currently using the GPU, and you can kill any that the code doesn't need, which should free up some memory:

sudo kill -9 PID
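To find which process is holding the memory (in the first traceback, process 291321 already had 7.75 GiB), nvidia-smi's CSV query mode is easier to script than the table view. A minimal sketch, assuming nvidia-smi is on PATH; the helper names are hypothetical and not part of the repository:

```python
import subprocess

# Lists compute (type "C") processes only; display processes such as Xorg are not included.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Parse 'pid, name, used_MiB' rows into tuples sorted by memory, largest first."""
    apps = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)  # process names may themselves contain commas
        used = used.strip()
        used_mib = int(used) if used.isdigit() else None  # e.g. "[N/A]" on some setups
        apps.append((int(pid), name.strip(), used_mib))
    return sorted(apps, key=lambda app: app[2] or 0, reverse=True)


def list_gpu_compute_apps():
    """Run nvidia-smi and return the parsed compute processes."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `list_gpu_compute_apps()` before starting main.py shows whether a leftover process is still holding memory; its PID is what goes into the kill command.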

Hope this helps, and let me know if not!

kwonathan commented 2 months ago

Hi, were you able to get the LangSAM model to run? Happy to look into this further if not!

cckaixin commented 3 weeks ago

I ran into the same problem. My GPU is an RTX 3060 (6 GB, laptop version). This is the GPU usage before running python main.py --robot franka:

Tue Jun 11 17:16:25 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   38C    P8              20W /  80W |    865MiB /  6144MiB |      1%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1170      G   /usr/lib/xorg/Xorg                          385MiB |
|    0   N/A  N/A      1495      G   /usr/bin/gnome-shell                        103MiB |
|    0   N/A  N/A      3110      G   ...seed-version=20240607-130129.053000      253MiB |
|    0   N/A  N/A     16096      G   ...erProcess --variations-seed-version       91MiB |
|    0   N/A  N/A     48788      G   /proc/self/exe                               20MiB |
+---------------------------------------------------------------------------------------+

After launching main.py, GPU usage is as follows:

Tue Jun 11 17:20:43 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04             Driver Version: 535.171.04   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off | 00000000:01:00.0  On |                  N/A |
| N/A   41C    P0              33W /  80W |   4119MiB /  6144MiB |     13%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1170      G   /usr/lib/xorg/Xorg                          397MiB |
|    0   N/A  N/A      1495      G   /usr/bin/gnome-shell                        114MiB |
|    0   N/A  N/A      3110      G   ...seed-version=20240607-130129.053000      202MiB |
|    0   N/A  N/A     16096      G   ...erProcess --variations-seed-version       91MiB |
|    0   N/A  N/A     48788      G   /proc/self/exe                               20MiB |
|    0   N/A  N/A     85049      C   python                                     2948MiB |
|    0   N/A  N/A     85120      G   python                                      328MiB |
+---------------------------------------------------------------------------------------+

After I enter a command, the GPU runs out of memory:

The user command is "pick up a can".

assistant:
INITIAL PLANNING 1:

The task requires the robot arm to pick up a can. The gripper should interact with the can along its sides, as the can's diameter is likely to be less than the maximum graspable width of the gripper (0.08 m).

First, let's detect the can in the environment.

```python
detect_object("can")
```

Stop generation here and wait for the printed outputs from the detect_object function call.
finish_reason: stop
[INFO/MainProcess] Finished generating ChatGPT output!
[INFO/MainProcess] Capturing head and wrist camera images...
[INFO/MainProcess] Finished capturing head camera image!
[INFO/MainProcess] Segmenting head camera image...
/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/transformers/modeling_utils.py:907: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
  warnings.warn(
/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
[INFO/MainProcess] Generating ChatGPT output...
user:
Running code block 1 of your previous response resulted in the following error:
Traceback (most recent call last):
  File "/home/ckx/workbench/language-models-trajectory-generators/main.py", line 115, in <module>
    exec(code)
  File "<string>", line 2, in <module>
  File "/home/ckx/workbench/language-models-trajectory-generators/api.py", line 60, in detect_object
    model_predictions, boxes, segmentation_texts = models.get_langsam_output(rgb_image_head, self.langsam_model, segmentation_texts, self.segmentation_count)
  File "/home/ckx/workbench/language-models-trajectory-generators/models.py", line 20, in get_langsam_output
    masks, boxes, phrases, logits = model.predict(image, segmentation_texts)
  File "/home/ckx/3dparty/lang-segment-anything/lang_sam/lang_sam.py", line 118, in predict
    boxes, logits, phrases = self.predict_dino(image_pil, text_prompt, box_threshold, text_threshold)
  File "/home/ckx/3dparty/lang-segment-anything/lang_sam/lang_sam.py", line 93, in predict_dino
    boxes, logits, phrases = predict(model=self.groundingdino,
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/util/inference.py", line 66, in predict
    outputs = model(image[None], captions=[caption])
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/groundingdino.py", line 313, in forward
    hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/transformer.py", line 258, in forward
    memory, memory_text = self.encoder(
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/transformer.py", line 576, in forward
    output = checkpoint.checkpoint(
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
    return fn(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
    return CheckpointFunction.apply(function, preserve, *args)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/transformer.py", line 785, in forward
    src2 = self.self_attn(
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 272, in forward
    output = multi_scale_deformable_attn_pytorch(
  File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 71, in multi_scale_deformable_attn_pytorch
    (torch.stack(sampling_value_list, dim=-2).flatten(-2) * attention_weights)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 208.00 MiB. GPU 0 has a total capacty of 5.77 GiB of which 215.69 MiB is free. Including non-PyTorch memory, this process has 4.41 GiB memory in use. Of the allocated memory 3.94 GiB is allocated by PyTorch, and 334.52 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
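Both tracebacks point at PYTORCH_CUDA_ALLOC_CONF. It targets fragmentation, and the reserved-but-unallocated amounts reported here are small (121.07 MiB and 334.52 MiB), so it may not be enough on its own; it also has to be set before PyTorch initialises CUDA. A minimal sketch that launches main.py with the option in the child's environment, assuming it is run from the repository root; the helper names are hypothetical, and `--robot franka` is the invocation used above. The second traceback's PyTorch build suggests max_split_size_mb instead (e.g. `max_split_size_mb:128`), which can be passed through the same `option` argument:

```python
import os
import subprocess
import sys


def env_with_alloc_conf(base_env, option="expandable_segments:True"):
    """Return a copy of base_env with option appended to PYTORCH_CUDA_ALLOC_CONF."""
    env = dict(base_env)
    existing = env.get("PYTORCH_CUDA_ALLOC_CONF")
    # Multiple allocator options are comma-separated, e.g. "a:1,b:2".
    env["PYTORCH_CUDA_ALLOC_CONF"] = f"{existing},{option}" if existing else option
    return env


def run_main_with_alloc_conf(option="expandable_segments:True"):
    """Launch main.py in a child process so the variable is set before torch loads."""
    cmd = [sys.executable, "main.py", "--robot", "franka"]
    return subprocess.run(cmd, env=env_with_alloc_conf(os.environ, option), check=True)
```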

Maybe 6 GB is too small to run such a big system. Time to upgrade my hardware :( May I ask how much GPU memory this script requires? I'd like to know so I can buy a suitable GPU.
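One way to answer "how much GPU memory does this need" empirically, on a machine where it does fit, is to record PyTorch's peak reserved memory around the vision-model calls. A minimal sketch assuming a CUDA build of PyTorch; `measure_peak_gpu_memory` is a hypothetical helper, and the figure excludes the CUDA context and other non-PyTorch memory, so some headroom should be added on top:

```python
def bytes_to_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


def measure_peak_gpu_memory(fn, *args, **kwargs):
    """Call fn and return (result, peak GiB reserved by PyTorch's caching allocator)."""
    import torch  # imported here so this module loads on machines without torch

    torch.cuda.reset_peak_memory_stats()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()  # wait for queued kernels before reading the statistics
    return result, bytes_to_gib(torch.cuda.max_memory_reserved())
```

For example, wrapping the `model.predict(image, segmentation_texts)` call from models.py as `measure_peak_gpu_memory(model.predict, image, segmentation_texts)` would report the LangSAM peak for one image.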

johanndiep commented 3 weeks ago

Same issue here!

kwonathan commented 2 weeks ago

Hi, apologies for the delay in replying, and for the persistent issue.

I will look into this and provide an update as soon as possible. In the meantime, the whole system can be run on the CPU without taking too much additional time (the main bottlenecks are the vision models, which should take around one to two minutes each for inference).

Sorry about this, and hope this helps for now!

johanndiep commented 2 weeks ago

No worries, I managed to get it running exactly as you described yesterday. The bottleneck is the LangSAM library; hardcoding cpu as the device in lang_sam.py does the job. The rest, including XMem, can stay on CUDA. This way, only the segmentation step in detect_object takes about 2 minutes, and the rest is pretty fast.
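Instead of hardcoding cpu in lang_sam.py, the device could be chosen at start-up from the memory actually free on the GPU. A minimal sketch: `choose_device`, `pick_langsam_device` and the 6 GiB threshold are hypothetical illustrations, not values measured for LangSAM, and how the chosen device is passed into lang_sam.py depends on that file's code:

```python
def choose_device(cuda_available, free_bytes, required_bytes):
    """Return "cuda" only if CUDA exists and enough memory is currently free."""
    if cuda_available and free_bytes >= required_bytes:
        return "cuda"
    return "cpu"


def pick_langsam_device(required_gib=6.0):
    """Query PyTorch for free memory on the current GPU and choose a device.

    required_gib is a hypothetical placeholder; measure the real peak first.
    """
    import torch  # imported lazily so choose_device can be used without torch

    if not torch.cuda.is_available():
        return "cpu"
    free_bytes, _total_bytes = torch.cuda.mem_get_info()  # (free, total) in bytes
    return choose_device(True, free_bytes, int(required_gib * 2**30))
```

With a split like this, only LangSAM falls back to the slower CPU path when the GPU is too full, while XMem stays on CUDA as described above.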

cckaixin commented 2 weeks ago

Thank you for the kind replies. I managed to run it on a workstation using Docker, with -X11 to render the GUI on my local screen. It works well. Big thanks to all of you.

kwonathan / language-models-trajectory-generators

OutOfMemoryError on new GPU with 15GB #3