Open tgrigat opened 2 months ago
Hi, thanks for trying out the code, and apologies for the delayed response. The LangSAM model shouldn't take up too much memory, could you maybe try running
nvidia-smi
and ensure that no other process is using the GPU? The command should show a list of current processes using the GPU, and you can kill any processes which you don't need for the code to run, which should free up some memory:
sudo kill -9 PID
Hope this helps, and let me know if not!
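For a scripted version of this check, the same information can be read in machine-readable form. This is a minimal sketch, assuming `nvidia-smi` is on the PATH. The helper names `parse_compute_apps` and `list_compute_apps` are made up for illustration. Note that `--query-compute-apps` reports only compute (type C) processes, so desktop processes such as Xorg will not appear in its output.

```python
import csv
import io
import subprocess

# CSV query for compute processes; memory is reported in MiB with `nounits`.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn the CSV text printed by QUERY into a list of dicts."""
    rows = []
    for fields in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        pid, name, used = (field.strip() for field in fields)
        rows.append({
            "pid": int(pid),
            "name": name,
            # Some drivers print a placeholder instead of a number.
            "used_mib": int(used) if used.isdigit() else None,
        })
    return rows


def list_compute_apps():
    """Run nvidia-smi and return its compute processes (needs an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_compute_apps(result.stdout)
```

Calling `list_compute_apps()` on a machine with an NVIDIA driver returns one entry per compute process, which makes it easy to spot a stale Python process before deciding whether to kill it.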
Hi, were you able to get the LangSAM model to run? Happy to look into this further if not!
I have hit the same problem. My GPU is an RTX 3060 (6 GB, laptop version).
This is the GPU usage before running python main.py --robot franka
Tue Jun 11 17:16:25 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 ... Off | 00000000:01:00.0 On | N/A |
| N/A 38C P8 20W / 80W | 865MiB / 6144MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1170 G /usr/lib/xorg/Xorg 385MiB |
| 0 N/A N/A 1495 G /usr/bin/gnome-shell 103MiB |
| 0 N/A N/A 3110 G ...seed-version=20240607-130129.053000 253MiB |
| 0 N/A N/A 16096 G ...erProcess --variations-seed-version 91MiB |
| 0 N/A N/A 48788 G /proc/self/exe 20MiB |
+---------------------------------------------------------------------------------------+
After launching main.py, GPU usage is as follows:
Tue Jun 11 17:20:43 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.171.04 Driver Version: 535.171.04 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3060 ... Off | 00000000:01:00.0 On | N/A |
| N/A 41C P0 33W / 80W | 4119MiB / 6144MiB | 13% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1170 G /usr/lib/xorg/Xorg 397MiB |
| 0 N/A N/A 1495 G /usr/bin/gnome-shell 114MiB |
| 0 N/A N/A 3110 G ...seed-version=20240607-130129.053000 202MiB |
| 0 N/A N/A 16096 G ...erProcess --variations-seed-version 91MiB |
| 0 N/A N/A 48788 G /proc/self/exe 20MiB |
| 0 N/A N/A 85049 C python 2948MiB |
| 0 N/A N/A 85120 G python 328MiB |
+---------------------------------------------------------------------------------------+
After I enter a command, the GPU runs out of memory:
The user command is "pick up a can".
assistant:
INITIAL PLANNING 1:
The task requires the robot arm to pick up a can. The gripper should interact with the can along its sides, as the can's diameter is likely to be less than the maximum graspable width of the gripper (0.08 m).
First, let's detect the can in the environment.
python
detect_object("can")
Stop generation here and wait for the printed outputs from the detect_object function call.
finish_reason: stop
[INFO/MainProcess] Finished generating ChatGPT output!
[INFO/MainProcess] Capturing head and wrist camera images...
[INFO/MainProcess] Finished capturing head camera image!
[INFO/MainProcess] Segmenting head camera image...
/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/transformers/modeling_utils.py:907: FutureWarning: The `device` argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/checkpoint.py:429: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/checkpoint.py:61: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
[INFO/MainProcess] Generating ChatGPT output...
user:
Running code block 1 of your previous response resulted in the following error:
Traceback (most recent call last):
File "/home/ckx/workbench/language-models-trajectory-generators/main.py", line 115, in <module>
exec(code)
File "<string>", line 2, in <module>
File "/home/ckx/workbench/language-models-trajectory-generators/api.py", line 60, in detect_object
model_predictions, boxes, segmentation_texts = models.get_langsam_output(rgb_image_head, self.langsam_model, segmentation_texts, self.segmentation_count)
File "/home/ckx/workbench/language-models-trajectory-generators/models.py", line 20, in get_langsam_output
masks, boxes, phrases, logits = model.predict(image, segmentation_texts)
File "/home/ckx/3dparty/lang-segment-anything/lang_sam/lang_sam.py", line 118, in predict
boxes, logits, phrases = self.predict_dino(image_pil, text_prompt, box_threshold, text_threshold)
File "/home/ckx/3dparty/lang-segment-anything/lang_sam/lang_sam.py", line 93, in predict_dino
boxes, logits, phrases = predict(model=self.groundingdino,
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/util/inference.py", line 66, in predict
outputs = model(image[None], captions=[caption])
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/groundingdino.py", line 313, in forward
hs, reference, hs_enc, ref_enc, init_box_proposal = self.transformer(
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/transformer.py", line 258, in forward
memory, memory_text = self.encoder(
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/transformer.py", line 576, in forward
output = checkpoint.checkpoint(
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py", line 328, in _fn
return fn(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 451, in checkpoint
return CheckpointFunction.apply(function, preserve, *args)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/autograd/function.py", line 539, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/utils/checkpoint.py", line 230, in forward
outputs = run_function(*args)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/transformer.py", line 785, in forward
src2 = self.self_attn(
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 272, in forward
output = multi_scale_deformable_attn_pytorch(
File "/home/ckx/miniconda3/envs/py39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/ms_deform_attn.py", line 71, in multi_scale_deformable_attn_pytorch
(torch.stack(sampling_value_list, dim=-2).flatten(-2) * attention_weights)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 208.00 MiB. GPU 0 has a total capacty of 5.77 GiB of which 215.69 MiB is free. Including non-PyTorch memory, this process has 4.41 GiB memory in use. Of the allocated memory 3.94 GiB is allocated by PyTorch, and 334.52 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Maybe 6 GB is too small to run such a large system. Time to upgrade my hardware :( May I ask how much GPU memory this script requires? I would like to know so I can buy a suitable GPU.
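The figures in the error message give a rough lower bound on what the process needed at the point of failure. The sketch below only does arithmetic on numbers copied from this thread; it is not a measured requirement, and later stages of the pipeline may need more than this.

```python
MIB_PER_GIB = 1024

# Figures copied from the OOM message and the first nvidia-smi table above.
total_mib = 5.77 * MIB_PER_GIB    # capacity PyTorch reports for GPU 0
process_mib = 4.41 * MIB_PER_GIB  # held by the Python process when it failed
requested_mib = 208.00            # the allocation that could not be served
desktop_mib = 865                 # in use by the desktop before main.py started

# The process needed at least what it already held plus the failed request.
lower_bound_mib = process_mib + requested_mib

# Headroom on this card if nothing else were using the GPU.
headroom_mib = total_mib - lower_bound_mib

print(f"lower bound for the process: {lower_bound_mib:.0f} MiB")
print(f"headroom on an otherwise empty GPU: {headroom_mib:.0f} MiB")
print(f"desktop usage before launch: {desktop_mib} MiB")
```

On these numbers the failure is a near miss rather than a large shortfall. The 865 MiB used by the desktop session is comparable to the computed headroom, so freeing GPU memory held by desktop applications might be enough to get past this particular allocation on a 6 GB card.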
same issue here!
Hi, apologies for the delay in replying, and for the persistent issue.
I will have a look into this and provide an update as soon as possible - in the meantime, the whole system can be run on the CPU, without taking too much additional time (the main bottlenecks are the vision models, which should take around one or two minutes each for inference).
Sorry about this, and hope this helps for now!
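One generic way to force a PyTorch program onto the CPU is to hide the GPUs from CUDA before PyTorch is imported. This is a sketch of that standard mechanism, not necessarily what the maintainer had in mind. It assumes every model in the project falls back to the CPU when `torch.cuda.is_available()` returns `False`, and the helper name `force_cpu` is made up for illustration.

```python
import os


def force_cpu():
    """Hide all GPUs from CUDA for this process.

    Must be called before the first `import torch` (or any other CUDA
    initialisation); changing the variable afterwards has no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ""


force_cpu()
# From here on, `import torch; torch.cuda.is_available()` reports False,
# so code that checks for CUDA before picking a device selects the CPU.
```

Code that hardcodes `"cuda"` as its device will raise an error under this setting instead of falling back, so it would need to be edited separately.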
No worries, I managed to get it running yesterday exactly as you described. The bottleneck is the LangSAM library: hardcoding cpu as the device in lang_sam.py does the job. The rest, including XMem, can stay on CUDA. This way, only the segmentation step in detect_object takes about 2 minutes; the rest is pretty fast.
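The split described above, with the segmentation model on the CPU and everything else on the GPU, can be written as a small per-component device choice. This is a hypothetical sketch: the function `choose_device` and the component names `"langsam"` and `"xmem"` are illustrative and do not come from the repository or from lang_sam.py.

```python
def choose_device(component, cuda_available, cpu_only=frozenset({"langsam"})):
    """Return the device string for one model.

    Components listed in `cpu_only` are pinned to the CPU to save GPU
    memory; everything else uses CUDA when it is available.
    """
    if component in cpu_only or not cuda_available:
        return "cpu"
    return "cuda"


# Example with CUDA assumed available: only the segmentation model stays on the CPU.
for name in ("langsam", "xmem"):
    print(name, "->", choose_device(name, cuda_available=True))
```

In a real setup the second argument would come from `torch.cuda.is_available()`, and the returned string would be passed to `torch.device(...)` wherever the corresponding model is moved to a device.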
Thank you for the kind reply. I managed to run it on a workstation using Docker and used -X11 to render the GUI on my local screen. It works well. Big thanks to all of you.
Thanks for your work
I tried to run the code but got a memory error.
I am working with a new NVIDIA GeForce RTX 4080 with more than 15 GB of VRAM. Is this expected behaviour? If so, how much VRAM is required? If not, do you know what I could change?
Thanks for your help