luca-medeiros / lang-segment-anything

SAM with text prompt
Apache License 2.0

model.predict() error #11

Closed: mrJezy closed this issue 1 year ago

mrJezy commented 1 year ago

Hi, thanks for creating this awesome tool!

I get the following error when trying to run prediction on your car example. My env is the following:

```
python==3.9.16
torch==2.0.0+cu117
torchvision==0.15.1+cu117
numpy==1.24.2
opencv_python==4.7.0.72
Pillow==9.3.0
transformers==4.27.4
lightning==2.0.1
```

This is the error message I get. Seems like something related to GroundingDINO:

```
RuntimeError                              Traceback (most recent call last)
Cell In[4], line 1
----> 1 masks, boxes, phrases, logits = model.predict(image_pil, text_prompt)

File /opt/conda/envs/python39/lib/python3.9/site-packages/lang_sam/lang_sam.py:107, in LangSAM.predict(self, image_pil, text_prompt, box_threshold, text_threshold)
    106 def predict(self, image_pil, text_prompt, box_threshold=0.3, text_threshold=0.25):
--> 107     boxes, logits, phrases = self.predict_dino(image_pil, text_prompt, box_threshold, text_threshold)
    108     masks = torch.tensor([])
    109     if len(boxes) > 0:

File /opt/conda/envs/python39/lib/python3.9/site-packages/lang_sam/lang_sam.py:83, in LangSAM.predict_dino(self, image_pil, text_prompt, box_threshold, text_threshold)
     81 def predict_dino(self, image_pil, text_prompt, box_threshold, text_threshold):
     82     image_trans = transform_image(image_pil)
---> 83     boxes, logits, phrases = predict(model=self.groundingdino,
     84                                      image=image_trans,
     85                                      caption=text_prompt,
     86                                      box_threshold=box_threshold,
     87                                      text_threshold=text_threshold,
     88                                      device=self.device)
     89     W, H = image_pil.size
     90     boxes = box_ops.box_cxcywh_to_xyxy(boxes) * torch.Tensor([W, H, W, H])

File /opt/conda/envs/python39/lib/python3.9/site-packages/groundingdino/util/inference.py:66, in predict(model, image, caption, box_threshold, text_threshold, device)
     63 image = image.to(device)
     65 with torch.no_grad():
---> 66     outputs = model(image[None], captions=[caption])
     68 prediction_logits = outputs["pred_logits"].cpu().sigmoid()[0]  # prediction_logits.shape = (nq, 256)
     69 prediction_boxes = outputs["pred_boxes"].cpu()[0]  # prediction_boxes.shape = (nq, 4)

File /opt/conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/envs/python39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/groundingdino.py:289, in GroundingDINO.forward(self, samples, targets, **kw)
    287 if isinstance(samples, (list, torch.Tensor)):
    288     samples = nested_tensor_from_tensor_list(samples)
--> 289 features, poss = self.backbone(samples)
    291 srcs = []
    292 masks = []

File /opt/conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/envs/python39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/backbone/backbone.py:151, in Joiner.forward(self, tensor_list)
    150 def forward(self, tensor_list: NestedTensor):
--> 151     xs = self[0](tensor_list)
    152     out: List[NestedTensor] = []
    153     pos = []

File /opt/conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/envs/python39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/backbone/swin_transformer.py:716, in SwinTransformer.forward(self, tensor_list)
    713 x = tensor_list.tensors
    715 """Forward function."""
--> 716 x = self.patch_embed(x)
    718 Wh, Ww = x.size(2), x.size(3)
    719 if self.ape:
    720     # interpolate the position embedding to the corresponding size

File /opt/conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/envs/python39/lib/python3.9/site-packages/groundingdino/models/GroundingDINO/backbone/swin_transformer.py:491, in PatchEmbed.forward(self, x)
    488 if H % self.patch_size[0] != 0:
    489     x = F.pad(x, (0, 0, 0, self.patch_size[0] - H % self.patch_size[0]))
--> 491 x = self.proj(x)  # B C Wh Ww
    492 if self.norm is not None:
    493     Wh, Ww = x.size(2), x.size(3)

File /opt/conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File /opt/conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/conv.py:463, in Conv2d.forward(self, input)
    462 def forward(self, input: Tensor) -> Tensor:
--> 463     return self._conv_forward(input, self.weight, self.bias)

File /opt/conda/envs/python39/lib/python3.9/site-packages/torch/nn/modules/conv.py:459, in Conv2d._conv_forward(self, input, weight, bias)
    455 if self.padding_mode != 'zeros':
    456     return F.conv2d(F.pad(input, self._reversed_padding_repeated_twice, mode=self.padding_mode),
    457                     weight, bias, self.stride,
    458                     _pair(0), self.dilation, self.groups)
--> 459 return F.conv2d(input, weight, bias, self.stride,
    460                 self.padding, self.dilation, self.groups)

RuntimeError: GET was unable to find an engine to execute this computation
```

luca-medeiros commented 1 year ago

Hey @mrJezy, thanks for trying it out. After a quick check, it seems you have an issue with your CUDA/PyTorch versions. Mind checking your current CUDA version with `nvcc -V`? Maybe run a few tests to make sure your env is working as expected.
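
For example, something like this (a minimal sketch, not part of the repo; it just exercises the same `Conv2d` forward that fails in the traceback):

```
import torch

# Which torch build is installed, which CUDA version it was compiled
# against, and whether torch can see a usable GPU.
print(torch.__version__)          # e.g. 2.0.0+cu117
print(torch.version.cuda)         # CUDA version this torch build targets
print(torch.cuda.is_available())

# Tiny Conv2d forward on the GPU -- the same op that raises
# "GET was unable to find an engine" in the traceback above.
conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
x = torch.randn(1, 3, 64, 64, device="cuda")
with torch.no_grad():
    out = conv(x)
print(out.shape)  # torch.Size([1, 8, 62, 62])
```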

mrJezy commented 1 year ago

Thanks for the quick answer. I'm using a GCP Vertex AI JupyterLab notebook and this is my CUDA compiler version:

```
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Wed_Jul_22_19:09:09_PDT_2020
Cuda compilation tools, release 11.0, V11.0.221
Build cuda_11.0_bu.TC445_37.28845127_0
```
luca-medeiros commented 1 year ago

@mrJezy If you check your current CUDA version:

```
release 11.0, V11.0.221
```

and your torch/torchvision versions:

```
torch==2.0.0+cu117 torchvision==0.15.1+cu117
```

it looks like your torch build targets CUDA 11.7, but your machine runs CUDA 11.0. Reinstalling torch with a build that matches your local CUDA version should fix it.
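
For example (a sketch, not a specific recommendation; the wheel tag has to match your local CUDA toolkit, and the cu113 index below is just one of the per-version wheel indexes PyTorch publishes):

```
# Remove the mismatched cu117 builds first.
pip uninstall -y torch torchvision

# Install builds compiled for an older CUDA line, e.g. the cu113 wheels:
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 \
    --extra-index-url https://download.pytorch.org/whl/cu113
```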

mrJezy commented 1 year ago

Thanks for pointing this out. I had to downgrade my torch and torchvision versions. I can confirm that it works with:

```
torch==1.12.1+cu113
torchvision==0.13.1+cu113
```
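
A quick way to confirm the reinstalled build matches the machine (same sanity check as the sketch earlier in the thread):

```
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```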