valerija-h / os_tog

Code for RA-L paper "One-shot Learning for Task-oriented Grasping"

ValueError: operands could not be broadcast together with shapes (480,132) (960,1280) (960,1280) #1

Open RyuseiiSama opened 3 months ago

RyuseiiSama commented 3 months ago

Hello! I chanced upon this study and was just fiddling around trying to apply it to my own objects.

Steps taken:

1. Followed your instructions to set up.
2. Ran os_tog.ipynb with your sample images; it succeeded.
3. Reran using my own sample images, each as .jpeg and .png (not sure if it's relevant), particularly this cell:

target_object = "marker"
target_task = "handover"

scene_img = cv2.imread("../samples/test2.jpeg")
scene_img = cv2.cvtColor(scene_img, cv2.COLOR_BGR2RGB)

grasps = framework.get_prediction(scene_img, target_object, target_task)

4. Got this output:

[INFO] Found object 'marker' in database.
Attempting K=10
Using KMeans - PyTorch, Cosine Similarity, No Elbow
Output centroids are normalized
used 3 iterations (0.0083s) to cluster 26 items into 10 clusters
Generating Saliency mask
Attempting K=80
Using KMeans - PyTorch, Cosine Similarity, No Elbow
Output centroids are normalized
used 9 iterations (0.022s) to cluster 832 items into 80 clusters
Attempting K=80
Using KMeans - PyTorch, Cosine Similarity, No Elbow
Output centroids are normalized
used 8 iterations (0.0058s) to cluster 393 items into 80 clusters
Starting processing of the affordances

And this error:

    "name": "ValueError",
    "message": "operands could not be broadcast together with shapes (480,132) (960,1280) (960,1280) ",
    "stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[13], line 7
      4 scene_img = cv2.imread("../samples/test2.jpeg")
      5 scene_img = cv2.cvtColor(scene_img, cv2.COLOR_BGR2RGB)
----> 7 grasps = framework.get_prediction(scene_img, target_object, target_task)

File ~/Desktop/os_tog/os_tog/os_tog/framework.py:68, in OS_TOG.get_prediction(self, scene_img, target_object, target_task)
     66 if self.cfg.MULTI_REF_AFF: # align affordance to object through rotations
     67     ref_aff, ref_img = self.get_nearest_affordance(ref_aff, ref_img, scene_img, (pred_mask[obj_idx], pred_boxes[obj_idx]))
---> 68 pred_aff = self.get_affordance_recognition_predictions(ref_img, ref_aff, scene_img, (pred_mask[obj_idx], pred_boxes[obj_idx]))
     70 grasps = self.get_valid_grasps(scene_img, pred_aff)    
     71 return grasps[0] # return final grasp

File ~/Desktop/os_tog/os_tog/os_tog/framework.py:268, in OS_TOG.get_affordance_recognition_predictions(self, ref_img, ref_aff, obs_img, segm_preds)
    266 if self.cfg.VISUALIZE:
    267     visualize(np.array(ref_img), masks=np.asarray([ref_aff]), title="Reference Affordance", figsize=(5,5)) # may be rotate if u chose MULTI_REF_AFF=True in cfg
--> 268     visualize(obs_img, masks=np.asarray([uncrop_mask]), title="Affordance Prediction", figsize=(5,5))
    269 return uncrop_mask

File ~/Desktop/os_tog/os_tog/os_tog/utils.py:51, in visualize(image, boxes, masks, class_ids, grasps, figsize, ax, title)
     49 if masks is not None:
     50     mask = masks[i, :, :]
---> 51     masked_image = apply_mask(masked_image, mask, color)
     53 # plot grasps
     54 if grasps is not None:

File ~/Desktop/os_tog/os_tog/os_tog/utils.py:123, in apply_mask(image, mask, color, alpha)
    121 \"\"\"Apply a binary mask to an image.\"\"\"
    122 for c in range(3):
--> 123     image[:, :, c] = np.where(mask == 1,
    124                               image[:, :, c] *
    125                               (1 - alpha) + alpha * color[c] * 255,
    126                               image[:, :, c])
    127 return image

File <__array_function__ internals>:180, in where(*args, **kwargs)

ValueError: operands could not be broadcast together with shapes (480,132) (960,1280) (960,1280)
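
From some digging, the failure seems to reduce to an np.where call whose operands cannot broadcast to a common shape. A minimal sketch that reproduces the same error, with the shapes copied from the message above:

import numpy as np

mask = np.zeros((480, 132))      # affordance mask at one resolution
channel = np.zeros((960, 1280))  # one image channel at another resolution

# np.where needs the condition and both value operands to broadcast to a
# common shape; (480,132) and (960,1280) cannot, so this raises the same
# ValueError as in the traceback above.
np.where(mask == 1, channel * 0.5, channel)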

Images that appeared:

(image attachment)

I'm wondering, was it only meant to run with the sample images? If so, how might I go about getting it to work with my own images in future?

Do note that I am EXTREMELY new to anything computer vision related. That being said, please throw any technicalities that resulted in this issue my way!

Thanks in advance :)

Sai-Yarlagadda commented 1 month ago

You can add this line before getting the grasps:

scene_img = cv2.resize(scene_img, (480, 132))
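
Note that cv2.resize's dsize argument is (width, height), the reverse of NumPy's (rows, cols) shape order, which is easy to mix up. A quick sketch:

import cv2
import numpy as np

img = np.zeros((960, 1280, 3), dtype=np.uint8)  # NumPy shape is (rows, cols, channels)
resized = cv2.resize(img, (640, 480))           # dsize is (width, height)
print(resized.shape)                            # -> (480, 640, 3)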

valerija-h commented 1 month ago

The demo scene image and real-world experiment scenes were 640x480 pixels; I believe yours may be 1280x960. Could you try resizing it as @Sai-Yarlagadda suggested, but instead do it after converting the scene to RGB near the start of the cell, and make the output size 640x480, like so:

scene_img = cv2.imread("../samples/test2.jpeg")
scene_img = cv2.cvtColor(scene_img, cv2.COLOR_BGR2RGB)
# added resizing below
scene_img = cv2.resize(scene_img, (640, 480))

If that doesn't work, could you attach your test image?
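
In case it helps, here's a slightly more defensive version of the same cell (just a sketch, assuming the framework expects 640x480 inputs) that checks the loaded image and only resizes when needed:

import cv2

EXPECTED_W, EXPECTED_H = 640, 480  # resolution used by the demo scenes

scene_img = cv2.imread("../samples/test2.jpeg")
assert scene_img is not None, "cv2.imread returns None if the path is wrong"
scene_img = cv2.cvtColor(scene_img, cv2.COLOR_BGR2RGB)

# NumPy shape is (rows, cols), i.e. (height, width); resize only on mismatch
if scene_img.shape[:2] != (EXPECTED_H, EXPECTED_W):
    scene_img = cv2.resize(scene_img, (EXPECTED_W, EXPECTED_H))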