The image caption and dense caption modules both work fine here; however, the region caption module does not seem to work well. I tested both the edit_anything and ssa models.
For the edit_anything model, it returns obviously wrong object descriptions. The following is the test image I input.
And the Region Segment module returns:
a dog is walking on the floor in a room: [0, 50, 383, 165]; a person riding a skateboard down a street: [234, 49, 149, 166]; a piece of paper with a black background: [0, 0, 64, 110]; a white light switch with a black light: [312, 0, 53, 80]; the moon is seen over the city skyline: [116, 0, 56, 38];
There are clearly no dogs or skateboards in the picture.
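To make the output above easier to check against the image, here is a minimal sketch that parses the returned string and converts each box to corner coordinates. It assumes the four numbers are [x, y, width, height], which I have not confirmed; under that reading the first two boxes both end at (383, 215). The helper names `parse_regions` and `xywh_to_xyxy` are hypothetical and not part of the repository.

```python
import re

def parse_regions(text):
    """Split 'caption: [a, b, c, d]; ...' into (caption, [a, b, c, d]) pairs."""
    pattern = re.compile(r"\s*([^:;]+):\s*\[([^\]]*)\]")
    regions = []
    for caption, box in pattern.findall(text):
        regions.append((caption.strip(), [int(v) for v in box.split(",")]))
    return regions

def xywh_to_xyxy(box):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max].

    Assumption: the module reports boxes as x, y, width, height.
    """
    x, y, w, h = box
    return [x, y, x + w, y + h]

# Two entries copied from the Region Segment output above.
sample = (
    "a dog is walking on the floor in a room: [0, 50, 383, 165]; "
    "a person riding a skateboard down a street: [234, 49, 149, 166];"
)

regions = parse_regions(sample)
corners = [xywh_to_xyxy(box) for _, box in regions]

for (caption, _), corner in zip(regions, corners):
    print(caption, corner)
```

Cropping the test image to these corner boxes would show what each caption was actually generated from, and whether the box format is being read the same way on both sides.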
For the ssa model, when I add the --region_classify_model ssa option and change the region_semantic method to use ssa, the method errors out with:
│ /share/data/ripl/fjd/Image2Paragraph/models/segment_models/semantic_segment_anything_model.py:147 in semantic_class_w_mask │
│ │
│ 144 │ │ │ │
│ 145 │ │ │ valid_mask_large_crop = mmcv.imcrop(valid_mask.numpy(), np.array([bbox[0], b │
│ 146 │ │ │ scale_large) │
│ ❱ 147 │ │ │ top_1_patch_large = torch.bincount(class_ids_patch_large[torch.tensor(valid_ │
│ 148 │ │ │ top_1_mask_category = mask_categories[top_1_patch_large.item()] │
│ 149 │ │ │ │
│ 150 │ │ │ ann['class_name'] = str(top_1_mask_category) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: The shape of the mask [3, 23] at index 0 does not match the shape of the indexed tensor [23, 3] at index 0
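The error message suggests that the boolean mask and the class id patch have swapped dimensions ([3, 23] versus [23, 3]). Below is a minimal sketch that reproduces the same kind of IndexError with plain PyTorch and adds a shape check before indexing. The tensors and the `top_class` helper are hypothetical stand-ins, not the repository's code, and the transpose at the end only illustrates the mismatch; I do not know whether it is the right fix.

```python
import torch

# Hypothetical stand-ins using the shapes from the error message.
class_ids_patch = torch.randint(0, 5, (23, 3))            # indexed tensor, shape [23, 3]
valid_mask_crop = torch.ones((3, 23), dtype=torch.bool)   # mask, shape [3, 23]

def top_class(class_ids, mask):
    """Return the most frequent class id under the mask, after checking shapes."""
    if mask.shape != class_ids.shape:
        raise ValueError(
            f"mask shape {tuple(mask.shape)} != class id shape {tuple(class_ids.shape)}"
        )
    # argmax over the counts is my choice here; the original line is truncated.
    return torch.bincount(class_ids[mask].flatten()).argmax().item()

# Boolean indexing with a mismatched mask raises IndexError.
try:
    class_ids_patch[valid_mask_crop]
    error_message = ""
except IndexError as err:
    error_message = str(err)
print(error_message)

# With matching shapes (here via a transpose, for illustration only) it works.
print(top_class(class_ids_patch, valid_mask_crop.T))
```

Printing the shapes of the cropped mask and the class id patch just before line 147 would confirm whether this is what happens in semantic_class_w_mask.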
I wonder whether you have a recommended way to use the region segment methods.
First of all, thanks for the great work.