microsoft / GLIP

Grounded Language-Image Pre-training
MIT License

Extract region features #12

Closed: MarcusNerva closed this issue 2 years ago

MarcusNerva commented 2 years ago

Hi there! Thank you for your amazing work! How can I extract features for regions in a given image?

liunian-harold-li commented 2 years ago

Hi there, I think that for single-stage detectors it is not that straightforward to extract region features, unlike two-stage detectors such as the one in BUTD. Perhaps my coauthors have better answers @pzzhang @Haotian-Zhang.

Haotian-Zhang commented 2 years ago

Hi, just as my coauthor @liunian-harold-li says, extracting region features is not as straightforward as it is with two-stage detectors. That said, we are able to extract the positive anchor features from ATSS. If you need to extract the backbone features, here's a piece of code that may help.

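import torch
# Assumption: cat_boxlist is the BoxList helper from maskrcnn_benchmark.structures.boxlist_ops.

def convert_to_roi_format(boxes):
    # boxes is a BoxList; boxes.bbox holds its (N, 4) box coordinates.
    concat_boxes = boxes.bbox
    device, dtype = concat_boxes.device, concat_boxes.dtype
    # Prepend a batch-index column of zeros, since one image is pooled at a time.
    ids = torch.full((len(boxes), 1), 0, dtype=dtype, device=device)
    rois = torch.cat([ids, concat_boxes], dim=1)  # (N, 5): batch index + box
    return rois

# anchors, positive_indices, img_emb_feats, and pooler come from the surrounding
# detector code and are not defined here; i indexes the image in the batch.
rois = convert_to_roi_format(cat_boxlist(anchors[i])[positive_indices[i]])
roi_feature = pooler(img_emb_feats[i].unsqueeze(0), rois)
# Drop the trailing spatial dimensions (assumes the pooler outputs 1x1 maps).
roi_feature = roi_feature.squeeze(-1).squeeze(-1)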
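
For anyone trying this outside the detector code, here is a minimal self-contained sketch of the same idea on a toy feature map. It is not the GLIP implementation: `boxes_to_rois`, `simple_roi_pool`, and `spatial_scale` are hypothetical names introduced only for illustration, and the crop-and-average pooling is a simplified stand-in for whatever `pooler` is in the snippet above (a real pooler would typically interpolate rather than snap boxes to whole cells).

```python
import math

import torch
import torch.nn.functional as F


def boxes_to_rois(boxes_xyxy, batch_index=0):
    """Prepend a batch-index column: (N, 4) xyxy boxes -> (N, 5) rois."""
    ids = torch.full((boxes_xyxy.shape[0], 1), batch_index,
                     dtype=boxes_xyxy.dtype, device=boxes_xyxy.device)
    return torch.cat([ids, boxes_xyxy], dim=1)


def simple_roi_pool(feature_map, rois, spatial_scale, output_size=1):
    """Hypothetical stand-in for `pooler`: crop each box, then average-pool it.

    feature_map: (B, C, H, W).
    rois: (N, 5) rows of (batch_idx, x1, y1, x2, y2) in image pixels.
    spatial_scale: feature-map size divided by image size.
    """
    _, channels, height, width = feature_map.shape
    pooled = []
    for roi in rois:
        b = int(roi[0].item())
        x1, y1, x2, y2 = (roi[1:] * spatial_scale).tolist()
        # Snap to whole cells and keep every crop at least one cell wide.
        x1 = min(max(math.floor(x1), 0), width - 1)
        y1 = min(max(math.floor(y1), 0), height - 1)
        x2 = min(max(math.ceil(x2), x1 + 1), width)
        y2 = min(max(math.ceil(y2), y1 + 1), height)
        crop = feature_map[b, :, y1:y2, x1:x2].unsqueeze(0)  # (1, C, h, w)
        pooled.append(F.adaptive_avg_pool2d(crop, output_size))
    if not pooled:
        return feature_map.new_zeros((0, channels, output_size, output_size))
    return torch.cat(pooled, dim=0)  # (N, C, output_size, output_size)


# Toy input: one image, 2 channels, a 4x4 feature map for a 32x32 image.
features = torch.arange(2 * 4 * 4, dtype=torch.float32).reshape(1, 2, 4, 4)
boxes = torch.tensor([[0.0, 0.0, 16.0, 16.0], [16.0, 16.0, 32.0, 32.0]])

rois = boxes_to_rois(boxes)
roi_feature = simple_roi_pool(features, rois, spatial_scale=4 / 32)
roi_feature = roi_feature.squeeze(-1).squeeze(-1)  # one feature vector per box
```

With `output_size=1`, each box collapses to a single C-dimensional vector, which is why the two trailing `squeeze(-1)` calls are enough to get an (N, C) tensor.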

We are about to release a newer GLIPv2 that may contain more code that is helpful for your needs. The code and model are under internal review and will be released soon. Please stay tuned.