tgc1997 / RMN

IJCAI2020: Learning to Discretely Compose Reasoning Module Networks for Video Captioning

Spatial Feats #19

Closed arnavc1712 closed 3 years ago

arnavc1712 commented 3 years ago

Hi, I see you are using 2D CNN features (1536 dims), 3D CNN features (1024 dims), and RCNN features (2048 dims). I also see something called spatial features, with 5 dimensions. What are these features? I could not find them mentioned anywhere in the paper.

tgc1997 commented 3 years ago

Hi, we use the position (4 dims) and the area (1 dim) of the region as the spatial feature.

arnavc1712 commented 3 years ago

@tgc1997, Thank you!

tgc1997 commented 3 years ago

We use the following code to normalize them by the image size:

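import numpy as np


def _boxes2sfeat(boxes, im):
    """Return an (N, 5) array: x1, y1, x2, y2 and box area, all normalized by the image size."""
    # Image height, width, and area serve as the normalizers.
    S_H = im.shape[0]
    S_W = im.shape[1]
    S_A = S_W * S_H

    # Cast to float so the divisions below are never integer divisions.
    boxes = np.asarray(boxes, dtype=np.float64)
    # calculate sfeat; boxes are expected as (x1, y1, x2, y2) in pixels
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    Sa = (x2 - x1) * (y2 - y1)  # box area in pixels
    sfeat = np.hstack(((x1/S_W)[:, np.newaxis],
                       (y1/S_H)[:, np.newaxis],
                       (x2/S_W)[:, np.newaxis],
                       (y2/S_H)[:, np.newaxis],
                       (Sa/S_A)[:, np.newaxis]))
    return sfeat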
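
As a quick sanity check, here is a minimal sketch of the same computation on made-up inputs. The frame size, the box, and the name `boxes_to_sfeat` are assumptions for illustration only; they are not taken from the repository. The function is a compact restatement of `_boxes2sfeat` so that the snippet runs on its own.

```python
import numpy as np


def boxes_to_sfeat(boxes, im):
    # Compact restatement of _boxes2sfeat (hypothetical name) so this runs standalone.
    h, w = im.shape[:2]
    boxes = np.asarray(boxes, dtype=np.float64)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1) * (y2 - y1)
    # Columns: x1/W, y1/H, x2/W, y2/H, box_area/image_area
    return np.stack([x1 / w, y1 / h, x2 / w, y2 / h, area / (w * h)], axis=1)


# Assumed inputs: a blank 480x640 frame and one box covering its top-left quarter.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
boxes = [[0, 0, 320, 240]]  # (x1, y1, x2, y2) in pixels

sfeat = boxes_to_sfeat(boxes, frame)
print(sfeat.shape)  # (1, 5)
print(sfeat)        # values: 0, 0, 0.5, 0.5, 0.25
```

Every entry ends up in [0, 1] as long as the boxes lie inside the frame, so the 5-dim feature does not depend on the video resolution.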