shivahanifi / GazeMDETR

Apache License 2.0


Combine gaze information and the features extracted from backbone #2

Closed: shivahanifi closed this issue 10 months ago

shivahanifi commented 10 months ago

Consider a simple ratio relationship between the image patches and the feature map, multiply the features by the gaze heatmap, and check the results.

The attention heatmap needs to be downsampled to the spatial size of the feature map, flattened, and then multiplied with the features.
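A minimal sketch of the downsample, flatten, and multiply steps. It assumes DETR-style flattened features `src` of shape (h·w, batch, d_model) and a single-channel heatmap. It uses `adaptive_avg_pool2d` so that each feature-map cell receives the mean heatmap value over the image patch it covers, which is one way to express the patch-to-feature-map ratio. The later comment in this thread uses bilinear interpolation instead. The function name `gaze_weight_features` and the grid size h = 25, w = 34 are illustrative assumptions, not taken from the repository.

```python
import torch
import torch.nn.functional as F


def gaze_weight_features(heatmap: torch.Tensor, src: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Hypothetical helper: weight flattened backbone features by a gaze heatmap.

    heatmap: (H_img, W_img) gaze heatmap for one image (assumed single channel).
    src:     (h * w, batch, d_model) flattened feature map, DETR-style token order.
    """
    # Average the heatmap over the image patch each feature-map cell covers.
    pooled = F.adaptive_avg_pool2d(heatmap[None, None], (h, w))  # (1, 1, h, w)
    # One weight per token, in the same row-major order used to flatten src.
    weights = pooled.flatten(2).permute(2, 0, 1)  # (h * w, 1, 1)
    # Broadcast the weights over the batch and channel dimensions.
    return weights.to(src.device, src.dtype) * src


# Shapes from the thread: an 800 x 1066 input and src of shape (850, 1, 256).
heatmap = torch.rand(800, 1066)
src = torch.randn(850, 1, 256)
out = gaze_weight_features(heatmap, src, h=25, w=34)  # h, w assumed
print(out.shape)  # torch.Size([850, 1, 256])
```

With an all-ones heatmap the features pass through unchanged, and with an all-zeros heatmap they are zeroed, which makes the weighting easy to check in isolation.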

shivahanifi commented 10 months ago

Required info:

| parameter | value | notes |
|---|---|---|
| `im` | (640, 480) | original RGB PIL image |
| `raw_hm` | torch.Size([3, 1, 64, 64]) | raw heatmap from VTD |
| `norm_map` | (640, 480) | modulized 3-channel heatmap from VTD, values in (0, 255) |
| `norm_map_gray` | (640, 480) | grayscale (single-channel) version of `norm_map`, values in (0, 255) |
| `normalized_norm_map_tensor` | torch.Size([1, 800, 1066]) | `norm_map_gray` resized and transformed to a tensor |
| `gaze` | torch.Size([3, 480, 640]) | `norm_map` converted with `transforms.ToTensor` (values scaled to [0, 1]) |
| `src` | torch.Size([850, 1, 256]) | flattened backbone features, (h·w, batch, d_model) |
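The shapes in the table are mutually consistent, which a little arithmetic confirms. The sketch below assumes that `T.Resize(800)` matches the shorter edge to 800 px and truncates the longer edge, and that the backbone is a ResNet-style network with total stride 32 whose stride-2 stages round up. Under those assumptions a 640 × 480 image yields 25 × 34 = 850 feature tokens, matching the first dimension of `src`. Both helper names are hypothetical.

```python
import math


def resize_shorter_edge(width: int, height: int, size: int = 800) -> tuple[int, int]:
    """Mimic T.Resize(size) with an int: the shorter edge becomes `size`.

    Returns (new_width, new_height); the longer edge is truncated to an int.
    """
    short, long = min(width, height), max(width, height)
    new_long = int(size * long / short)
    return (new_long, size) if width >= height else (size, new_long)


def feature_map_size(width: int, height: int, stride: int = 32) -> tuple[int, int]:
    """(h, w) of a stride-`stride` backbone output, assuming each stage rounds up."""
    return math.ceil(height / stride), math.ceil(width / stride)


W, H = resize_shorter_edge(640, 480)  # PIL reports `im.size` as (width, height)
h, w = feature_map_size(W, H)
print((H, W), (h, w), h * w)  # (800, 1066) (25, 34) 850
```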

shivahanifi commented 10 months ago

I take `norm_map_gray` and transform it as follows:

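import torchvision.transforms as T

transform_normMap = T.Compose([
    T.Resize(800),                      # shorter edge -> 800 px (640 x 480 becomes 1066 x 800)
    T.ToTensor(),                       # PIL image -> float tensor in [0, 1]
    T.Normalize(mean=(-1,), std=(2,)),  # (x - (-1)) / 2, i.e. maps [0, 1] to [0.5, 1]
])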

It is then passed as the `gaze` argument to the `plot_inference` function and propagated through the `mdetr.py` and `transformer.py` modules.
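For reference, `T.Normalize(mean, std)` computes (x - mean) / std per channel, so with mean -1 and std 2 the `ToTensor` range [0, 1] becomes [0.5, 1]. When the result is later used as a multiplicative weight, regions without gaze are therefore halved rather than zeroed. A small plain-PyTorch check of that formula (the helper name is hypothetical):

```python
import torch


def normalize_like_transform(x: torch.Tensor, mean: float = -1.0, std: float = 2.0) -> torch.Tensor:
    """Hypothetical helper reproducing T.Normalize's formula: (x - mean) / std."""
    return (x - mean) / std


x = torch.tensor([0.0, 0.5, 1.0])  # ToTensor output range for an 8-bit image is [0, 1]
print(normalize_like_transform(x))  # tensor([0.5000, 0.7500, 1.0000])
```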

In the `transformer.py` module:

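    # gaze: (1, H_img, W_img) heatmap; src: flattened features of shape (h*w, bs, d_model)
    gaze = torch.nn.functional.interpolate(gaze.unsqueeze(0), size=(h, w), mode='bilinear', align_corners=False)  # (1, 1, h, w)
    gaze = gaze.flatten(2).permute(2, 0, 1).to(src.device)  # (h*w, 1, 1), same token order as src
    src = gaze * src  # broadcasts over the batch and channel dimensions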
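One way to sanity-check that a gaze map flattened with `flatten(2).permute(2, 0, 1)` lines up token-for-token with DETR-style flattened features is to place a one-hot gaze at a single cell and confirm that only the matching token survives the multiplication. The grid size and the chosen cell below are arbitrary assumptions for illustration.

```python
import torch

h, w, bs, d_model = 25, 34, 1, 256  # assumed sizes, consistent with src of shape (850, 1, 256)

# DETR-style flattening of backbone features: (bs, c, h, w) -> (h*w, bs, c).
feat = torch.randn(bs, d_model, h, w)
src = feat.flatten(2).permute(2, 0, 1)

# One-hot gaze at feature-map cell (row r, column c), flattened the same way.
r, c = 7, 20
gaze = torch.zeros(1, 1, h, w)
gaze[0, 0, r, c] = 1.0
weights = gaze.flatten(2).permute(2, 0, 1)  # (h*w, 1, 1)

out = weights * src
# Indices of tokens that still carry any signal after weighting.
nonzero = out.abs().sum(dim=(1, 2)).nonzero().flatten().tolist()
print(nonzero)  # [258], i.e. r * w + c
```

Because both tensors are flattened in row-major order, token `r * w + c` in `src` holds exactly the features at `feat[:, :, r, c]`.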