shivahanifi / GazeMDETR

Apache License 2.0

Combine gaze information and the features extracted from backbone #2

Closed: shivahanifi closed this issue 10 months ago

shivahanifi commented 10 months ago

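Consider a simple ratio relation between the image patches and the feature map: each feature-map cell covers an image patch whose size is the ratio of the two resolutions. Then multiply the features by the gaze heatmap and check the results.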
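One way to read the "ratio relation": the image is split into a grid of patches, one per feature-map cell, with patch size equal to the image/feature-map size ratio. Averaging the heatmap over each patch then gives one weight per cell. A minimal NumPy sketch under that assumption (NumPy stands in for torch; `pool_heatmap_to_grid` and `weight_features` are hypothetical names, not part of the repo). Note that the later comment in this thread uses bilinear interpolation rather than this area average.

```python
import numpy as np


def pool_heatmap_to_grid(heatmap: np.ndarray, grid_h: int, grid_w: int) -> np.ndarray:
    """Average a (H, W) heatmap over the image patch covered by each feature-map cell.

    Patch edges follow the size ratio H / grid_h and W / grid_w, rounded down to
    whole pixels, so patches never overlap and together cover the whole image.
    Requires grid_h <= H and grid_w <= W so that every patch is non-empty.
    """
    H, W = heatmap.shape
    row_edges = np.arange(grid_h + 1) * H // grid_h
    col_edges = np.arange(grid_w + 1) * W // grid_w
    pooled = np.empty((grid_h, grid_w), dtype=np.float64)
    for i in range(grid_h):
        for j in range(grid_w):
            patch = heatmap[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = patch.mean()
    return pooled


def weight_features(features: np.ndarray, heatmap: np.ndarray) -> np.ndarray:
    """Multiply (c, h, w) features by the heatmap pooled onto their (h, w) grid."""
    _, h, w = features.shape
    weights = pool_heatmap_to_grid(heatmap, h, w)
    return features * weights[np.newaxis, :, :]  # broadcast over channels
```

For a 4 × 4 heatmap `np.arange(16)` and a 2 × 2 grid, each cell averages one 2 × 2 block, giving `[[2.5, 4.5], [10.5, 12.5]]`. In PyTorch, `torch.nn.functional.adaptive_avg_pool2d` computes a comparable area average. When the sizes do not divide evenly, its windows can overlap by a pixel, so values may differ slightly from this sketch.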

shivahanifi commented 10 months ago

Required info:

| Parameter | Value | Notes |
|---|---|---|
| im | (640, 480) | original RGB PIL image (PIL size is width × height) |
| raw_hm | torch.Size([3, 1, 64, 64]) | raw heatmap from VTD |
| norm_map | (640, 480) | modulized 3-channel heatmap from VTD; values (0, 255) |
| norm_map_gray | (640, 480) | modulized 3-channel heatmap; values (0, 255) |
| normalized_norm_map_tensor | torch.Size([1, 800, 1066]) | norm_map resized and transformed to tensor |
| gaze | torch.Size([3, 480, 640]) | norm_map normalized using transforms.ToTensor |
| src | torch.Size([850, 1, 256]) | flattened backbone features, presumably (h·w, batch, d_model) with h·w = 25 × 34 = 850 |
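The sizes in the table are consistent with each other. A quick arithmetic check, assuming torchvision's `Resize(800)` rule for an int size (shorter side set to 800, longer side scaled and truncated to an int) and a ResNet-style backbone with five stride-2 stages that each round up (overall stride 32). The backbone choice is an assumption about the model config, and the helper names are hypothetical.

```python
def resize_shorter_side(width: int, height: int, size: int = 800) -> tuple[int, int]:
    """Output (width, height) of torchvision's Resize(size) with an int size:
    the shorter side becomes `size`, the longer side keeps the aspect ratio
    and is truncated to an int."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def feature_grid(height: int, width: int, num_downsamples: int = 5) -> tuple[int, int]:
    """Spatial size after `num_downsamples` stride-2 stages that each compute
    ceil(n / 2), as in a ResNet with overall stride 32 (assumed backbone)."""
    for _ in range(num_downsamples):
        height, width = (height + 1) // 2, (width + 1) // 2
    return height, width


w, h = resize_shorter_side(640, 480)  # PIL size (width, height) of im / norm_map_gray
fh, fw = feature_grid(h, w)
print((w, h), (fh, fw), fh * fw)      # prints (1066, 800) (25, 34) 850
```

Under these assumptions the 1066 × 800 resized image yields a 25 × 34 feature map, i.e. the 850 tokens in the first dimension of src.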
shivahanifi commented 10 months ago

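Take norm_map_gray and transform it with:

    import torchvision.transforms as T

    transform_normMap = T.Compose([
        T.Resize(800),                 # shorter side -> 800 px: (640, 480) PIL image -> (1066, 800)
        T.ToTensor(),                  # (1, 800, 1066) float tensor, values scaled to [0, 1]
        T.Normalize((-1.0,), (2.0,)),  # (x - (-1)) / 2, so [0, 1] maps to [0.5, 1]
    ])

Then pass the result as the gaze argument to the plot_inference function, and from there through the mdetr.py and transformer.py modules, where it is resized to the feature-map size and multiplied into src: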

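    # assumes src is still (bs, c, h, w), i.e. before src.flatten(2).permute(2, 0, 1)
    h, w = src.shape[-2:]
    # gaze: (1, H, W) -> (1, 1, h, w), bilinearly resized to the feature-map grid
    gaze = torch.nn.functional.interpolate(gaze.unsqueeze(0), size=(h, w), mode='bilinear', align_corners=False)
    # broadcasts over batch and channels: element-wise re-weighting of the features
    src = gaze.to(src.device) * src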
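The table lists src as (850, 1, 256), presumably already flattened into h·w tokens, whereas gaze is a 2-D map. A minimal sketch of two points, using NumPy in place of torch (`normalize_gaze` and `weight_flat_tokens` are hypothetical names). The first is the value range that the ToTensor + Normalize((-1), (2)) pair produces. The second is how the same per-cell weighting can be applied to flattened tokens, assuming the DETR-style (h·w, batch, d_model) layout produced by `flatten(2).permute(2, 0, 1)`.

```python
import numpy as np


def normalize_gaze(gray_u8: np.ndarray) -> np.ndarray:
    """Mimic ToTensor + Normalize((-1,), (2,)) on 0-255 grayscale values."""
    x = gray_u8.astype(np.float32) / 255.0  # ToTensor: [0, 255] -> [0, 1]
    return (x - (-1.0)) / 2.0                # Normalize: (x - mean) / std -> [0.5, 1]


def weight_flat_tokens(src_flat: np.ndarray, gaze_grid: np.ndarray) -> np.ndarray:
    """Apply an (h, w) gaze map to tokens laid out as (h*w, bs, d).

    Assumes src_flat came from src.flatten(2).permute(2, 0, 1) on a (bs, d, h, w)
    map, so token k corresponds to row k // w, column k % w (row-major order).
    """
    hw = gaze_grid.size
    if src_flat.shape[0] != hw:
        raise ValueError(f"expected {hw} tokens, got {src_flat.shape[0]}")
    return src_flat * gaze_grid.reshape(hw, 1, 1)  # broadcast over batch and d
```

Because the weights lie in [0.5, 1], regions with no gaze are attenuated to half strength rather than zeroed. Multiplying before or after flattening gives the same tokens, provided the gaze map has been resized to the same (h, w) grid as the features.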