Parameter | Value | Notes
---|---|---
`im` | (640, 480) | original RGB PIL image
`raw_hm` | torch.Size([3, 1, 64, 64]) | raw heatmap from VTD
`norm_map` | (640, 480) | normalized 3-channel heatmap from VTD; values in (0, 255)
`norm_map_gray` | (640, 480) | normalized heatmap converted to grayscale; values in (0, 255)
`normalized_norm_map_tensor` | torch.Size([1, 800, 1066]) | `norm_map_gray` resized and transformed to a tensor
`gaze` | torch.Size([3, 480, 640]) | `norm_map` normalized using transforms.ToTensor
`src` | torch.Size([850, 1, 256]) | flattened feature map inside the transformer
I am using `norm_map_gray` and transforming it as:

```python
transform_normMap = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize((-1,), (2,))
])
```

and then passing it as the `gaze` parameter to the `plot_inference` function, and through the `mdetr.py` and `transformer.py` modules.
In the `transformer.py` module:

```python
# Resize the gaze heatmap to the feature-map resolution (h, w).
gaze = torch.nn.functional.interpolate(gaze.unsqueeze(0), size=(h, w),
                                       mode='bilinear', align_corners=False).squeeze(0)
gaze = gaze.to(device)
# Flatten to [h*w, 1, 1] so it broadcasts against src of shape [h*w, batch, d_model].
gaze = gaze.flatten(1).permute(1, 0).unsqueeze(-1)
src = gaze * src
```
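A self-contained sketch of that modulation step, assuming DETR-style shapes where `src` is the flattened backbone feature map of shape `[H*W, batch, d_model]` (here 25 x 34 = 850 cells, matching the table) and the gaze heatmap starts at the original image resolution:

```python
import torch
import torch.nn.functional as F

h, w, batch, d_model = 25, 34, 1, 256
src = torch.randn(h * w, batch, d_model)   # [850, 1, 256], as in the table
gaze = torch.rand(1, 480, 640)             # [C=1, H0, W0] single-channel heatmap

# Resize the heatmap to the feature-map resolution.
gaze = F.interpolate(gaze.unsqueeze(0), size=(h, w),
                     mode='bilinear', align_corners=False).squeeze(0)  # [1, h, w]

# Flatten to [H*W, batch, 1] so it broadcasts over the feature channels;
# a direct [1, 1, h, w] * [850, 1, 256] product would not broadcast.
gaze = gaze.flatten(1).permute(1, 0).unsqueeze(-1)  # [850, 1, 1]
src = gaze * src
print(src.shape)  # torch.Size([850, 1, 256])
```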
Consider a simple ratio relation between the image patches and the feature map: since the backbone downsamples the input by a fixed stride, each feature-map cell corresponds to one image patch, so the gaze heatmap can be rescaled by that ratio, multiplied with the features, and the results inspected.
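This ratio relation can be sketched as follows, assuming a ResNet backbone with total stride 32 (so each feature-map cell summarizes one 32x32 patch of the 800x1066 resized input, giving the 25x34 = 850 cells from the table):

```python
import torch
import torch.nn.functional as F

stride = 32                       # assumed backbone downsampling factor
img_h, img_w = 800, 1066          # resized input size (see table above)
feat_h = img_h // stride          # 25
feat_w = -(-img_w // stride)      # 34 (ceil division)

gaze = torch.rand(1, 1, img_h, img_w)
# Each cell of the pooled map averages one stride x stride image patch,
# so the heatmap lands on the same grid as the feature map.
gaze_feat = F.adaptive_avg_pool2d(gaze, (feat_h, feat_w))
print(gaze_feat.shape)  # torch.Size([1, 1, 25, 34])
```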