jakubmicorek / MULDE-Multiscale-Log-Density-Estimation-via-Denoising-Score-Matching-for-Video-Anomaly-Detection


Transform the extracted features to fit [N, 2] `input_dim` #3

Closed jpainam closed 1 month ago

jpainam commented 1 month ago

Hi.

Can you clarify how you use the features extracted from Hiera-L as input to your network? Your network takes input of shape (N, 2).

When I run Hiera-L, the intermediate features have shape torch.Size([batch_size, 8, 56, 56, 144]), or [4, 8, 7, 7, 1152] for the last layer. Are you averaging over dims (2, 3)? The input to Hiera-L was (B, 3, 16, 224, 224).

Thanks.

jakubmicorek commented 1 month ago

Hi,

I take the feature vector just before the classification head. This can be done as follows:

import hiera
import torch.nn as nn

def load_model(model_name, device):
    checkpoint = "mae_k400_ft_k400"  # pretrained with MAE, finetuned on Kinetics-400
    if model_name == "hiera_base_16x224":
        hiera_model = hiera.hiera_base_16x224   # 909 MB
    elif model_name == "hiera_large_16x224":
        hiera_model = hiera.hiera_large_16x224  # 2.72 GB
    elif model_name == "hiera_huge_16x224":
        hiera_model = hiera.hiera_huge_16x224   # 7.9 GB
    else:
        raise ValueError(f"unknown model name: {model_name}")

    model_backbone = hiera_model(pretrained=True, checkpoint=checkpoint).to(device)
    model_backbone.head = nn.Identity()  # replace the classification head with a pass-through
    model_backbone.eval()
    return model_backbone

For 16 consecutive frames, this will return a feature vector of dimensionality 1152 when using Hiera-Large.
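In case the shapes are the sticking point: with the head replaced by `nn.Identity()`, the backbone maps a clip tensor (B, 3, 16, 224, 224) directly to pooled feature vectors (B, 1152), with no manual averaging over spatial dims needed. A minimal shape-level sketch, using a dummy stand-in module instead of the real Hiera-Large (which only reproduces the input/output shapes, not the actual features):

```python
import torch
import torch.nn as nn

class DummyBackbone(nn.Module):
    """Stand-in for Hiera-Large with its head set to nn.Identity():
    maps (B, 3, 16, 224, 224) clips to (B, 1152) vectors."""
    def __init__(self, feat_dim=1152):
        super().__init__()
        self.proj = nn.Linear(3 * 16, feat_dim)

    def forward(self, x):                       # x: (B, 3, 16, 224, 224)
        pooled = x.mean(dim=(-2, -1))           # average over H, W -> (B, 3, 16)
        return self.proj(pooled.flatten(1))     # (B, 1152)

model = DummyBackbone().eval()
clips = torch.randn(4, 3, 16, 224, 224)  # 4 clips of 16 RGB frames each
with torch.no_grad():
    feats = model(clips)
print(feats.shape)  # torch.Size([4, 1152])
```

Stacking one such vector per 16-frame window then gives the [N, 1152] array used as `input_dim` for the density model.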

jpainam commented 1 month ago

I was unable to fit Hiera-L into my 32 GB GPU; I'm running into out-of-memory errors, and when I decrease the batch size I run into a bus error. Do you mind sharing your extracted features?

jpainam commented 1 month ago

Solved it: I needed to detach the feature vectors before accumulating them.

features = []
out = model(x)
features.append(out.detach().cpu().numpy())  # detach from the graph, move to CPU
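For reference, the same idea as a full chunked extraction loop: running under `torch.no_grad()` means no autograd graph is built in the first place, and moving each batch's output to CPU immediately frees GPU activations, which together address the out-of-memory issue. This is a sketch, not code from the repo; `extract_features` is a hypothetical helper name:

```python
import numpy as np
import torch

def extract_features(model, clips, batch_size=2, device="cpu"):
    """Run clips through the backbone in small batches, moving each
    result to CPU immediately so GPU activations are freed."""
    features = []
    with torch.no_grad():  # no graph is built, so nothing needs detaching
        for i in range(0, clips.shape[0], batch_size):
            batch = clips[i:i + batch_size].to(device)
            out = model(batch)
            features.append(out.cpu().numpy())
    return np.concatenate(features, axis=0)  # (N, feature_dim)
```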

Thanks.