mahmoodlab / HIPT

Hierarchical Image Pyramid Transformer - CVPR 2022 (Oral)

What mean and std should be used when extracting patch features with the pretrained DINO ViT-S? #70

Closed: weiaicunzai closed this issue 7 months ago

weiaicunzai commented 9 months ago

Hi, thanks for your great work. I'm currently using your pretrained DINO backbone (ViT-S/16) to extract features from 256x256 patches. Which mean and std should I use: the ImageNet values, mean = (0.485, 0.456, 0.406) and std = (0.229, 0.224, 0.225), or mean = (0.5, 0.5, 0.5) and std = (0.5, 0.5, 0.5)?

Thanks

Richarizardd commented 7 months ago

Hi @weiaicunzai - you should use mean = (0.5, 0.5, 0.5) and std = (0.5, 0.5, 0.5), as defined in the code that loads the model weights for HIPT.
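
For anyone landing here later, a minimal sketch of what the two choices do to pixel values. It is plain Python and not taken from the HIPT code: the helper `normalize_pixel` and the constant names are hypothetical, and it assumes pixels are first scaled from 0..255 to 0..1, as torchvision's `ToTensor` does for uint8 images. In a torchvision pipeline the same step would typically be `transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))` placed after `transforms.ToTensor()`.

```python
# Values recommended in the reply above for the HIPT pretrained weights.
HIPT_MEAN = (0.5, 0.5, 0.5)
HIPT_STD = (0.5, 0.5, 0.5)

# ImageNet statistics quoted in the question, shown only for comparison.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8, mean, std):
    """Scale an (R, G, B) pixel from 0..255 to 0..1, then apply (x - mean) / std per channel.

    Hypothetical helper for illustration; not part of HIPT or torchvision.
    """
    return tuple((v / 255.0 - m) / s for v, m, s in zip(rgb_uint8, mean, std))


white = (255, 255, 255)
black = (0, 0, 0)

# With mean = std = 0.5, the [0, 1] range maps onto [-1, 1].
print(normalize_pixel(white, HIPT_MEAN, HIPT_STD))  # (1.0, 1.0, 1.0)
print(normalize_pixel(black, HIPT_MEAN, HIPT_STD))  # (-1.0, -1.0, -1.0)

# The ImageNet statistics give a different, channel-dependent range.
print(normalize_pixel(white, IMAGENET_MEAN, IMAGENET_STD))
```

Because the two settings map the same pixel to different values, features extracted with the ImageNet statistics would not match the inputs the model was trained on.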