I saw that the inference speed is very slow, and I guess it is not loaded on the GPU. But now I see that the graphics card does seem to be used. Can you think of what the problem might be? Thank you!
Hi @BinuxLiu,
Thanks for the questions!
- I cannot load the model onto the GPU.
There was a mistake in the torch hub configuration that loaded the model on CPU by default. I am sorry for the inconvenience. It is now fixed in 6c95a69, so the model is loaded on GPU by default. If you want to force the model to be loaded on GPU, pass device="cuda":
>>> import torch
>>> net = torch.hub.load('mohwald/gandtr', 'gem_vgg16_hedngan', device="cuda")
I saw that the inference speed is very slow, and I guess it is not loaded on the GPU.
From the previous example, you can check whether the loaded model is on the GPU with:
>>> net.device
'cuda'
Then run nvidia-smi or gpustat every second (e.g., watch -n 1 nvidia-smi) and check whether the GPU utilization stays nonzero during the whole inference.
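You can also time a single forward pass directly from Python; a minimal sketch, assuming the model accepts a preprocessed float batch of shape (N, 3, H, W) (the 362x362 input size below is only a placeholder):
>>> import time
>>> x = torch.rand(1, 3, 362, 362, device=net.device)  # dummy preprocessed batch
>>> torch.cuda.synchronize()
>>> start = time.time()
>>> with torch.no_grad():
...     _ = net(x)
>>> torch.cuda.synchronize()  # wait until the GPU finishes before stopping the clock
>>> print(f"{time.time() - start:.3f} s")
If the model is on the GPU, a single forward pass should take a small fraction of a second.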
- Are the normalization parameters I use fair for DSA?
I am not sure if DSA is the model you are trying to create or whether it is a different method. If it is the network you are trying to make, then the gem_vgg16_hedngan embedding network should get a batch of tensors preprocessed from images on which CLAHE was applied and which were then normalized with the imagenet mean-std. You can get the exact transforms we use with:
>>> net.transform
Compose(
    Pil2Numpy()
    ApplyClahe(clip_limit=1.0, grid_size=8, colorspace=lab)
    ToTensor()
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], strict_shape=True)
)
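For a single image, the transform can then be applied manually; a minimal sketch, assuming the transform returns a single-image tensor (the path query.jpg is only a placeholder):
>>> from PIL import Image
>>> img = Image.open("query.jpg").convert("RGB")
>>> x = net.transform(img).unsqueeze(0)  # transform one image, add a batch dimension
>>> with torch.no_grad():
...     descriptor = net(x.to(net.device))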
- Do you have any other comments about this code?
First of all, you do not need to download the weights yourself and load them into the model. If you download the model from torch hub, the pretrained model weights are loaded by default. You can always pass pretrained=True to be sure.
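For example:
>>> net = torch.hub.load('mohwald/gandtr', 'gem_vgg16_hedngan', pretrained=True)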
To get the best performance in terms of speed, I would utilize the PyTorch DataLoader, which can perform the transformations on parallel CPU workers. So, I would go with something like this:
import torch
from PIL import Image


class DSAModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.hub.load('mohwald/gandtr', 'gem_vgg16_hedngan', pretrained=True)

    def forward(self, images):
        return self.net(images)


class CustomDataset(torch.utils.data.Dataset):
    def __init__(self, img_dir, transform):
        self.img_dir = img_dir
        self.transform = transform
        # ...

    def __len__(self):
        ...  # return the number of images (required by the DataLoader's default sampler)

    def __getitem__(self, idx):
        # ... get img_path from idx and self.img_dir
        with open(img_path, 'rb') as f:
            image = Image.open(f).convert("RGB")
        return self.transform(image)


model = DSAModel()
dataset = CustomDataset(img_dir=img_dir, transform=model.net.transform)
dataloader = torch.utils.data.DataLoader(dataset, num_workers=..., shuffle=False)

for x in dataloader:
    y = model(x)
The most important part is to use model.net.transform on the images before the batch goes into the model. Without that, the evaluation will give you wrong numbers.
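To make the loop above concrete, descriptor collection could look like this; a sketch that assumes the model sits on net.device and that the forward pass returns one descriptor row per image (if the descriptors come out transposed, adjust the concatenation accordingly):

model.eval()
descriptors = []
with torch.no_grad():  # gradients are not needed for feature extraction
    for x in dataloader:
        y = model(x.to(model.net.device))  # move the batch to the model's device
        descriptors.append(y.cpu())
descriptors = torch.cat(descriptors)  # stack all batches into one tensor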
And FYI, when you load the embedding model with torch.hub.load('mohwald/gandtr', 'gem_vgg16_hedngan'), you do not get a standard pytorch module, but a SingleNetwork that has wrappers, such as whitening in this case, which are applied during the forward pass.
Let me know if that explains your issues and whether your evaluation now runs faster :)
Thank you so much, it's much faster now! By the way, I am working on a paper with a very similar idea to your work. Unfortunately, it was rejected at both ICCV23 and CVPR24. If you are interested, please check out: https://arxiv.org/pdf/2304.00276.pdf https://arxiv.org/pdf/2402.17159.pdf
(NocPlace is concurrent work with DSA, and I had not noticed DSA before. I will add a reference to DSA later!)
If you have any suggestions please let me know. Thank you again for your help!
I still have a small question: does the Whitening file affect model evaluation?
I reproduced the results of DSA on the VPR-benchmark, and the gap between them and the results of recent SoTA methods was a bit large. Besides the different evaluation metrics, the ways the datasets (e.g., Tokyo 24/7) are used also seem to differ. At first, I suspected that I had done something wrong. But after comparing with the results of DIR, I think there should be no problem with my reproduction experiment.
exp_name: gem-resnet101-hedngan on tokyo 247
2024-03-12 11:49:56 Testing on < #queries: 315; #database: 75984 >
100%|█████████████████████████████████████████████| 315/315 [00:06<00:00, 51.25it/s]
100%|█████████████████████████████████████████| 75984/75984 [20:00<00:00, 63.30it/s]
All: R@1: 69.5, R@5: 83.5, R@10: 87.3, R@20: 91.7
Day: R@1: 81.0, R@5: 90.5, R@10: 91.4, R@20: 95.2
Sunset: R@1: 72.4, R@5: 86.7, R@10: 92.4, R@20: 94.3
Night: R@1: 55.2, R@5: 73.3, R@10: 78.1, R@20: 85.7
And I reported the detailed results of DIR (the pretrained model which DSA used) in Tokyo 24/7 in my previous paper. (https://arxiv.org/pdf/2304.00276.pdf)
All: R@1: 74.9
Day: R@1: 92.4
Sunset: R@1: 81.9
Night: R@1: 50.5
I guess this is the same problem I encountered: introducing night images improved night performance but degraded daytime performance. If I have made any mistakes, I look forward to your corrections. Thank you!
Hi @BinuxLiu,
I still have a small question: does the Whitening file affect model evaluation?
Yes. The whitening is a necessary part of the model.
I reproduced the results of DSA on the VPR-benchmark, and the gap between them ... ... I guess this is the same problem I encountered: introducing night images improved night performance but degraded daytime performance. If I have made any mistakes, I look forward to your corrections. Thank you!
I looked at the experiments section of the NPR paper, where the reported DIR method is trained on Google Landmarks (GLDv1), but for all embedding models in this repo, the training data is Retrieval-SfM-120k, and thus the two models are not directly comparable. We did not provide models trained on GLDv1, because it is no longer available. However, cirtorch also provides pretrained weights on Retrieval-SfM-120k, so if you evaluate these weights (with the corresponding whitening, the same backbone architecture, and the same embedding dimension), then you can fairly evaluate the tradeoffs of MDIR training.
Thank you for your reply. I guess the Whitening file should have an impact, so I no longer load the model manually. Yes, although Table 14 of DVG reports that DIR has similar results under the two training sets, it cannot be strictly proven that DSA produces daytime degradation. https://arxiv.org/pdf/2204.03444.pdf
However, [cirtorch](https://github.com/filipradenovic/cnnimageretrieval-pytorch) also provides pretrained weights on Retrieval-SfM-120k, so if you evaluate these weights (with the corresponding whitening, the same backbone architecture, and the same embedding dimension), then you can fairly evaluate the tradeoffs of MDIR training.
Yes, thanks for your advice.
Hello, Mohwald. I want to conduct some comparative experiments with DSA, which are very important to me. I tried your sample code to reproduce it; however, I ran into some difficulties. I'm going to add your method to the VPR-methods-evaluation project for testing (https://github.com/gmberton/VPR-methods-evaluation). Many famous methods have already been added to the models folder of that project.
Thank you.