mlzxy / devit

MIT License
330 stars 45 forks

Building background prototypes #21

Open xavibou opened 11 months ago

xavibou commented 11 months ago

Hi, nice work! I am trying to run the model on custom data, and after building a set of prototypes, I get a large number of false positives. I am trying to update the background prototypes to best suit the background of my images, but false positives (FP) still seem to be an issue. I therefore have a couple of questions:

1 - For the background classes, I adapted the code from demo/build_prototypes.ipynb and created a dictionary with N classes and a prototype consisting of a torch.Tensor of shape (N, D), where D is the dimensionality of the vectors. Then, I assign the path to the saved prototype .pth file to the model_path attribute of the main method in demo.py. Am I doing anything wrong?
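
Concretely, something like this (a minimal sketch of what I described; the file name, N, and the random tensor are placeholders for my actual features):

```python
import torch

# Placeholder sketch: N background classes, one D-dimensional prototype
# per class (D = 1024 for ViT-L features).
N, D = 3, 1024
bg_prototypes = {
    'prototypes': torch.randn(N, D),  # stand-in for the real clustered features
    'label_names': [f'bg_class_{i}' for i in range(N)],
}
torch.save(bg_prototypes, 'bg_prototypes.pth')  # path then passed to demo.py
```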

2 - Do you have any insights on how to handle false positives? I am using bounding boxes for my prototypes. Did you experience a big difference in performance when using segmentation masks instead of bounding boxes?

Thanks in advance!

mlzxy commented 11 months ago

Hi @xavibou, for the background prototypes, you could check the file extract_instance_prototypes.py.

What I did is to cluster the stuff-class feature tokens inside one image, and then cluster again over all tokens from all images. I don't have an ablation study on this, but it should be better than averaging all background feature tokens.
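
A minimal sketch of that two-stage idea, using scikit-learn's KMeans in place of the repo's actual clustering code (the function and argument names here are hypothetical):

```python
import torch
from sklearn.cluster import KMeans

def cluster_background_tokens(per_image_tokens, k_image=5, k_global=10):
    """Two-stage clustering: cluster stuff-class tokens within each image,
    then cluster the resulting centers across all images."""
    centers = []
    for tokens in per_image_tokens:        # each: (num_tokens, D) DINO features
        k = min(k_image, tokens.shape[0])
        km = KMeans(n_clusters=k, n_init=10).fit(tokens.numpy())
        centers.append(torch.from_numpy(km.cluster_centers_))
    all_centers = torch.cat(centers, dim=0)
    km = KMeans(n_clusters=k_global, n_init=10).fit(all_centers.numpy())
    return torch.from_numpy(km.cluster_centers_).float()  # (k_global, D)
```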

For FP, my suggestions are listed below:

  1. Reduce K to an extreme value, e.g., 1. It is possible that the classification branch does well on COCO-like scenes but not on custom data. Setting K=1 will rely solely on raw DINO features, which could be more robust.
  2. Apply harsher confidence-score filtering or stricter NMS (see the sketch after this list).
  3. Retrain / finetune the region proposal network if possible; this would be much cheaper than retraining the entire model.
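
For point 2, a minimal sketch of harsher post-filtering with torchvision; the thresholds are illustrative, not tuned values:

```python
import torch
from torchvision.ops import nms

def postfilter(boxes, scores, score_thresh=0.6, iou_thresh=0.4):
    # boxes: (N, 4) in (x1, y1, x2, y2); scores: (N,)
    keep = scores > score_thresh               # harsher confidence filtering
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thresh)      # stricter NMS
    return boxes[kept], scores[kept]
```
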
xavibou commented 11 months ago

Thanks for your quick reply! I have played around with the parameters to reduce false positives and they seem to give reasonable outputs. However, I believe I am confused with regard to the background prototypes. In demo.py, the background prototypes seem to be provided within the pre-trained model weights. Therefore, given a dictionary of background prototypes with N classes:

```python
bg_prototypes = {
    'prototypes': prototypes,   # torch tensor of shape [N, D], D = 1024
    'label_names': ['class_1', 'class_2', ..., 'class_N']
}
```

Is there a way to replace the background prototypes in the provided pre-trained weights in the demo (i.e. 'vitl_0069999.pth') with the ones in bg_prototypes? I would like to avoid re-training the entire method and, as you mention, I would then be able to only re-train the Region Proposal Network.
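
For reference, this is roughly how I am inspecting the checkpoint to locate the prototypes; the 'model' key and the substring matches below are guesses on my part, not keys I know the repo uses:

```python
import torch

# Inspect the checkpoint to find where the background prototypes are stored.
ckpt = torch.load('vitl_0069999.pth', map_location='cpu')
state = ckpt.get('model', ckpt)  # guess: weights may be nested under 'model'
for name, tensor in state.items():
    if 'prototype' in name or 'bg' in name:
        print(name, tuple(tensor.shape))
```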

mlzxy commented 11 months ago

Background prototypes are extracted and fixed as part of the network weights, regardless of the foreground classes. The rationale is that backgrounds usually have less variety and are more consistent across scenes. Therefore, the bad news is that there is no easy way to change them. I would suggest two possible approaches:

  1. (guaranteed but costly) Retrain the entire network with your new background prototypes.

  2. (cheap but may not work) Steps below:

    1. Build your background prototypes $B_{new} \in \mathbb{R}^{n\times d}$, where $n$ is the number of prototypes and $d$ is the feature dimension.
    2. Let the existing background prototypes be $B_{old} \in \mathbb{R}^{m\times d}$. Compute an optimal transport between $B_{new}$ and $B_{old}$, then apply a momentum update from $B_{new}$ to $B_{old}$, e.g., $B_{old} \leftarrow (1 - \beta) B_{old} + \beta \gamma B_{new}$, where $\gamma$ is the optimal transport matching matrix.

The optimal transport and incremental update parts of the second suggestion can be found in the clustering paragraph of Appendix A.3 of the paper. I have a Sinkhorn-Knopp implementation at run_sinkhorn_cluster.py, but I would suggest you find a simpler implementation, like the one at https://gist.github.com/wohlert/8589045ab544082560cc5f8915cc90bd

Note that the $B_{old}$ in the PyTorch state dict may be a 3-dimensional tensor. If so, just treat the first two dimensions as one, as in the sketch below.
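
A rough sketch of suggestion 2, using a plain Sinkhorn-Knopp loop instead of run_sinkhorn_cluster.py; the cosine cost, beta, and eps are illustrative choices, not values from the paper:

```python
import torch
import torch.nn.functional as F

def update_bg_prototypes(B_old, B_new, beta=0.1, eps=0.05, n_iters=50):
    """Momentum update B_old <- (1 - beta) * B_old + beta * (gamma @ B_new),
    where gamma is a Sinkhorn-Knopp optimal transport plan between the
    m old and n new prototypes."""
    orig_shape = B_old.shape
    if B_old.dim() == 3:                       # fold the first two dims into one
        B_old = B_old.reshape(-1, orig_shape[-1])
    m, n = B_old.shape[0], B_new.shape[0]
    # transport cost: negative cosine similarity between prototypes
    cost = -F.normalize(B_old, dim=1) @ F.normalize(B_new, dim=1).T   # (m, n)
    K = torch.exp(-cost / eps)
    u, v = torch.ones(m), torch.ones(n)
    r, c = torch.full((m,), 1 / m), torch.full((n,), 1 / n)  # uniform marginals
    for _ in range(n_iters):                   # Sinkhorn-Knopp iterations
        u = r / (K @ v)
        v = c / (K.T @ u)
    gamma = u[:, None] * K * v[None, :]        # (m, n) transport plan
    gamma = gamma / gamma.sum(dim=1, keepdim=True)  # rows: soft matches
    B_old = (1 - beta) * B_old + beta * (gamma @ B_new)
    return B_old.reshape(orig_shape)
```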

YELKHATTABI commented 10 months ago

Interesting subject, I am trying to solve the same problem. @xavibou, how did your experiment go? What would you suggest as a solution?

Kay545 commented 9 months ago

@mlzxy Hello, I want to use my own dataset to build prototypes. When running build_prototypes.ipynb, the following error appears. Do I have to use an A100 GPU?

[screenshot: Snipaste_2023-12-22_21-04-01]
mlzxy commented 9 months ago

The error is thrown by xformers (https://github.com/facebookresearch/xformers), an attention-acceleration and memory-saving library used by DINOv2. I don't know why they require an A100; it should work on other GPUs too. I suggest you first uninstall xformers with pip uninstall xformers, then build the prototypes.

When you really need the acceleration and memory savings, install xformers from source (which is quite easy). I use xformers v0.18. The PyPI build may only be tied to certain GPUs.

elE0710 commented 5 months ago

@Kay545 Hey, did you ever get build_prototypes.ipynb running on your own data? I am currently facing the exact same issue and cannot wrap my head around it, since the YCB examples run totally fine without any errors from xformers.

xavibou commented 5 months ago

Hi, I recently published the code for an approach inspired by De-ViT, where the prototypes are extracted in a very similar manner on other datasets. Feel free to check it out: https://github.com/xavibou/ovdsat

userzhi commented 1 month ago

Hello, where can I get the DIOR data used in this repo?