Open · xavibou opened this issue 11 months ago
Hi @xavibou, for the background prototypes, you could check this file: extract_instance_prototypes.py.
What I did is to cluster the stuff-class feature tokens inside one image, and then cluster again over all tokens from all images. I don't have an ablation study on this, but it should be better than averaging all background feature tokens. A sketch of this two-stage clustering is below.
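This is not the exact logic of extract_instance_prototypes.py, just a minimal sketch of the two-stage idea, assuming the per-image stuff-class DINO tokens are already extracted. One reading of the second stage is that it clusters the per-image centers; `K_PER_IMAGE`, `NUM_BG_PROTOTYPES`, and the dummy inputs are illustrative:

```python
import torch
from sklearn.cluster import KMeans

# Illustrative values, not the repo's actual settings.
K_PER_IMAGE = 5          # clusters per image (stage 1)
NUM_BG_PROTOTYPES = 64   # final background prototypes (stage 2)

def cluster_tokens(tokens: torch.Tensor, k: int) -> torch.Tensor:
    """Run k-means over [N, D] feature tokens; return [k, D] centers."""
    k = min(k, tokens.shape[0])
    km = KMeans(n_clusters=k, n_init=10).fit(tokens.numpy())
    return torch.from_numpy(km.cluster_centers_).float()

# Dummy stand-in for the per-image stuff-class DINO tokens ([N_i, D], D = 1024).
feats_per_image = [torch.randn(200, 1024) for _ in range(8)]

# Stage 1: cluster the stuff-class tokens inside each image.
per_image_centers = [cluster_tokens(f, K_PER_IMAGE) for f in feats_per_image]

# Stage 2: cluster again over the stage-1 centers from all images.
all_centers = torch.cat(per_image_centers, dim=0)
bg_prototypes = cluster_tokens(all_centers, NUM_BG_PROTOTYPES)  # [64, 1024]
```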
For FP, my suggestions are listed below:

- Set `K` to an extreme value, e.g., 1. It is possible that the classification branch does well on COCO-like scenes but not on customized data, so setting `K=1` will rely solely on raw DINO features, which could be more robust.

Thanks for your quick reply! I have played around with the parameters to reduce FP and they seem to give reasonable outputs. However, I am confused about the background prototypes. In demo.py, the background prototypes seem to be provided within the pre-trained model weights. Therefore, given a dictionary of background prototypes with N classes:
```python
bg_prototypes = {
    'prototypes': ...,  # torch.Tensor of shape [N, D], D = 1024
    'label_names': ['class_1', 'class_2', ..., 'class_N']
}
```
Is there a way to modify the provided pre-trained weights in the demo (i.e. 'vitl_0069999.pth') with the ones in bg_prototypes? I would like to avoid re-training the entire method; as you mention, I would then only need to re-train the Region Proposal Network.
Background prototypes are extracted and fixed as part of the network weights, regardless of the foreground classes. The rationale is that background usually has less variety and is more consistent across scenes. Therefore, the bad news is that there is no easy way to change backgrounds. I would suggest two possible approaches:
(warranted but costly) Retrain the entire network with your new background prototypes.
(cheap but may not work) Replace the background prototype tensor $B_{old}$ in the checkpoint state dict directly: extract feature tokens from your own background regions, cluster them into new prototypes via optimal transport with incremental updates, and overwrite $B_{old}$ with the result (see the sketches below).
The optimal transport and incremental update part of the second suggestion can be found in the clustering paragraph of Appendix A.3 of the paper. I have a Sinkhorn-Knopp implementation in run_sinkhorn_cluster.py, but I would suggest you find a simpler implementation, like https://gist.github.com/wohlert/8589045ab544082560cc5f8915cc90bd.
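For reference, here is a minimal SwAV-style Sinkhorn-Knopp sketch of the balanced-assignment step; it is not the code from run_sinkhorn_cluster.py, and `eps`, `n_iters`, and the single prototype-update step at the end are illustrative:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sinkhorn_knopp(scores: torch.Tensor, eps: float = 0.05, n_iters: int = 3) -> torch.Tensor:
    """Balanced soft assignment of N samples to K clusters.

    scores: [N, K] similarities between features and prototypes.
    Returns Q of shape [N, K]: rows sum to 1, and the K clusters
    receive (approximately) equal total mass.
    """
    Q = torch.exp(scores / eps).t()          # [K, N]
    Q /= Q.sum()
    K, N = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(dim=1, keepdim=True)      # rows: equal total mass per cluster
        Q /= K
        Q /= Q.sum(dim=0, keepdim=True)      # columns: unit mass per sample
        Q /= N
    return (Q * N).t()                       # [N, K]

# Usage sketch: assign L2-normalized tokens to prototypes, then take one
# incremental update step pulling each prototype toward its assigned tokens.
feats = F.normalize(torch.randn(512, 1024), dim=1)
protos = F.normalize(torch.randn(64, 1024), dim=1)
Q = sinkhorn_knopp(feats @ protos.t())       # [512, 64]
protos = F.normalize(Q.t() @ feats, dim=1)   # one update step
```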
Note that the $B_{old}$ in the PyTorch state dict may be a 3-dimensional tensor. If so, just treat the first two dimensions as one.
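A rough sketch of that "cheap" route, assuming bg_prototypes is the dictionary from the question above; the key name 'model.bg_prototypes' is a placeholder, so the real key inside vitl_0069999.pth has to be looked up first:

```python
import torch

ckpt = torch.load('vitl_0069999.pth', map_location='cpu')
state = ckpt.get('model', ckpt)  # some checkpoints nest the weights under 'model'

# Find the real key first, e.g.:
#   print([k for k in state if 'bg' in k.lower() or 'background' in k.lower()])
bg_key = 'model.bg_prototypes'   # placeholder name

B_old = state[bg_key]
B_new = bg_prototypes['prototypes'].to(B_old.dtype)  # [N, D], D = 1024

if B_old.dim() == 3:
    # B_old is [A, B, D]: treat the first two dimensions as one ([A*B, D]),
    # so this reshape assumes N == A * B.
    B_new = B_new.reshape(*B_old.shape[:2], -1)

state[bg_key] = B_new
torch.save(ckpt, 'vitl_0069999_custom_bg.pth')
```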
Interesting subject, I am trying to solve the same problem. @xavibou, how did your experiment go? What would you suggest as a solution?
@mlzxy Hello, I want to use my own dataset to build prototypes. When running build_prototypes.ipynb, the following error appears. Do I have to use an A100 graphics card?
The error is thrown by xformers (https://github.com/facebookresearch/xformers), an attention acceleration and memory-saving library used by DINOv2. I don't know why they require an A100; it should work on other GPUs too. I suggest you first uninstall xformers with `pip uninstall xformers`, then build the prototypes.
When you really need the acceleration and memory savings, install xformers from source (which is quite easy). I use the v0.0.18 version of xformers. The PyPI version may only be tied to certain GPUs.
@Kay545 Hey, did you ever get build_prototypes.ipynb running on your own data? I am currently facing the exact same issue and cannot wrap my head around it, since the ycb-examples run totally fine without any errors from xformers.
Hi, I recently published the code for an approach inspired by De-ViT, where the prototypes are extracted in a very similar manner on other sets of data. Feel free to check it out: https://github.com/xavibou/ovdsat
Hello, where can I get the DIOR data used in this repo?
Hi, nice work! I am trying to run the model on custom data, and after building a set of prototypes, I get a large number of false positives (FP). I am trying to update the background prototypes to best suit the background of my images, but FP still seem to be an issue. I therefore have a couple of questions:
1 - For the background classes, I adapted the code from demo/build_prototypes.ipynb and created a dictionary with N classes and a prototype consisting of a torch.Tensor of shape (N, D), where D is the dimensionality of the vectors. Then, I assign the path to the saved prototype .pth file to the model_path attribute of the main method in demo.py. Am I doing anything wrong?
2 - Do you have any insights on how to handle false positives? I am using bounding boxes for my prototypes. Did you experience a big difference in performance when using segmentation masks instead of bounding boxes?
Thanks in advance!