szzexpoi / POEM

Official Implementation for CVPR 2023 paper "Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning"

How are the prototypes learned? #1

Open BAOOOOOM opened 5 months ago

BAOOOOOM commented 5 months ago

Thank you very much for your work; it is very interesting! But I'm curious about how the prototypes are learned. The paper says the prototypes are learnable, and they are represented as a linear layer in the code. However, since their initial state is essentially random, are they simply learned gradually during training? Why does learning them this way yield object representations like those in Table 4?

szzexpoi commented 5 months ago

Thanks for your interest in our study!

As described in the paper, our method consists of two training stages: learning the prototypes, and then leveraging the (fixed) prototypes for visual reasoning. In the first stage, the prototypes (represented as weights in a linear layer) are randomly initialized and then trained on a multi-label object classification task, where the predictions are computed from an adaptive composition of prototypes. In this way, we can learn prototypes representing various objects. For more details, please refer to the "proto_learning" folder.
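For intuition, here is a minimal PyTorch sketch of this setup (the class, dimensions, and the softmax composition are illustrative assumptions, not the repository's actual code): the prototypes live in the weight matrix of a linear layer, and a second layer composes the prototype activations into multi-label object predictions.

```python
import torch
import torch.nn as nn

class PrototypeLearner(nn.Module):
    """Sketch of stage-one prototype learning (names/dims are hypothetical)."""

    def __init__(self, feat_dim=2048, num_prototypes=1000, num_objects=1703):
        super().__init__()
        # Each row of this weight matrix is one prototype vector,
        # randomly initialized like any other linear layer.
        self.prototypes = nn.Linear(feat_dim, num_prototypes, bias=False)
        # Composes prototype activations into per-object logits.
        self.classifier = nn.Linear(num_prototypes, num_objects)

    def forward(self, obj_feat):
        # Similarity of each object feature to every prototype,
        # normalized into an adaptive composition over prototypes.
        proto_act = torch.softmax(self.prototypes(obj_feat), dim=-1)
        return self.classifier(proto_act)

model = PrototypeLearner()
criterion = nn.BCEWithLogitsLoss()   # multi-label objective
feats = torch.randn(8, 2048)         # dummy region features
labels = torch.randint(0, 2, (8, 1703)).float()
loss = criterion(model(feats), labels)
loss.backward()  # gradients flow into the prototype weights
```

Because the prototypes are just trainable weights, the multi-label classification loss gradually shapes them from their random initialization into object-level representations.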

Regarding the results in Table 4, they are computed as follows:

1. Convert the object features for the different bounding boxes in the GQA dataset (where each bounding box is associated with an object label) into a probability distribution over prototypes, i.e., the normalized dot product between the object features and all prototypes. We then average the distribution over all instances of each object.
2. Apply an unsupervised clustering algorithm (we simply use K-means) to the averaged distributions.
3. Investigate the characteristics of the objects in the different groups.
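A minimal sketch of this analysis, assuming NumPy arrays and scikit-learn's K-means (the softmax here is one plausible reading of "normalized dot product"; the actual normalization in the repo may differ):

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def prototype_distribution(feat, prototypes):
    """Turn one object feature into a distribution over prototypes."""
    sim = prototypes @ feat                # dot product with all prototypes
    e = np.exp(sim - sim.max())            # softmax as an assumed normalization
    return e / e.sum()

def cluster_objects(feats, labels, prototypes, num_clusters=20):
    # feats: (N, feat_dim) region features; labels: N object names;
    # prototypes: (num_prototypes, feat_dim) learned prototype matrix.
    per_object = defaultdict(list)
    for f, lbl in zip(feats, labels):
        per_object[lbl].append(prototype_distribution(f, prototypes))
    names = sorted(per_object)
    # Step 1: average the distribution over all instances of each object.
    avg = np.stack([np.mean(per_object[n], axis=0) for n in names])
    # Step 2: unsupervised grouping of the averaged distributions.
    groups = KMeans(n_clusters=num_clusters, n_init=10).fit_predict(avg)
    # Step 3: inspect which objects land in which group.
    return dict(zip(names, groups))
```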

kikiclare commented 5 months ago

I am very interested in your research. While going through the code, I encountered some issues. First, in the prototype decomposition section, is there no design for prototype decomposition on the language (NLP) side? My other question concerns `proto_learning/dataloader.py`: `self.obj2idx = json.load(open(os.path.join(data_dir, 'obj2idx_gqa.json')))` — I couldn't find the code that generates the `obj2idx_gqa.json` file.

szzexpoi commented 5 months ago

> I am very interested in your research. While going through the code, I encountered some issues. First, in the prototype decomposition section, is there no design for prototype decomposition on the language (NLP) side? My other question concerns `proto_learning/dataloader.py`: `self.obj2idx = json.load(open(os.path.join(data_dir, 'obj2idx_gqa.json')))` — I couldn't find the code that generates the `obj2idx_gqa.json` file.

You are right that there are no prototypes for the language modality, as the study focuses on learning compositional visual representations. The `obj2idx_gqa` file can be downloaded from the preprocessed annotation link. These annotations are derived from the raw GQA annotations to formulate the multi-label object classification task (i.e., simultaneously predicting all objects within an image); for instance, `obj2idx_gqa` is a dictionary mapping object labels to indices.
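For reference, a mapping like this could be rebuilt from the public GQA scene-graph annotations along the following lines (a hypothetical sketch assuming the standard GQA scene-graph JSON format, not the repository's actual preprocessing script):

```python
import json

def build_obj2idx(scene_graph_path, output_path):
    """Hypothetical reconstruction of an obj2idx_gqa.json-style mapping."""
    with open(scene_graph_path) as f:
        scene_graphs = json.load(f)  # image_id -> scene graph

    labels = set()
    for graph in scene_graphs.values():
        for obj in graph['objects'].values():
            labels.add(obj['name'])  # collect every object label

    # Assign a stable index to each object label.
    obj2idx = {name: idx for idx, name in enumerate(sorted(labels))}
    with open(output_path, 'w') as f:
        json.dump(obj2idx, f)
    return obj2idx

# e.g. build_obj2idx('train_sceneGraphs.json', 'obj2idx_gqa.json')
```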