Open steveazzolin opened 2 years ago
Thanks for the issue. The differences in dataset creation among papers make this indeed a little bit more challenging than I expected :)
I like your alternative solution, which probably makes the most sense here. It also might not be totally necessary to replicate the results of the individual papers; the goal could rather be to give people more of a sense of how person re-id works with GNNs. Please let me know which parts of dataset creation you wanna see customizable and how we obtain the final graph from each image.
I was thinking of using an approach similar to the one presented in the first linked paper, i.e., treating person re-id as a sort of node classification problem. In more detail:
Given each query image, create a complete graph in which each node represents the similarity (as computed by the difference of the embeddings of a CNN) between the query image and a gallery image. So we would build M graphs (one for each query image), each with N nodes (where N can be the total number of gallery images, or only the N gallery images closest to the query image, found via nearest neighbors).
Each graph is weighted, with weights being the similarity between the gallery images. From the paper:
In our formulation, G is fully-connected and E represents the set of relationships between different probe-gallery pairs, where Wij is a scalar edge weight. It represents the relation importance between node i and node j where gi and gj are the i-th and j-th gallery images. S() is a pairwise similarity estimation function, that estimates the similarity score between gi and gj. The purpose of setting Wii to 0 is to avoid self-enhancing.
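The per-query graph described above could be sketched roughly as follows. This is only a minimal NumPy illustration, not the paper's implementation: the function name `build_query_graph`, the use of cosine similarity for S(), and the plain embedding difference as node feature are all assumptions made for the example.

```python
import numpy as np


def build_query_graph(query_emb, gallery_embs, k=None):
    """Build one fully connected graph for a single query image.

    Hypothetical helper: each node is a (query, gallery) pair, each edge
    weight is the similarity between the two gallery images involved.
    """
    query_emb = np.asarray(query_emb, dtype=np.float64)
    gallery_embs = np.asarray(gallery_embs, dtype=np.float64)

    # Optionally keep only the k gallery images closest to the query.
    ids = np.arange(len(gallery_embs))
    if k is not None and k < len(gallery_embs):
        dist = np.linalg.norm(gallery_embs - query_emb, axis=1)
        ids = np.argsort(dist, kind="stable")[:k]
    g = gallery_embs[ids]

    # Node features: difference between gallery and query embeddings.
    x = g - query_emb

    # Edge weights W_ij = S(g_i, g_j); cosine similarity is an assumption.
    norms = np.clip(np.linalg.norm(g, axis=1, keepdims=True), 1e-12, None)
    unit = g / norms
    w = unit @ unit.T
    np.fill_diagonal(w, 0.0)  # W_ii = 0 to avoid self-enhancing

    # Complete graph without self-loops, in COO format (2 x num_edges).
    n = len(ids)
    src, dst = np.nonzero(~np.eye(n, dtype=bool))
    edge_index = np.stack([src, dst])
    edge_weight = w[src, dst]
    return x, edge_index, edge_weight, ids


query = [0.0, 0.0, 0.0]
gallery = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 3.0], [4.0, 4.0, 4.0]]
x, edge_index, edge_weight, ids = build_query_graph(query, gallery, k=3)
```

The `(x, edge_index, edge_weight)` triple follows the COO layout PyG uses for graphs, so converting the arrays to tensors should be enough to fill a data object. Repeating the call for each of the M query images gives the M graphs.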
For the training graphs, we could again create one graph per training image, whose nodes are the pairs formed with all images of the same identity (or maybe a subset, so that identities with many images are not oversampled?) plus some distractors (preferably hard examples, I guess, such as the images of different identities that are most similar to the query image).
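To make the training-graph idea a bit more concrete, here is a rough sketch of how the nodes for one training image could be picked. The function name `sample_training_nodes`, the cap `max_pos`, and the use of Euclidean distance to rank hard distractors are assumptions for illustration only.

```python
import numpy as np


def sample_training_nodes(anchor, embs, labels, max_pos=4, num_distractors=4, seed=0):
    """Pick gallery indices for the training graph of image `anchor`.

    Hypothetical helper: returns the chosen indices and binary node labels
    (1 = same identity as the anchor, 0 = distractor).
    """
    embs = np.asarray(embs, dtype=np.float64)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    idx = np.arange(len(labels))

    same = idx[(labels == labels[anchor]) & (idx != anchor)]
    diff = idx[labels != labels[anchor]]

    # Cap the positives so identities with many images are not oversampled.
    if len(same) > max_pos:
        same = rng.choice(same, size=max_pos, replace=False)

    # Hard distractors: other identities closest to the anchor embedding.
    dist = np.linalg.norm(embs[diff] - embs[anchor], axis=1)
    hard = diff[np.argsort(dist, kind="stable")[:num_distractors]]

    nodes = np.concatenate([same, hard])
    y = (labels[nodes] == labels[anchor]).astype(np.int64)
    return nodes, y


embs = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0], [1.0, 0.0], [5.0, 0.0], [0.5, 0.0]]
labels = [0, 0, 0, 1, 1, 2]
nodes, y = sample_training_nodes(0, embs, labels, max_pos=1, num_distractors=2)
```

The returned `y` can serve directly as the node classification target, since each node stands for one (anchor, gallery image) pair.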
The CNN used to compute the similarity between images can be specified by the user. These embeddings can be computed in a `pre_transform` fashion, so as to make successive reads of the same dataset faster.
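The "compute once, reuse afterwards" idea can be illustrated with a small standalone sketch. It does not use the PyG dataset API; it only mimics the behaviour with a NumPy cache file. `load_or_compute_embeddings` and the `encoder` callable are hypothetical names, and the encoder stands in for whatever CNN the user supplies.

```python
import os

import numpy as np


def load_or_compute_embeddings(images, encoder, cache_path):
    """Return one embedding per image, computing them only on the first call.

    Hypothetical helper. `cache_path` should end in ".npy", because
    np.save appends that suffix when it is missing.
    """
    if os.path.exists(cache_path):
        # Later reads skip the (expensive) encoder entirely.
        return np.load(cache_path)
    embs = np.stack([np.asarray(encoder(img), dtype=np.float32) for img in images])
    np.save(cache_path, embs)
    return embs


def mean_color_encoder(img):
    """Toy stand-in for a CNN: average over the two spatial dimensions."""
    return img.mean(axis=(0, 1))
```

In a real PyG dataset the same effect would come from running the encoder during processing, so that only the processed graphs are loaded on later runs.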
Sounds good. Does there exist some reference implementation of the paper you mentioned?
Unfortunately, it seems not :(
🚀 The feature, motivation and pitch
Hi,
I was thinking of contributing to PyG by adding the Market 1501 dataset for person re-id, which to the best of my knowledge would be the first dataset for person re-id in PyG. I worked on it for a university project, but without using GNNs. However, I have some doubts about it:
By looking at the leaderboard of PapersWithCode, it seems that the GNN approach is not the winning one, with just a couple of papers using GNNs (I'm not an expert on person re-id, so here I did a quick check over the papers in the leaderboard without checking all of them). For example, I found:
The way of structuring the dataset is really approach-specific, making it hard to develop a single parameterized module. For example, the first linked paper uses GNNs to model the relations among images, while the second uses GNNs just to model the relations of persons' attributes inside a single image.
Some approaches also use attribute information, which is available in an extension of Market 1501.
One way to deal with that, as proposed by @rusty1s, could be to develop a specific dataset named after a single paper. However, this may require implementing the paper from scratch, since not everybody releases their code :( This would definitely be interesting, but may not be feasible since I lack the computational resources to reproduce medium-to-large models.
Alternatives
An alternative solution could be to come up with a dataset structured as we decide, without following a specific paper. For example, we may structure the graph in a way similar to the first linked paper, but without all the steps presented in it. The code can still be modular, for example by allowing the user to specify a network to use to extract the initial node features, but keeping the structure of the graph fixed.
I'm happy to discuss whether this can be a helpful contribution or not. Thanks
Additional context
No response