jingtaozhan / RepCONC

WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval

Fixing of Index Assignments #3

Closed: hshreeshail closed this issue 2 years ago

hshreeshail commented 2 years ago

The RepCONC paper mentions several times that, unlike JPQ, it does not fix the index assignments. However, Section 3.6.2 contains a line that seems to contradict this: "To enable end-to-end retrieval during training, we fix the Index Assignments and only train the query encoder and PQ Centroid Embeddings." Is this line a typo?

jingtaozhan commented 2 years ago

We jointly train the dual encoders and PQ in two stages. In the first stage, we use RepCONC for joint optimization, and the index assignments are not fixed. In the second stage, we fix the index assignments and use JPQ to further train the query encoder and the PQ centroid embeddings.
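
To make the second stage concrete, here is a minimal PyTorch sketch (toy sizes and a stand-in linear query encoder, purely illustrative, not the code in this repo). The integer assignments produced by stage 1 are frozen, and only the query encoder and the PQ centroid embeddings receive gradients:

```python
import torch

# Toy sizes (illustrative, not the paper's settings).
M, K, D = 4, 16, 8          # sub-vectors per doc, centroids per sub-vector, sub-vector dim
num_docs = 100

# Stage-2 setup: index assignments are a fixed integer tensor produced by stage 1.
fixed_assignments = torch.randint(K, (num_docs, M))     # frozen, never updated
centroids = torch.nn.Parameter(torch.randn(M, K, D))    # trainable PQ centroid embeddings
query_encoder = torch.nn.Linear(32, M * D)              # stand-in for the real query encoder

optimizer = torch.optim.Adam([centroids, *query_encoder.parameters()], lr=1e-3)

def reconstruct_docs():
    # Documents are represented only by centroid lookups, so gradients reach
    # the centroids while the discrete assignments themselves stay fixed.
    per_sub = [centroids[m, fixed_assignments[:, m]] for m in range(M)]  # M x (num_docs, D)
    return torch.cat(per_sub, dim=-1)                                    # (num_docs, M*D)

queries = torch.randn(8, 32)
labels = torch.randint(num_docs, (8,))

q = query_encoder(queries)                  # (8, M*D)
scores = q @ reconstruct_docs().T           # retrieval scores over the compressed index
loss = torch.nn.functional.cross_entropy(scores, labels)
loss.backward()
optimizer.step()
```

Because the assignments never change, the compressed index is static during this stage, which is what makes end-to-end retrieval (and dynamic hard negative mining) feasible at every training step.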

hshreeshail commented 2 years ago

Thanks. Just to clarify: 1] In the JPQ paper, does training happen in a single stage that uses dynamic hard negatives, and is that stage equivalent to RepCONC's second training stage? 2] Is fixing the index assignments necessary in the second stage? If yes, why? Is it because of dynamic negative sampling?

jingtaozhan commented 2 years ago

[1] Yes, JPQ is a single-stage training method, and it is equivalent to RepCONC's second training stage. [2] Yes, fixing the index assignments is necessary because of dynamic hard negative sampling.

hshreeshail commented 2 years ago

Does this mean that the uniform clustering constraint is not part of stage-2 training? As far as I understand, the posterior distribution q(j | d_i) computed with sinkhorn_algorithm is used to choose centroids more uniformly, so that the centroid distribution does not become skewed during training. But if the index assignments are fixed, then the centroid distribution is also fixed, which would mean that q (and sinkhorn_algorithm) are not part of the second-stage training.
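
For context, this is roughly how I understand the Sinkhorn step in stage 1 (a simplified sketch with made-up sizes, not RepCONC's actual implementation): it rescales the similarity matrix so that each centroid receives about the same total assignment mass, which keeps the hard assignments from collapsing onto a few centroids.

```python
import torch

def sinkhorn_uniform_assign(scores, n_iters=20, eps=0.05):
    # scores: (N, K) similarities between N sub-vectors and K centroids.
    # Returns q, an (N, K) soft posterior whose column sums are roughly equal,
    # i.e. each centroid gets about N/K of the mass (uniform clustering constraint).
    q = torch.exp((scores - scores.max()) / eps)  # shift for numerical stability
    for _ in range(n_iters):
        q = q / q.sum(dim=0, keepdim=True)   # equalize mass per centroid (columns)
        q = q / q.sum(dim=1, keepdim=True)   # make each row a distribution over centroids
    return q

# Tiny demo: hard assignments taken from q are spread far more evenly
# than a plain argmax over the raw scores would be.
scores = torch.randn(512, 16)
q = sinkhorn_uniform_assign(scores)
print(torch.bincount(q.argmax(dim=1), minlength=16))
```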

jingtaozhan commented 2 years ago

Yes, you are right.