jingtaozhan / RepCONC

WSDM'22 Best Paper: Learning Discrete Representations via Constrained Clustering for Effective and Efficient Dense Retrieval
MIT License

Evaluating RepCONC on different datasets in a zero-shot fashion #1

Open · thakur-nandan opened this issue 2 years ago

thakur-nandan commented 2 years ago

Hi @jingtaozhan,

Thanks for releasing this great repository and the interesting paper. I'm interested in evaluating the model's generalization across different datasets, for example, the datasets in the BEIR Benchmark (https://github.com/UKPLab/beir).

It would really help if sample code were available for evaluating an already trained RepCONC model on a dataset from the BEIR Benchmark.

Thanks!

Kind Regards, Nandan Thakur

jingtaozhan commented 2 years ago

Hi Nandan,

Thanks for your interest in our work. I had planned to get back to you on this after releasing the training code :)

BEIR is a fascinating benchmark, and it would be great to evaluate RepCONC on it. I will try testing RepCONC on one of the datasets (e.g., TREC-COVID) and then share the code. I will post an update here when the code is ready.

Best, Jingtao

thakur-nandan commented 2 years ago

Hi @jingtaozhan,

Thank you so much! I look forward to seeing the code when it's ready!

Kind Regards, Nandan Thakur

thakur-nandan commented 2 years ago

Hi @jingtaozhan,

I understand that you are probably busy with higher-priority work. If possible, could you share an approximate timeline for this?

Thanks!

Kind Regards, Nandan Thakur

jingtaozhan commented 2 years ago

Hi @NThakur20

I'm currently working on it, and it is almost done. The repo will be updated today. RepCONC will use the JPQ package to perform zero-shot retrieval on BEIR. Here is how I evaluate JPQ on BEIR.

I wrote the code following the model examples in the BEIR repo, so I think both JPQ and RepCONC could be added to the BEIR examples. What do you think?

Best, Jingtao

thakur-nandan commented 2 years ago

Awesome, thank you so much @jingtaozhan!

Yes, I believe both will be interesting to evaluate. I'm currently working on a paper in which we evaluate several memory compression strategies. I can add these examples to the BEIR examples folder as well :)

Kind Regards, Nandan Thakur

jingtaozhan commented 2 years ago

The code is released now. Happy to help if you have any other questions.

I do think it's very worthwhile to include JPQ and RepCONC in the BEIR examples. Compared with many existing DR models, they follow a different paradigm: joint optimization with a compact index. It wouldn't be hard to add them since the code is ready. I can open a PR if you think that is OK.

Best, Jingtao

thakur-nandan commented 2 years ago

Thanks, @jingtaozhan, for providing the scores and the script so quickly.

It would definitely be interesting to add the models to the repository. I would be happy if you could open a PR.

Thanks, Nandan Thakur

jingtaozhan commented 2 years ago

Hi @NThakur20

I updated the code so that it is now easy to apply RepCONC to different dense retrieval models (Pull Request). Thought you might be interested.

Best, Jingtao

thakur-nandan commented 2 years ago

Hi @jingtaozhan,

Thank you for the PR and apologies for the delay. Will have a look.

Meanwhile, in a recent preprint of ours, we found that our work on GPL (https://aclanthology.org/2022.naacl-main.168/) helps improve zero-shot JPQ performance. We experimented with JPQ on all BEIR datasets (with TAS-B as the backbone instead of STAR). GPL, which involves cross-encoder distillation with the MarginMSE loss, improved the JPQ models on all BEIR datasets, and the resulting models even outperformed the original uncompressed TAS-B model. For more details, have a look at the links below. We would love to get your feedback :)

Paper: https://arxiv.org/abs/2205.11498
Code: https://github.com/thakur-nandan/income

Kind Regards, Nandan Thakur