kongds / scaling_sentemb

Scaling Sentence Embeddings with Large Language Models

add mteb script #13

Closed WuNein closed 1 month ago

WuNein commented 3 months ago

Add an MTEB script for better evaluation.
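
For reference, here is a minimal sketch of what such a script might look like, assuming a PromptEOL-style template ('This sentence : "[X]" means in one word:"') and the standard `mteb` Python API. The class, model name, and task list below are illustrative choices, not code from this PR.

```python
import numpy as np
import torch
from mteb import MTEB
from transformers import AutoModelForCausalLM, AutoTokenizer


class PromptEmbedder:
    """Minimal MTEB-compatible wrapper around a causal LM.

    Wraps each sentence in a PromptEOL-style template and uses the
    last token's final-layer hidden state as the embedding.
    """

    def __init__(self, model_name="facebook/opt-1.3b", device="cuda"):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        # Left padding so the last position is always the last real token.
        self.tokenizer.padding_side = "left"
        self.model = AutoModelForCausalLM.from_pretrained(
            model_name, torch_dtype=torch.float16
        ).to(device).eval()

    def encode(self, sentences, batch_size=16, **kwargs):
        embeddings = []
        for i in range(0, len(sentences), batch_size):
            batch = sentences[i:i + batch_size]
            prompts = [f'This sentence : "{s}" means in one word:"' for s in batch]
            inputs = self.tokenizer(
                prompts, padding=True, truncation=True, max_length=256,
                return_tensors="pt",
            ).to(self.model.device)
            with torch.no_grad():
                hidden = self.model(
                    **inputs, output_hidden_states=True
                ).hidden_states[-1]
            # Last position = last prompt token thanks to left padding.
            embeddings.append(hidden[:, -1].float().cpu().numpy())
        return np.concatenate(embeddings, axis=0)


if __name__ == "__main__":
    model = PromptEmbedder()
    evaluation = MTEB(tasks=["STS12", "STS13", "STSBenchmark"])
    evaluation.run(model, output_folder="results/prompt_embedder")
```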

kongds commented 2 months ago

Thanks for your interest in and contribution to our work. Our primary focus is on sentence embedding tasks such as STS, so our method may not perform optimally for embedding passages in MTEB. The prompt we designed is for sentence embeddings. You might find it helpful to refer to recent papers that use similar methods for passage embedding [1].

If you are interested in the performance of our method on MTEB, you can refer to the following table from [2] (it only summarizes the results, but I think the STS performance of our method would not be much lower).

[Image: MTEB results table from [2]]

Thank you for your PR, but I can't merge it, because our method is designed only for sentence embeddings.

[1] Zhuang, S., Ma, X., Koopman, B., et al. PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval. arXiv preprint arXiv:2404.18424, 2024.

[2] Springer, J. M., Kotha, S., Fried, D., et al. Repetition Improves Language Model Embeddings. arXiv preprint arXiv:2402.15449, 2024.