sbintuitions / JMTEB

The evaluation scripts of JMTEB (Japanese Massive Text Embedding Benchmark)
Creative Commons Attribution Share Alike 4.0 International

[Question] What does `relevance_scores` in ESCI reranking dataset represent? #78

Closed: omihub777 closed this issue 2 months ago

omihub777 commented 2 months ago

Thank you for your great work. I have a quick question regarding `relevance_scores` in ESCI. In your version of the ESCI dataset, `relevance_scores` takes an integer value from 0 to 3. How does each value correspond to the original labels (i.e. Exact / Substitute / Complement / Irrelevant)? My guess would be 0: Irrelevant, 1: Complement, 2: Substitute, 3: Exact, but I would like your confirmation. Thanks.

lsz05 commented 2 months ago

Hello, thank you for the issue!

Your guess is right. We convert the labels to integers with the following dictionary: https://huggingface.co/datasets/sbintuitions/JMTEB/blob/7e8ad98b751ab42c572ceb4e7723a2885a3d2cfe/reranking.py#L24
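
For readers who want to decode the scores back into labels, here is a minimal sketch of the mapping confirmed above. The names `LABEL_TO_SCORE`, `SCORE_TO_LABEL`, and `decode_relevance_scores` are hypothetical and are not taken from `reranking.py`. The full-word label strings used as keys are also an assumption, and the linked dictionary may use different key strings.

```python
# Hypothetical reconstruction of the mapping confirmed in this thread.
# The key strings are an assumption; the dictionary in reranking.py may differ.
LABEL_TO_SCORE = {
    "Irrelevant": 0,
    "Complement": 1,
    "Substitute": 2,
    "Exact": 3,
}

# Inverse mapping, for turning integer scores back into label names.
SCORE_TO_LABEL = {score: label for label, score in LABEL_TO_SCORE.items()}


def decode_relevance_scores(scores):
    """Translate a list of integer relevance scores into ESCI label names."""
    return [SCORE_TO_LABEL[score] for score in scores]


if __name__ == "__main__":
    print(decode_relevance_scores([3, 0, 2, 1]))
```

A higher score means a more relevant document, so the integers can be used directly as graded relevance when computing ranking metrics.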

omihub777 commented 2 months ago

Thanks!