nreimers / beir-sparta

Re-Implementation of SPARTA model
Apache License 2.0

Hard Negatives #2

Open TJKlein opened 1 year ago

TJKlein commented 1 year ago

Hi,

I have been trying to replicate the SPARTA results (close, but not successful so far). I saw that you use a file (hard-negatives-all.jsonl.gz) for hard negatives: https://github.com/nreimers/beir-sparta/blob/4b686e006f416b4e58e45cf45b9c51b1f1b28d27/train_sparta_msmarco.py#L244

Can I assume this is similar to one of the hard-negatives files from https://huggingface.co/datasets/sentence-transformers/msmarco-hard-negatives, e.g., negatives/blob/main/msmarco-hard-negatives-bm25_1k.jsonl.gz?

These files only contain the IDs, not the scores, though: https://github.com/nreimers/beir-sparta/blob/4b686e006f416b4e58e45cf45b9c51b1f1b28d27/train_sparta_msmarco.py#L259

You don't have the files with the scores? So I assume the IDs are sorted (in descending order of score)?

Thanks,
Tassilo

nreimers commented 1 year ago

Yes to both questions
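For anyone following along, here is a minimal sketch of how an ID-only hard-negatives file could be read when the list order is the only ranking signal. It is not the code from train_sparta_msmarco.py. The function name `load_top_negatives` and the per-line schema (`{"qid": ..., "pos": [...], "neg": [...]}`) are assumptions made for illustration, so check the keys against the file you actually download.

```python
import gzip
import json


def load_top_negatives(path, num_negatives=5):
    """Return {qid: [neg_id, ...]} with the first `num_negatives` IDs per query.

    Assumed (hypothetical) schema, one JSON object per line:
        {"qid": ..., "pos": [...], "neg": [...]}
    The "neg" list is assumed to be sorted best-first, since no scores are stored.
    """
    negatives = {}
    with gzip.open(path, "rt", encoding="utf8") as f_in:
        for line in f_in:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            entry = json.loads(line)
            positives = set(entry.get("pos", []))
            # Drop any ID that is also listed as a positive, keeping file order.
            ranked = [pid for pid in entry["neg"] if pid not in positives]
            negatives[entry["qid"]] = ranked[:num_negatives]
    return negatives
```

Because the position in the list stands in for the missing score, taking a prefix of the list yields the highest-ranked negatives under the sorting assumption confirmed above.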

TJKlein commented 1 year ago

Thanks for the quick and clarifying response. Any plans to also provide the scores (which would save the unnecessary CO2 emissions of recomputing them :) )?