Open · TJKlein opened 1 year ago
Hi,

I have been trying to replicate the results of SPARTA (close, but not successful so far). I saw that you are using a file (hard-negatives-all.jsonl.gz) for hard negatives: https://github.com/nreimers/beir-sparta/blob/4b686e006f416b4e58e45cf45b9c51b1f1b28d27/train_sparta_msmarco.py#L244

Can I assume this is similar to one of the files containing hard negatives from https://huggingface.co/datasets/sentence-transformers/msmarco-hard-negatives, e.g., negatives/blob/main/msmarco-hard-negatives-bm25_1k.jsonl.gz?

These files only contain the IDs, not the scores, though: https://github.com/nreimers/beir-sparta/blob/4b686e006f416b4e58e45cf45b9c51b1f1b28d27/train_sparta_msmarco.py#L259

You don't have the files with the scores? So I assume the IDs are sorted by score (in descending order)?

Thanks,
Tassilo
Yes to both questions
Thanks for the quick and clarifying response. Any plans to also provide the scores (this would save the unnecessary CO2 emissions of recomputing them :) )?
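Since the negatives are confirmed to be sorted by descending score, list position can stand in for the missing scores when picking the hardest negatives. Below is a minimal Python sketch of that idea, not the code from train_sparta_msmarco.py. The field names `qid` and `neg`, the helper names, and the demo file are all assumptions made for illustration; the real files may use a different layout.

```python
import gzip
import json
import os
import tempfile


def load_ranked_negatives(path, top_k=10):
    """Map each query ID to its first `top_k` negative passage IDs.

    Assumes one JSON object per line with a hypothetical "qid" field and a
    hypothetical "neg" field holding passage IDs sorted by descending score,
    so slicing the list keeps the highest-scoring negatives.
    """
    negatives = {}
    with gzip.open(path, "rt", encoding="utf-8") as f_in:
        for line in f_in:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            negatives[record["qid"]] = record["neg"][:top_k]
    return negatives


def write_demo_file(path):
    """Write a tiny made-up file in the assumed layout (not real MS MARCO data)."""
    rows = [
        {"qid": 1, "neg": [10, 11, 12]},
        {"qid": 2, "neg": [20]},
    ]
    with gzip.open(path, "wt", encoding="utf-8") as f_out:
        for row in rows:
            f_out.write(json.dumps(row) + "\n")


demo_path = os.path.join(tempfile.mkdtemp(), "demo-hard-negatives.jsonl.gz")
write_demo_file(demo_path)
demo_negatives = load_ranked_negatives(demo_path, top_k=2)
print(demo_negatives)
```

If actual scores are needed rather than ranks, they would have to be recomputed with the model that produced the ranking, which is the recomputation cost mentioned above.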