SunSiShining closed this issue 2 years ago
Hi @SunSiShining, I think in @luyug's setting the mined hard negatives are concatenated with the original bm25 negatives, i.e., the train set is ~50k examples with bm25 negatives plus ~50k examples with hard negatives. And when mining the hard negatives, the mined positive passages are probably also updated.
Thank you for such a quick reply, much appreciated. @MXueguang
The final training data consists of ~58k training queries with ~90 bm25 negatives per query and ~70k training queries with ~30 hard negatives per query.
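If it helps, the concatenation described above can be sketched as a simple merge of two JSONL training files (file names and the one-example-per-line layout are assumptions, not the repo's actual script):

```python
def merge_training_sets(bm25_path, hn_path, out_path):
    """Concatenate the bm25-negative examples with the mined-hard-negative
    examples into one training file. Assumes both inputs are JSONL, one
    training example per line; paths are illustrative only."""
    with open(out_path, "w") as out:
        for path in (bm25_path, hn_path):
            with open(path) as f:
                for line in f:
                    out.write(line)
```

The two sets are kept as separate examples (one query can appear in both), rather than merging negative lists per query.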
I have merged the bm25 training data, but I have not updated the mined positive passages. I'll check whether this is the cause of the performance degradation.
thank you again :D
I used the hard negatives (hn.bert.json) you provided and I can reproduce R@5 = 75.8. But when I train with my own hard negatives, R@5 is only 64.3.
How to generate hard negatives for NQ? Could you provide a reproduction setup?
Here is my setup for mining hard negatives:
Model: co-condenser-wiki trained with bm25 negatives
Negative depth: 200
Negative sample: 30
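For reference, the sampling step of the setup above (top-200 retrieval depth, 30 sampled negatives) can be sketched like this; the function and parameter names are my own, and the key detail is filtering out known positives before sampling:

```python
import random

def sample_hard_negatives(ranked_pids, positive_pids, depth=200, n_sample=30, seed=42):
    """From the top-`depth` retrieved passage ids for a query, drop the known
    positive passages and randomly sample `n_sample` hard negatives.
    This is an illustrative sketch, not the repo's actual mining script."""
    rng = random.Random(seed)
    positives = set(positive_pids)
    candidates = [pid for pid in ranked_pids[:depth] if pid not in positives]
    return rng.sample(candidates, min(n_sample, len(candidates)))
```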
Looking forward to your reply! Thank you!