Closed · j6mes closed this issue 6 years ago
Just checking: we are not planning on learning the DR, right? That's fine, but it would be good to ensure that the DR component is good enough for the entailment part. I.e., given an oracle RTE, what accuracy do we get with the DR we have? It should be better than a random baseline, right? A related question: is there some kind of threshold to restrict the documents we get from DR, or do we take only the top one? (Probably a good start, assuming it gives us decent accuracy with an oracle RTE.)
On Tue, 5 Dec 2017 at 11:40 James Thorne notifications@github.com wrote:
To run
- MLP: Train on FNC, Evaluate on FNC, Evaluate on FEVER 3 way
- MLP: Train on FEVER with sampled negative pages, Test
- MLP: Train on FEVER with IR negative pages, Test
- DR: Final score for recall/precision/MRR
- RTE: Pre-trained model, evaluate on FEVER
- RTE: Train on FEVER bodies, evaluate on FEVER
Extra:
- BiDAF: Precision/Recall of pretrained model
- BiDAF: FEVER Accuracy using pretrained model on DRQA Pages
- RTE: Train on BiDAF retrieved model: evaluate P/R of BiDAF. Evaluate FEVER score
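The DR scoring in the list above (precision, recall, MRR) could be computed per claim along these lines. This is a minimal sketch, not the repo's actual evaluation code; the function name and the assumption that retrieval returns a ranked list of page ids against a gold evidence set are mine:

```python
def dr_metrics(ranked_pages, gold_pages, k=5):
    """Precision@k, recall@k, and reciprocal rank for one claim.

    ranked_pages: list of page ids, best first (hypothetical DR output)
    gold_pages:   set of page ids annotated as evidence
    """
    top_k = ranked_pages[:k]
    hits = [p for p in top_k if p in gold_pages]
    precision = len(hits) / k
    recall = len(hits) / len(gold_pages) if gold_pages else 0.0
    # Reciprocal rank: 1 / rank of the first gold page found
    rr = 0.0
    for rank, page in enumerate(ranked_pages, start=1):
        if page in gold_pages:
            rr = 1.0 / rank
            break
    return precision, recall, rr
```

Averaging `rr` over all claims gives the MRR figure.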
The DR has no parameters, so there's nothing to learn. Taking the top 5 articles at the moment. Will also try taking all articles above a threshold.
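The two selection strategies mentioned here (top 5 vs. everything above a threshold) could be sketched as a single helper. Names and the `(page_id, score)` input shape are assumptions, not the actual DR interface:

```python
def select_pages(scored_pages, k=5, threshold=None):
    """Pick candidate pages from (page_id, score) pairs.

    With threshold=None this is plain top-k (the current setup);
    otherwise keep every page scoring above the threshold.
    """
    ranked = sorted(scored_pages, key=lambda p: p[1], reverse=True)
    if threshold is not None:
        return [pid for pid, score in ranked if score > threshold]
    return [pid for pid, _ in ranked[:k]]
```

The threshold variant trades a fixed candidate count for a variable one, which changes the precision/recall balance the RTE component sees downstream.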
The only metric I've computed so far is recall, but testing with an oracle RTE is a good idea and easy for me to do too.
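The oracle-RTE test could be as simple as the following: score a claim as correct whenever DR surfaced at least one gold evidence page, since a perfect entailment component would then get the label right. This is my sketch of the idea, with hypothetical names and input shapes:

```python
def oracle_rte_accuracy(examples):
    """Upper-bound FEVER accuracy assuming a perfect RTE component.

    examples: iterable of (retrieved_pages, gold_pages) pairs per claim;
    a claim counts as correct if retrieval found any gold evidence page.
    """
    examples = list(examples)
    if not examples:
        return 0.0
    correct = sum(1 for retrieved, gold in examples
                  if set(gold) & set(retrieved))
    return correct / len(examples)
```

Comparing this number against the random baseline would answer the question above about whether the current DR is good enough to support the entailment part.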