shmsw25 / AmbigQA

An original implementation of the EMNLP 2020 paper "AmbigQA: Answering Ambiguous Open-domain Questions"
https://arxiv.org/abs/2004.10645

Is there any plan to release the implementation of SpanSeqGen? #3

Closed · Yifan-Gao closed this issue 4 years ago

Yifan-Gao commented 4 years ago

Hi, thanks for the great work and dataset. I am just wondering whether the implementation of SpanSeqGen will be released. If so, when will it be public? Thanks.

shmsw25 commented 4 years ago

Hi @Yifan-Gao, thanks for your interest. I am still refactoring the code, but the SpanSeqGen part is almost ready. I will release it within a week and leave a reply here.

shmsw25 commented 4 years ago

Hi @Yifan-Gao, sorry about the delay. The baseline code is now ready here. Please let me know if you need pretrained models or predictions from DPR or SpanSeqGen, on either NQ or AmbigNQ.

Yifan-Gao commented 4 years ago

Hi @shmsw25, thanks for the release. I have two questions:

  1. In the tokenization step of the BART reader (the SpanSeqGen model), why do you use `bart-large` for `--bert_name`? I think it should be `bert-base-uncased`, since the DPR retriever only has BERT model checkpoints.

     ```bash
     for i in 0 1 2 3 4 5 6 7 8 9 ; do  # for parallelization
       python3 cli.py --bert_name bart-large --output_dir out/dpr --do_predict --task dpr --predict_batch_size 3200 --db_index $i
     done
     ```
  2. As described in the AmbigQA paper, AmbigNQ has a different evidence corpus from NQ-open:

    We use English Wikipedia dump from 2018-12-20 and 2020-01-20 for NQ-open and AMBIGNQ, respectively.

Why do all baselines still use the NQ-open Wikipedia passages? The released AmbigNQ Wikipedia dump has ~28M passages, while the NQ-open dump has ~22M.

shmsw25 commented 4 years ago

Hi @Yifan-Gao, thanks for asking!!

  1. Actually, this command is only for tokenizing data for the reader: it loads the DPR predictions that were already saved during the retrieval step, so `bart-large` here selects the reader's tokenizer. You are right that for producing the DPR predictions themselves, only BERT-tokenized data is used (see the first sketch after this list).

  2. You are right. We were in fact comparing results between the 2018 and 2020 versions, and the current codebase was using only the 2018 version. Sorry for not mentioning this detail. I pushed new commits that can use the 2020 version if you specify `--wiki_2020` at inference time (see the second sketch after this list). I haven't finished running the experiments for full results or writing the instructions; I will update you when that is done, which should be within a few days.
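For readers puzzling over the same tokenizer question, here is a minimal sketch of the two tokenizers in play. This is not the repo's code: the model identifiers, example strings, and max lengths are illustrative assumptions, shown only to make the retrieval-vs-reader split concrete.

```python
# Sketch only (not AmbigQA's actual code): DPR retrieval is tied to BERT
# tokenization, while the SpanSeqGen reader re-tokenizes the retrieved
# passages with the BART tokenizer.
from transformers import AutoTokenizer

dpr_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")       # retrieval side
reader_tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large")  # reader side

# Illustrative question/passage pair, not taken from the dataset.
question = "Who wrote the opera Carmen?"
passage = "Carmen is an opera in four acts by the French composer Georges Bizet."

# Retrieval-time encoding: what DPR used when its predictions were saved.
dpr_inputs = dpr_tokenizer(question, passage, truncation=True, max_length=256)

# Reader-time encoding: what a command with `--bert_name bart-large` would
# produce from the saved DPR predictions.
reader_inputs = reader_tokenizer(question, passage, truncation=True, max_length=1024)

print(len(dpr_inputs["input_ids"]), len(reader_inputs["input_ids"]))
```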
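And for anyone trying the new flag, a hedged sketch of what inference against the 2020 dump might look like. Only `--wiki_2020` is confirmed in the comment above; the remaining flags simply mirror the earlier tokenization command and may differ from the final instructions in the repo's README.

```bash
# Sketch only: the sharded loop from the earlier comment, pointed at the
# 2020-01-20 Wikipedia dump via --wiki_2020. Other flags are reused from
# the earlier command as an assumption, not from the official instructions.
for i in 0 1 2 3 4 5 6 7 8 9 ; do  # for parallelization
  python3 cli.py --bert_name bart-large --output_dir out/dpr --do_predict \
    --task dpr --predict_batch_size 3200 --db_index $i --wiki_2020
done
```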

Yifan-Gao commented 4 years ago

Thanks for your detailed explanation!

shmsw25 commented 4 years ago

We released complete instructions for running the Wiki 2020 version and updated the results. We found that using Wiki 2020 instead of 2018 sometimes degrades performance (especially for DPR). I think this could be because the model is sensitive to overfitting. Nonetheless, we ended up reporting the 2020 version because (1) the best numbers on questions with multiple answers are highest with the 2020 version, and (2) we believe it is the correct thing to do in any case.

Also, in case you're interested, we have released model checkpoints as well.

Thanks for your interest!