Reproduction results for SQuAD 2.0

cooelf commented 5 years ago

Hi, thanks for your contribution! I am reproducing the results for SQuAD 2.0 using 8 V100 GPUs (32G).

I followed the provided hyper-parameters, but the result is about 0.5% lower than the reported one. Am I missing anything? I also tried setting train_steps to 12000 and lr to 2e-5, but that gave little gain.

python run_squad.py \
  --use_tpu=False \
  --num_hosts=1 \
  --num_core_per_host=8 \
  --model_config_path=${INIT_CKPT_DIR}/xlnet_config.json \
  --spiece_model_file=${INIT_CKPT_DIR}/spiece.model \
  --output_dir=${PROC_DATA_DIR} \
  --init_checkpoint=${INIT_CKPT_DIR}/xlnet_model.ckpt \
  --model_dir=${MODEL_DIR} \
  --train_file=${SQUAD_DIR}/train-v2.0.json \
  --predict_file=${SQUAD_DIR}/dev-v2.0.json \
  --uncased=False \
  --max_seq_length=512 \
  --do_train=True \
  --train_batch_size=6 \
  --do_predict=True \
  --predict_batch_size=32 \
  --learning_rate=3e-5 \
  --adam_epsilon=1e-6 \
  --iterations=1000 \
  --save_steps=1000 \
  --train_steps=8000 \
  --warmup_steps=1000 \
  $@

HasAns_exact = 83.6707152497
HasAns_f1 = 89.5736849476
HasAns_total = 5928
NoAns_exact = 85.7695542473
NoAns_f1 = 85.7695542473
NoAns_total = 5945
best_exact = 85.6396866841
best_exact_thresh = -4.23123931885
best_f1 = 88.3920251054
best_f1_thresh = -3.96573591232
exact = 84.7216373284
f1 = 87.6688961821
total = 11873
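
As a sanity check on the numbers above, the overall exact and f1 scores can be recomputed as the count-weighted average of the HasAns and NoAns subsets. The sketch below is illustrative only: the helper name `weighted_average` is hypothetical, and the gap calculation assumes that the "about 0.5%" shortfall compares best_exact against the 86.12 EM figure quoted from the paper in the next paragraph.

```python
# Counts and per-subset scores copied from the evaluation output above.
has_ans_total, no_ans_total = 5928, 5945
has_ans_exact, no_ans_exact = 83.6707152497, 85.7695542473
has_ans_f1, no_ans_f1 = 89.5736849476, 85.7695542473


def weighted_average(score_a, n_a, score_b, n_b):
    """Combine two subset scores into a single score over both subsets.

    Hypothetical helper for illustration; not part of run_squad.py.
    """
    return (score_a * n_a + score_b * n_b) / (n_a + n_b)


overall_exact = weighted_average(has_ans_exact, has_ans_total,
                                 no_ans_exact, no_ans_total)
overall_f1 = weighted_average(has_ans_f1, has_ans_total,
                              no_ans_f1, no_ans_total)

# Assumption: the reported shortfall compares best_exact with the
# 86.12 EM figure quoted from the paper.
reported_em = 86.12
best_exact = 85.6396866841
gap = reported_em - best_exact

print(f"overall exact = {overall_exact:.4f}")
print(f"overall f1    = {overall_f1:.4f}")
print(f"gap to reported EM = {gap:.2f}")
```

The recomputed values should agree with the exact and f1 lines in the log, which suggests the evaluation output is internally consistent and that the shortfall comes from training rather than from scoring.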

The paper indicated that joint training with NewsQA can push the EM from 86.12% to 86.35%. However, when I added NewsQA to the SQuAD training set, the result dropped by about 0.5%. Could you give more details on the data augmentation? Did you change any hyper-parameter settings?

Thanks!

rakshanda22 commented 5 years ago

@cooelf did you get any updates on the newsQA part?

cooelf commented 5 years ago

@rakshanda22 Unfortunately not yet :(

YJ-007 commented 4 years ago

Excuse me, I have successfully run sudo ./prepro_squad.sh and moved on to the next step of the SQuAD 2.0 training/testing, but I got an error. Here is the output: ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: 'Tensor("arg0:0", shape=(), dtype=float32, device=/device:CPU:0)'

Have you ever run into this problem before? If so, would you mind telling me how to fix it? Thank you very much!

zihangdai / xlnet

Reproduction results for SQuAD 2.0 #119