I wonder how your ERNIE performs compared with BERT + N-gram masking. The BERT model released by Google does not include this training procedure, which has been shown to be quite useful on the SQuAD dataset.
Yes, this strategy is very useful for some tasks, and Google has updated their repo with a new pre-trained model that uses it. You can fine-tune that BERT model on the dataset.
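For anyone curious what N-gram (span) masking looks like in practice, here is a minimal sketch. It is not the actual code from either repo; the function names and the span-sampling heuristic are illustrative assumptions. The idea is that instead of masking single tokens independently (as in the original BERT), you mask spans of consecutive tokens so the model cannot recover a masked word from its immediate neighbors:

```python
import random

def ngram_mask(tokens, spans, mask_token="[MASK]"):
    """Replace each (start, n) span of n consecutive tokens with mask_token.
    This is the core of N-gram masking: whole spans are masked together."""
    out = list(tokens)
    for start, n in spans:
        for j in range(start, min(start + n, len(out))):
            out[j] = mask_token
    return out

def sample_spans(num_tokens, mask_prob=0.15, max_n=3, seed=0):
    """Toy span sampler (an assumption, not the released recipe):
    walk the sequence and occasionally start a span of 1..max_n tokens."""
    rng = random.Random(seed)
    spans, i = [], 0
    while i < num_tokens:
        if rng.random() < mask_prob:
            n = rng.randint(1, max_n)
            spans.append((i, n))
            i += n
        else:
            i += 1
    return spans

# Deterministic example with hand-picked spans:
toks = "the quick brown fox jumps over".split()
print(ngram_mask(toks, [(1, 2), (4, 1)]))
# → ['the', '[MASK]', '[MASK]', 'fox', '[MASK]', 'over']
```

With single-token masking the model might infer "quick" from "brown" alone; masking the whole 2-gram forces it to rely on longer-range context, which is the intuition behind the SQuAD gains mentioned above.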