to employ BERT's capacity for passage-level understanding.
The model achieved SOTA on the GAP and OntoNotes benchmarks. The qualitative analysis showed that (1) handling pronouns in conversations and (2) mention paraphrasing are still difficult for the model.
Authors
Mandar Joshi, Omer Levy, Daniel S. Weld, and Luke Zettlemoyer
(University of Washington, AI2, FAIR)
Motivation
BERT's major improvement is passage-level training, which allows it to better model longer sequences
Can we apply it to the CR task?
Method
Proposed a BERT-based CR method.
Two ways of extending c2f-coref, an ELMo-based CR model, with BERT (sketched below):
The independent variant uses non-overlapping segments, each of which acts as an independent instance for BERT
The overlap variant splits the document into overlapping segments so as to provide the model with context beyond 512 tokens
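A minimal sketch of the two segmentation schemes (an illustrative reconstruction, not the authors' released code; the function name and parameters are assumptions):

    def segment_document(tokens, max_len=512, overlap=0):
        """Split a tokenized document into BERT-sized segments.

        overlap=0 mimics the independent variant: non-overlapping segments,
        each encoded as a separate BERT instance. overlap>0 mimics the
        overlap variant: consecutive segments share `overlap` tokens, so the
        model sees context beyond a single 512-token window.
        """
        stride = max_len - overlap
        segments = []
        for start in range(0, len(tokens), stride):
            segments.append(tokens[start:start + max_len])
            if start + max_len >= len(tokens):
                break
        return segments

    # Toy usage with whitespace tokens; real inputs would use BERT's WordPiece tokenizer.
    doc = ("Prince Charles and his wife Camilla visited the prison in London . " * 100).split()
    independent_segments = segment_document(doc, max_len=512, overlap=0)    # disjoint segments
    overlapping_segments = segment_document(doc, max_len=512, overlap=256)  # 256-token overlap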
Results / Insight
Dataset
GAP: a human-labeled dataset of pronoun-name pairs from Wikipedia snippets (see the illustrative example below)
OntoNotes 5.0: a document-level dataset from the CoNLL-2012 shared task
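For intuition, each GAP item pairs an ambiguous pronoun with two candidate names and binary coreference labels. A minimal illustrative container (the field names and the toy example are assumptions, not the dataset's official schema):

    from dataclasses import dataclass

    @dataclass
    class GapExample:
        # Hypothetical container for one GAP-style pronoun-name pair;
        # field names are illustrative, not the official column names.
        text: str            # Wikipedia snippet
        pronoun: str         # ambiguous pronoun in the snippet
        pronoun_offset: int  # character offset of the pronoun in `text`
        name_a: str          # first candidate name
        name_b: str          # second candidate name
        a_is_coref: bool     # whether the pronoun refers to name A
        b_is_coref: bool     # whether the pronoun refers to name B

    # Toy example; labels are chosen for illustration only.
    example = GapExample(
        text="Kathleen first appears when Theresa visits her in a prison in London.",
        pronoun="her",
        pronoun_offset=43,
        name_a="Kathleen",
        name_b="Theresa",
        a_is_coref=True,
        b_is_coref=False,
    )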
Results
Achieved SOTA on the GAP and OntoNotes benchmarks
with +6.2 F1 on GAP (baseline: BERT+RR) and +0.3 F1 on OntoNotes (baseline: EE)
The overlap variant offers no improvement over the independent variant
Insight
Unable to handle conversations: Modeling pronouns, especially in the context of conversations (Table 3 in the paper), continues to be difficult for all models, perhaps partly because c2f-coref does very little to model the dialog structure of the document.
Importance of entity information: The models are unable to resolve cases requiring mention paraphrasing.
E.g., bridging "the Royals" with "Prince Charles and his wife Camilla" likely requires pretraining models to encode relations between entities
A. "Recent work (Joshi et al., 2019) suggests that BERT’s inability to use longer sequences effectively is likely a by-product pretraining on short sequences for a vast majority of updates."