smartschat / cort

A toolkit for coreference resolution and error analysis.
MIT License

cort

cort is a coreference resolution toolkit. It consists of two parts: the coreference resolution component implements a framework for coreference resolution based on latent variables, which allows you to rapidly devise approaches to coreference resolution, while the error analysis component provides extensive functionality for analyzing and visualizing errors made by coreference resolution systems.
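To give a flavor of what "based on latent variables" can mean: in a mention-ranking view, the antecedent chosen for each mention is not annotated in the data, so it is treated as a latent structure from which the coreference clusters are read off. The following is a toy sketch of that idea only. It does not use cort's API, and `resolve` and `toy_score` are hypothetical names with a hand-written scoring function standing in for a learned model.

```python
def resolve(mentions, score):
    """Toy mention-ranking resolver with latent antecedents.

    mentions: list of mention strings in document order.
    score: function (antecedent_or_None, anaphor) -> float, where None
           stands for "this mention has no antecedent".
    Returns a list of clusters, each a list of mention indices.
    """
    cluster_of = {}  # mention index -> index into `clusters`
    clusters = []
    for j, anaphor in enumerate(mentions):
        # Candidates: the dummy antecedent (None) and every earlier mention.
        candidates = [None] + list(range(j))
        best = max(
            candidates,
            key=lambda i: score(None if i is None else mentions[i], anaphor),
        )
        if best is None:
            # No antecedent chosen: the mention starts a new cluster.
            cluster_of[j] = len(clusters)
            clusters.append([j])
        else:
            # Join the cluster of the chosen (latent) antecedent.
            cluster_of[j] = cluster_of[best]
            clusters[cluster_of[best]].append(j)
    return clusters


def toy_score(antecedent, anaphor):
    """Hypothetical scorer: case-insensitive string match beats the dummy."""
    if antecedent is None:
        return 0.5
    return 1.0 if antecedent.lower() == anaphor.lower() else 0.0


mentions = ["Obama", "the president", "obama", "Paris", "Obama"]
print(resolve(mentions, toy_score))  # prints [[0, 2, 4], [1], [3]]
```

A real system would learn the scoring function from data and use far richer features; the sketch only shows how clusters follow from per-mention antecedent decisions.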

If you have any questions or comments, drop me an e-mail at sebastian.martschat@gmail.com.

Branches/Forks

Documentation

Installation

cort is available on PyPI. You can install it via

pip install cort

Dependencies (automatically installed by pip) are nltk, numpy, matplotlib, mmh3, PyStanfordDependencies, cython, future, jpype and beautifulsoup. It ships with stanford_corenlp_pywrapper and the reference implementation of the CoNLL scorer.

cort is written for use on Linux with Python 3.3+. While cort also runs under Python 2.7, I strongly recommend running cort with Python 3, since the Python 3 version is much more efficient.
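A quick way to confirm that the interpreter meets the recommendation above, and that the package is visible to it, is a small check like the following. This is only a sketch using the Python standard library: the helper name `check_environment` is made up for illustration and is not part of cort, and the script itself assumes Python 3.4 or newer because it relies on `importlib.util.find_spec`.

```python
import importlib.util
import sys


def check_environment(package="cort", recommended=(3, 3)):
    """Return (python_ok, package_found) for the running interpreter.

    `check_environment` is a hypothetical helper, not part of cort.
    """
    # True if the running interpreter is at least the recommended version.
    python_ok = sys.version_info[:2] >= recommended
    # find_spec returns None when the top-level package cannot be located.
    package_found = importlib.util.find_spec(package) is not None
    return python_ok, package_found


if __name__ == "__main__":
    python_ok, package_found = check_environment()
    print("Python %d.%d, recommended version or newer: %s"
          % (sys.version_info[0], sys.version_info[1], python_ok))
    print("cort importable: %s" % package_found)
```

The check only looks the package up and does not import it, so it will not catch problems that first show up at import time.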

References

Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi and Noah A. Smith (2017). Dynamic Entity Representations in Neural Language Models. To appear in Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP), Copenhagen, Denmark, 7-11 September 2017.

Sebastian Martschat (2017). Structured Representations for Coreference Resolution. PhD thesis, Heidelberg University.

Nafise Sadat Moosavi and Michael Strube (2016). Search space pruning: A simple solution for better coreference resolvers. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, Cal., 12-17 June 2016, pages 1005-1011.

Sebastian Martschat and Michael Strube (2015). Latent Structures for Coreference Resolution. Transactions of the Association for Computational Linguistics, 3, pages 405-418.

Sebastian Martschat, Patrick Claus and Michael Strube (2015). Plug Latent Structures and Play Coreference Resolution. In Proceedings of ACL-IJCNLP 2015 System Demonstrations, Beijing, China, 26-31 July 2015, pages 61-66.

Sebastian Martschat, Thierry Göckel and Michael Strube (2015). Analyzing and Visualizing Coreference Resolution Errors. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations, Denver, Colorado, USA, 31 May-5 June 2015, pages 6-10.

Sebastian Martschat and Michael Strube (2014). Recall Error Analysis for Coreference Resolution. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25-29 October 2014, pages 2070-2081.

Sebastian Martschat (2013). Multigraph Clustering for Unsupervised Coreference Resolution. In Proceedings of the Student Research Workshop at the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, 5-7 August 2013, pages 81-88.

If you use the error analysis component in your research, please cite the EMNLP'14 paper. If you use the coreference component in your research, please cite the TACL paper. If you use the multigraph system, please cite the ACL'13-SRW paper.

Changelog

Wednesday, 4 November 2015
Now supports numeric features. The feature representation changed, and the models changed with it, so I have updated the downloadable models.

Friday, 9 October 2015
Now supports label-dependent cost functions.

Tuesday, 15 September 2015
Minor bugfixes.

Monday, 27 July 2015
Can now perform coreference resolution on raw text.

Tuesday, 21 July 2015
Updated to status of TACL paper.

Wednesday, 3 June 2015
Improvements to visualization (mention highlighting and scrolling).

Monday, 1 June 2015
Fixed a bug in mention highlighting for visualization.

Sunday, 31 May 2015
Updated to status of NAACL'15 demo paper.

Wednesday, 13 May 2015
Fixed another bug in the documentation regarding format of antecedent data.

Tuesday, 3 February 2015
Fixed a bug in the documentation: the part number in the antecedent file must be given with trailing 0s.

Thursday, 30 October 2014
Fixed a data structure bug in documents.py. The results from the paper are not affected by this bug.

Wednesday, 22 October 2014
Initial release.