gregdurrett / berkeley-doc-summarizer

The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploits syntactic information to compress it, and uses coreference constraints to ensure clarity.
GNU General Public License v3.0
741 stars 64 forks

Alex #1

Open alex-bloom opened 7 years ago

alex-bloom commented 7 years ago

The joint model (COREF+NER+WIKI) of the Berkeley Entity Resolution System combines the output for all input documents (e.g. government.txt and music.txt) into a single file, output.conll, while the output produced by the other models does not exactly match the test files in the Berkeley Document Summarizer (e.g. the last two columns of government.txt are off). I would appreciate a clarification of the assumed data interface between the Berkeley Entity Resolution System and the Berkeley Document Summarizer.

alex-bloom commented 7 years ago

Greg clarified that the utility class edu.berkeley.nlp.entity.preprocess.ConllDocSharder can be used to split the combined output.conll back into per-document files.
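
For anyone who wants to see what this kind of splitting amounts to, here is a minimal Python sketch. It is not the ConllDocSharder implementation and does not call it. It assumes the combined file uses the usual CoNLL-style `#begin document (<name>); part NNN` and `#end document` marker lines, and the function name `split_conll` is made up for illustration.

```python
import re

# Assumption: each document starts with a CoNLL-style header line such as
# "#begin document (government.txt); part 000" and ends with "#end document".
BEGIN_RE = re.compile(r"^#begin document \((.*?)\)")


def split_conll(text):
    """Split combined CoNLL text into a dict of {document name: document text}.

    Hypothetical helper for illustration only; parts that share a document
    name are concatenated in the order they appear.
    """
    docs = {}
    name, lines = None, []
    for line in text.splitlines():
        match = BEGIN_RE.match(line)
        if match:
            # A new document (or part) begins: start collecting its lines.
            name, lines = match.group(1), []
        if name is not None:
            lines.append(line)
        if name is not None and line.startswith("#end document"):
            # Document finished: store it and reset the collector.
            docs[name] = docs.get(name, "") + "\n".join(lines) + "\n"
            name, lines = None, []
    return docs
```

Each value in the returned dict could then be written to its own file (for example one file per key) before being handed to the summarizer. Whether the column layout of those files matches what the summarizer expects is a separate question that this sketch does not address.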