Questions about NYT datasets processing

gregdurrett / berkeley-doc-summarizer

The Berkeley Document Summarizer is a learning-based, single-document summarization system that extracts source document content, exploits syntactic information to compress it, and uses coreference constraints to ensure clarity.

GNU General Public License v3.0

741 stars, 64 forks

Questions about NYT datasets processing #6

Open airzs opened 4 years ago

airzs commented 4 years ago

Thanks for sharing your code. Can you tell me what I should put into the train_corefner_standoff and train_abstracts_standoff directories?

gregdurrett commented 4 years ago

These files used to be available here: http://nlp.cs.berkeley.edu/projects/summarizer.shtml but unfortunately they are gone. I am no longer able to maintain this repo.

My PhD student Jiacheng Xu has a similar system from a summer internship project that's much more up-to-date: https://github.com/jiacheng-xu/DiscoBERT

airzs commented 4 years ago

Thanks, that is helpful.