karlhigley / lexrank-summarizer

A Spark-based LexRank extractive summarizer for text documents
MIT License

Combine input entries with the same identifier into a single document #14

Closed karlhigley closed 9 years ago

karlhigley commented 9 years ago

Since entries with the same identifier may contain overlapping or identical content, extract the distinct sentences from each document (i.e., filter out duplicates within each document).
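
To make the intended behavior concrete, here is a minimal sketch in plain Python rather than the project's Spark code. The function name `combine_entries` and the input shape (a list of `(identifier, sentences)` pairs, already split into sentences) are hypothetical and chosen only for illustration. It also assumes that "duplicate" means exact string equality and that first-seen sentence order should be preserved; neither is stated in the issue.

```python
def combine_entries(entries):
    """Merge entries sharing an identifier into one document of distinct sentences.

    `entries` is an iterable of (doc_id, sentences) pairs (hypothetical shape).
    Returns a dict mapping each doc_id to its distinct sentences, in first-seen order.
    """
    documents = {}
    for doc_id, sentences in entries:
        # One inner dict per identifier; dict keys keep insertion order,
        # so they act as an ordered set of sentences.
        seen = documents.setdefault(doc_id, {})
        for sentence in sentences:
            seen.setdefault(sentence, None)  # no-op if already present
    return {doc_id: list(seen) for doc_id, seen in documents.items()}


# Two entries share the identifier "a" and overlap in one sentence.
entries = [
    ("a", ["Spark is fast.", "LexRank ranks sentences."]),
    ("b", ["Only one entry here."]),
    ("a", ["LexRank ranks sentences.", "Duplicates are dropped."]),
]
combined = combine_entries(entries)
```

In a Spark job the same idea would presumably be expressed by keying sentences on the document identifier and taking distinct pairs, but that mapping onto the project's actual pipeline is an assumption.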