Disentangle/test similarity computation and Lexrank model

karlhigley / lexrank-summarizer

A Spark-based LexRank extractive summarizer for text documents

MIT License


Disentangle/test similarity computation and Lexrank model #23

Closed: karlhigley closed this 9 years ago

karlhigley commented 9 years ago

This separates the similarity computation from the LexRank model, which makes both easier to test, and adds some simple sanity-check tests. Along the way, I discovered two issues:

- GraphX edges are directed, so two edges (one in each direction) need to be created between each pair of vertices.
- It's possible (though unlikely) for an LSH bucket to be empty, which could cause a failure.
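The directed-edge issue can be illustrated with a minimal sketch in plain Scala. It assumes that similarities arrive as undirected `(i, j, sim)` triples, with each pair listed only once. `SimEdge` is a hypothetical stand-in for GraphX's `Edge` so the example runs without Spark. On an RDD, the same `flatMap` would emit `Edge(i, j, sim)` and `Edge(j, i, sim)`. This is not the project's actual code.

```scala
object SymmetricEdges {
  // Hypothetical stand-in for org.apache.spark.graphx.Edge (srcId, dstId, attr),
  // used so this sketch runs without a Spark dependency.
  final case class SimEdge(srcId: Long, dstId: Long, attr: Double)

  // Each undirected similarity (i, j, sim) becomes two directed edges,
  // i -> j and j -> i, so score propagation can flow in both directions.
  def toDirectedEdges(pairs: Seq[(Long, Long, Double)]): Seq[SimEdge] =
    pairs.flatMap { case (i, j, sim) =>
      Seq(SimEdge(i, j, sim), SimEdge(j, i, sim))
    }

  // Number of outgoing edges per vertex. With only one edge per pair,
  // some vertices would have fewer (or no) outgoing edges.
  def outDegrees(edges: Seq[SimEdge]): Map[Long, Int] =
    edges.groupBy(_.srcId).map { case (id, es) => id -> es.size }
}
```

For the pairs `(1, 2)` and `(2, 3)`, emitting both directions gives vertex 3 an outgoing edge back to vertex 2. With one edge per pair, vertex 3 would have none.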
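The PR does not say exactly how an empty LSH bucket causes a failure. One common way this happens in Scala is worth illustrating: partial aggregates such as `reduce` or `max` throw `UnsupportedOperationException` on an empty collection, so any per-bucket aggregate of that kind breaks when a bucket is empty. The hypothetical sketch below assumes that buckets are a `Map` from bucket id to the sentence ids hashed into it. `BucketGuard` and its methods are invented names for illustration.

```scala
object BucketGuard {
  // Assumed shape: LSH bucket id -> ids of sentences hashed into that bucket.
  type Buckets = Map[Int, Seq[Long]]

  // Unsafe: reduce throws UnsupportedOperationException if any bucket is empty.
  def unsafeMinMember(buckets: Buckets): Map[Int, Long] =
    buckets.map { case (b, members) => b -> members.reduce(_ min _) }

  // Safe: skip empty buckets before applying a partial aggregate.
  def minMember(buckets: Buckets): Map[Int, Long] =
    buckets.collect { case (b, members) if members.nonEmpty => b -> members.min }

  // Candidate similarity pairs (i < j) from every non-empty bucket,
  // deduplicated across buckets that share sentences.
  def candidatePairs(buckets: Buckets): Set[(Long, Long)] =
    buckets.values
      .filter(_.nonEmpty)
      .flatMap { members =>
        for (i <- members; j <- members if i < j) yield (i, j)
      }
      .toSet
}
```

Filtering empty buckets up front keeps every later stage from having to handle the empty case. An alternative is `reduceOption`, which returns `None` instead of throwing.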