This separates the similarity computation from the LexRank model, which makes both easier to test, and adds some simple sanity-check tests. Along the way, I discovered two issues:
GraphX edges are directed, so each pair of vertices needs two edges, one in each direction, to represent an undirected similarity link.
It's possible (but unlikely) for an LSH bucket to be empty, which could cause a failure.
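
To make the two issues concrete, here is a minimal, language-neutral sketch in Python. It is not the Scala/GraphX code from this change: `symmetric_edges` and `candidate_pairs` are hypothetical helper names, the `(source, destination, weight)` tuples stand in for directed edges, and a plain dict stands in for the LSH buckets. Where exactly the empty bucket caused a failure is not stated above, so the guard below only shows the general idea of skipping such buckets.

```python
from itertools import combinations


def symmetric_edges(pairs):
    """Expand each undirected (u, v, weight) pair into two directed edges."""
    edges = []
    for u, v, weight in pairs:
        edges.append((u, v, weight))  # forward direction
        edges.append((v, u, weight))  # reverse direction, same weight
    return edges


def candidate_pairs(buckets):
    """Yield candidate vertex pairs per bucket, skipping empty buckets."""
    for members in buckets.values():
        if not members:
            # Assumption: an empty bucket contributes nothing, so skip it
            # explicitly rather than pass it to downstream steps.
            continue
        yield from combinations(sorted(members), 2)


if __name__ == "__main__":
    print(symmetric_edges([(1, 2, 0.5)]))
    print(list(candidate_pairs({"a": [1, 2, 3], "b": []})))
```

With both directions present, any computation that only follows outgoing edges still sees every similarity link from either endpoint.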