LSH-based bucketing is best viewed as creating a sparser (not smaller) matrix.
Based on that understanding, this change uses the total number of sentences as
the column size, and the largest feature index output by the TF-IDF
transformers as the row size, of each matrix created from bucketed sentences.
Using these fixed dimensions avoids having to compute the size of each matrix,
and avoids errors when column indices exceed the number of columns.
(The only apparent downside is some extra memory used when computing the
column magnitudes.)
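A minimal sketch of the dimension choice described above (the function name,
tuple layout, and helper are illustrative assumptions, not the actual code in
this change): every bucket's matrix is built with the same global shape
(features x sentences), so a bucket only changes which entries are non-zero.

    # Hypothetical sketch: build one bucket's matrix with global dimensions.
    from scipy.sparse import csc_matrix

    def bucket_matrix(bucket_entries, n_features, n_sentences):
        """bucket_entries: list of (feature_index, sentence_index, weight)
        tuples for the sentences hashed into one LSH bucket.
        n_features: one more than the largest feature index produced by the
        TF-IDF transformers. n_sentences: total sentences across all buckets."""
        if bucket_entries:
            rows, cols, vals = zip(*bucket_entries)
        else:
            rows, cols, vals = (), (), ()
        # The fixed global shape keeps every sentence (column) index valid,
        # at the cost of carrying empty columns for sentences outside the
        # bucket, which is the extra memory noted above.
        return csc_matrix((vals, (rows, cols)), shape=(n_features, n_sentences))

Column magnitudes can then be computed on the shared shape, e.g.
m.multiply(m).sum(axis=0), without any per-bucket size bookkeeping.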