Closed karlhigley closed 9 years ago
This removes tokens that occur only once (which are irrelevant to computing cosine similarity) and strips out non-alphabetic characters (which can lead to double-counting essentially the same token).
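A rough sketch of the two cleanup steps described here, in Python. This is not the code from the PR: the names `strip_non_alpha` and `preprocess` are hypothetical, and it assumes that "occur once" means once across the whole corpus, that characters are stripped before counting, and that "alphabetic" means ASCII letters.

```python
import re
from collections import Counter

# Assumption: "non-alphabetic" means anything outside ASCII letters.
NON_ALPHA = re.compile(r"[^A-Za-z]")


def strip_non_alpha(token):
    """Remove every non-alphabetic character from a token."""
    return NON_ALPHA.sub("", token)


def preprocess(docs):
    """Clean a corpus given as a list of token lists (hypothetical helper).

    1. Strip non-alphabetic characters, so "lsh" and "lsh!" become one token.
    2. Drop tokens that occur only once across the whole corpus.
    """
    # Strip first, discarding tokens that end up empty (e.g. "42").
    stripped = [
        [t for t in (strip_non_alpha(tok) for tok in doc) if t]
        for doc in docs
    ]
    # Count occurrences over the whole corpus (an assumption about scope).
    counts = Counter(t for doc in stripped for t in doc)
    # Keep only tokens seen more than once.
    return [[t for t in doc if counts[t] > 1] for doc in stripped]


docs = [
    ["spark", "spark,", "lsh"],
    ["spark", "lsh!", "unique42"],
]
cleaned = preprocess(docs)
```

Stripping before counting matters in this sketch: variants such as `"spark,"` and `"spark"` are merged first, so a token is only discarded if it is still unique after normalization.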