This PR adds support for computing the cosine similarity using n-grams of the strings. The old behavior is equivalent to using 1-grams. Example from the provided specs:
    it 'returns correct 2-gram similarity' do
      # here _ stands in for the pad symbol
      # abc has bigrams: _a, ab, bc, c_
      # abcacbc has bigrams: _a, ab, bc, ca, ac, cb, bc, c_
      expect(klass.cosine('abc', 'abcacbc', ngram: 2)).to be_within(0.001).of(0.79)
    end
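For reviewers who want to check the expected value by hand, here is a rough standalone sketch of the computation. It is not the code in this PR: `ngrams`, `ngram_cosine`, and `PAD` are hypothetical names, and it assumes strings are padded with n-1 pad symbols on each side and that 1-grams are plain characters with no padding. It needs Ruby 2.7+ for `tally`.

```ruby
# Illustrative sketch only; names and padding rules are assumptions,
# not the implementation from this PR.
PAD = '_'

# Split a string into character n-grams, padding both ends with n-1 pad symbols.
def ngrams(str, n)
  return str.chars if n == 1

  padded = (PAD * (n - 1)) + str + (PAD * (n - 1))
  padded.chars.each_cons(n).map(&:join)
end

# Cosine similarity between the n-gram count vectors of two strings.
def ngram_cosine(a, b, ngram: 1)
  va = ngrams(a, ngram).tally
  vb = ngrams(b, ngram).tally

  # Dot product over the n-grams of a; n-grams missing from b contribute 0.
  dot = va.sum { |gram, count| count * vb.fetch(gram, 0) }
  norm_a = Math.sqrt(va.values.sum { |c| c * c })
  norm_b = Math.sqrt(vb.values.sum { |c| c * c })
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

puts ngram_cosine('abc', 'abcacbc', ngram: 2).round(3)
```

With the bigrams listed in the spec, the shared bigrams give a dot product of 1 + 1 + 2 + 1 = 5 (`bc` occurs twice in the second string), and the norms are 2 and sqrt(10), so the result is 5 / (2 * sqrt(10)), roughly 0.79.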
The documentation for the new option could probably be expanded.
Anyway, would be happy to receive any feedback :)