Closed dodgy99 closed 7 years ago
I am using this library in an Apache Spark application (written in Scala).
I have been seeing inconsistent results from the NGram algorithm: for exact matches, the result is sometimes "0.0" and sometimes "1.0". Below are some examples.
`QGram dig = new QGram(2);
dig.distance("S","S") //result = 1.0
dig.distance("Kirk","Kirk") //result = 0.0
dig.distance("07426796542","07426796542") //result = 0.0`
Should all these examples not result in a score of 1.0 as they are exactly the same?
Hi,
Thank you!
This happens because the string "S" is too short (fewer than 2 characters). I will correct this and publish a new release...
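To illustrate why a one-character string is a special case when the gram size is 2, here is a minimal sketch. It is not the library's code: the class `QGramSketch` and its `profile` and `distance` methods are hypothetical names made up for this example, and the distance used (sum of absolute differences between gram counts) is just one simple choice. It only shows that a string shorter than the gram size produces no grams at all, so that case needs explicit handling.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the library's implementation.
final class QGramSketch {

    // Count every substring of length k in s.
    // The map is empty when s is shorter than k.
    static Map<String, Integer> profile(String s, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= s.length(); i++) {
            counts.merge(s.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    // Assumed distance: sum of absolute differences between gram counts.
    static int distance(String a, String b, int k) {
        // Identical strings are at distance 0, whatever their length.
        if (a.equals(b)) {
            return 0;
        }
        Map<String, Integer> pa = profile(a, k);
        Map<String, Integer> pb = profile(b, k);
        Set<String> grams = new HashSet<>(pa.keySet());
        grams.addAll(pb.keySet());
        int d = 0;
        for (String g : grams) {
            d += Math.abs(pa.getOrDefault(g, 0) - pb.getOrDefault(g, 0));
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(profile("S", 2).isEmpty());      // true: no 2-grams in "S"
        System.out.println(profile("Kirk", 2).size());      // 3: "Ki", "ir", "rk"
        System.out.println(distance("S", "S", 2));          // 0
        System.out.println(distance("Kirk", "Kirk", 2));    // 0
    }
}
```

With a guard like the `equals` check above, identical inputs give the same result regardless of length. Note that with a distance, the value for identical strings is 0, not 1.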
Fixed in release 0.21