There are two main issues with the scoring functions as they exist in the original project:
Several functions are inaccessible to calling clients. Since there is no way to set a custom scoring function, there is little reason to define them.
The default scoring functions are broken. They claim to be "weighted" but only apply weights to matched elements; any unmatched element is assumed to have weight 1. This can produce inaccurate results in a variety of cases. The most obvious is matching on two elements:
e1: NAME, weight: .01
e2: ADDRESS, weight: .1
Assume two documents:
d1 with ADDRESS = "123 Main st"
d2 with ADDRESS = "123 Main st", NAME = "John Doe"
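The old getWeightedAverageScore would give (.1 + .5) / (.1 + 2 - 1) ≈ .5455, because the unmatched NAME adds .5 to the numerator and an implicit weight of 1 to the denominator.
The new getWeightedAverageScore would give (.1 + .005) / (.11) ≈ .9545, because the unmatched NAME is scaled by its actual weight of .01.
In the old version, the minimally weighted NAME disproportionately pulls down the score.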
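The arithmetic above can be reproduced with a small sketch. This is not the project's code: the function names, the dictionary layout, and the UNMATCHED_SCORE constant are hypothetical, and the .5 score for an unmatched element is an assumption inferred from the .5 and .005 terms in the example.

```python
from typing import Dict

# Assumed score for an element present in only one document,
# inferred from the .5 and .005 terms in the worked example.
UNMATCHED_SCORE = 0.5


def old_weighted_average(weights: Dict[str, float], matched: Dict[str, float]) -> float:
    """Old behaviour: every unmatched element counts with an implicit weight of 1."""
    unmatched_count = sum(1 for name in weights if name not in matched)
    numerator = sum(weights[n] * score for n, score in matched.items())
    numerator += UNMATCHED_SCORE * unmatched_count
    denominator = sum(weights[n] for n in matched) + unmatched_count
    return numerator / denominator


def new_weighted_average(weights: Dict[str, float], matched: Dict[str, float]) -> float:
    """New behaviour: unmatched elements are scaled by their own weight."""
    numerator = sum(w * matched.get(n, UNMATCHED_SCORE) for n, w in weights.items())
    denominator = sum(weights.values())
    return numerator / denominator


# Element weights and match scores from the example: only ADDRESS matches.
weights = {"NAME": 0.01, "ADDRESS": 0.1}
matched = {"ADDRESS": 1.0}

old_score = old_weighted_average(weights, matched)
new_score = new_weighted_average(weights, matched)
print(f"old: {old_score:.4f}")  # old: 0.5455
print(f"new: {new_score:.4f}")  # new: 0.9545
```

With these inputs the only difference between the two functions is how the unmatched NAME is weighted, which accounts for the whole gap between the two scores.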