Consider the following dataset:
1,john,doe
2,john,
3,john,watson
For matching purposes, I am assuming that both attributes are of equal importance, so I have set high=0.999 and low=0.001 on each of them, with Exact Comparator matching.
Normally the expectation is that
1: 1-match-1: produce match score of ~1
2: 1-match-2: produce a match score somewhere between 0.5 and 1, but much lower than #1
3: 1-match-3: produce a match score ~0.5 (as we are matching on 1 attribute).
I get the following scores:
1: 1-match-1: Overall: 0.999998997998
2: 1-match-2: Overall: 0.999
3: 1-match-3: Overall: 0.4999999999999998
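For reference, these three numbers are consistent with a naive Bayes combination of the per-attribute probabilities, starting from 0.5 and skipping any attribute that is empty on either side. The sketch below is my own reconstruction under that assumption, not Duke's actual code; `combine_bayes` and `score` are names I made up for illustration.

```python
HIGH, LOW = 0.999, 0.001  # same high/low for both attributes, as configured above


def combine_bayes(p1, p2):
    """Naive Bayes combination of two independent match probabilities."""
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


def score(record_a, record_b):
    """Score two records attribute by attribute with exact comparison.

    Assumption: an attribute that is empty on either side is skipped,
    so it neither raises nor lowers the score.
    """
    prob = 0.5  # neutral starting point
    for a, b in zip(record_a, record_b):
        if not a or not b:
            continue  # missing value: ignored
        prob = combine_bayes(prob, HIGH if a == b else LOW)
    return prob


r1 = ("john", "doe")
r2 = ("john", "")
r3 = ("john", "watson")

print(score(r1, r1))  # two exact matches combined
print(score(r1, r2))  # one exact match, second attribute skipped
print(score(r1, r3))  # one match and one mismatch cancel out
```

Under this model the 1-match-2 score is simply the single attribute's high value (0.999), which would explain why it lands so close to the 1-match-1 score.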
Notice how close the scores are for #1 and #2. I understand that Duke ignores missing values. However, if I wanted missing values to be taken into account, what would be the best course of action?
I would like to achieve something like the following:
1: 1-match-1: Overall: 0.999998997998
2: 1-match-2: Overall: 0.75
3: 1-match-3: Overall: 0.4999999999999998
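To make the target concrete: if the per-attribute probabilities are combined with naive Bayes (an assumption on my part that fits the scores above), then getting 0.75 for 1-match-2 would mean scoring the missing attribute with some fixed probability x instead of skipping it. The sketch below solves for that x. The idea of a "missing value probability" is hypothetical here, not a setting I know Duke to have, and all function names are made up for illustration.

```python
def combine_bayes(p1, p2):
    """Naive Bayes combination of two independent match probabilities."""
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


def odds(p):
    return p / (1.0 - p)


def from_odds(o):
    return o / (1.0 + o)


def required_missing_probability(target, matched):
    """Probability x such that combine_bayes(matched, x) == target.

    Bayes combination multiplies odds, so odds(x) = odds(target) / odds(matched).
    """
    return from_odds(odds(target) / odds(matched))


# Hypothetical: probability to assign to the missing surname so that
# one exact match (0.999) plus one missing value yields 0.75 overall.
x = required_missing_probability(0.75, 0.999)
print(x)                        # roughly 0.003
print(combine_bayes(0.999, x))  # recovers the 0.75 target
```

With high=0.999 the required x comes out at about 0.003, which is almost as low as the low=0.001 used for an outright mismatch. So under this combination rule, a 0.75 target effectively treats a missing value as nearly a mismatch.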