Closed (timbp closed 5 years ago)
Hi,
Thanks a lot for raising this issue. Rest assured that we will take a close look. The counts for patterns that include a missing value should not miss pairs.
Ted
Hi,
Thanks again for raising this issue!
There was a problem with how missing values were handled in `gammaCKpar()`. The issue has been resolved, and if you install using devtools, your R code should produce the desired output.
If anything else comes up, please do not hesitate to reach out.
Ted
all looks good now
It seems that if a variable has missing values, not all patterns are counted. Is this intended?
No missing values in these two variables. Counts sum to 175000 (== 500 * 350), and pattern (2, 2) has count of 50.
Add middlename, which has missing values:
`> g3 = gammaCKpar(dfA$middlename, dfB$middlename)`
Counts now sum to 169410, so it appears 5590 pairs (175000 - 169410) have not been counted. Pattern (2, 2, 2) has a count of 43, but there are no other patterns starting (2, 2, ...), so 7 of the 50 pairs that match on both firstname and lastname do not seem to appear in this table.
When I wrote my own code (in Julia) to count patterns, I got the following result (three agreement codes per pattern, then the count):

    0 0 0        115305
    1 0 0           193
    2 0 0          1477
    0 1 0            39
    0 2 0            79
    1 2 0             1
    0 0 1            24
    0 0 2           816
    1 0 2             2
    2 0 2            10
    0 2 2             1
    2 2 2            43
    0 0 missing   56193
    1 0 missing      76
    2 0 missing     683
    0 1 missing      11
    0 2 missing      40
    2 2 missing       7
Differences from the fastLink results are all in the patterns containing missing values.
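For anyone who wants to cross-check pattern counts without going through fastLink, here is a minimal base R sketch of the idea: tabulate agreement patterns while keeping missing comparisons as their own level, then confirm that the counts sum to the number of pairs. The `gamma` data frame is made-up toy data (not `dfA`/`dfB` from this issue), and this is not how fastLink counts patterns internally; it only illustrates the sanity check described above.

```r
# Toy agreement codes for 6 hypothetical record pairs (assumed data):
# 0 = disagree, 1 = partial agreement, 2 = agree, NA = missing comparison.
gamma <- data.frame(
  firstname  = c(2, 2, 0, 2, 0, 0),
  lastname   = c(2, 2, 0, 2, 0, 0),
  middlename = c(2, NA, 0, NA, NA, 0)
)

# useNA = "ifany" keeps NA as a separate level, so pairs with a missing
# comparison are counted instead of being silently dropped.
counts <- as.data.frame(table(gamma, useNA = "ifany"),
                        stringsAsFactors = FALSE)

# Keep only the patterns that actually occur.
counts <- counts[counts$Freq > 0, ]

# Sanity check: every pair must fall into exactly one pattern.
stopifnot(sum(counts$Freq) == nrow(gamma))

print(counts, row.names = FALSE)
```

With real data the same check would be that the counts sum to `nrow(dfA) * nrow(dfB)` (175000 here); a shortfall points to patterns with missing values being dropped.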