kosukeimai / fastLink

R package fastLink: Fast Probabilistic Record Linkage

NA handling Issue in expectation maximization function #79

Closed jw2249a closed 6 months ago

jw2249a commented 6 months ago

Sorry to spam the issues, but I found another problem while testing my package. The code below returns the agreement pattern " 0 NA 0 NA " as a match in results$EM.

Code to replicate:

library(fastLink)

# Variables to link on, and how each one is compared
varnames <- c("firstname", "middlename", "lastname", "housenum")
stringdist.match <- c("firstname", "middlename", "lastname")
numeric.match <- "housenum"

# Thresholds for full (cut.a) and partial (cut.p) agreement,
# for string and numeric comparisons respectively
cut.a <- 0.92
cut.p <- 0.88
cut.a.num <- 1
cut.p.num <- 2

# Sample data shipped with fastLink (loads dfA and dfB)
data(samplematch)
dfA$housenum <- as.numeric(dfA$housenum)
dfB$housenum <- as.numeric(dfB$housenum)

results <- fastLink(dfA, dfB, varnames,
                    stringdist.match = stringdist.match,
                    stringdist.method = "jw",
                    partial.match = varnames,
                    numeric.match = numeric.match,
                    cut.a = cut.a, cut.p = cut.p,
                    cut.a.num = cut.a.num,
                    cut.p.num = cut.p.num,
                    dedupe.matches = FALSE,
                    jw.weight = 0.1, return.all = FALSE)

results$EM$patterns.w

jw2249a commented 6 months ago

Realized this is actually valid.