kosukeimai / fastLink

R package fastLink: Fast Probabilistic Record Linkage

Uninformative error in fastLink during imputation #50

Closed: emma-klein closed this issue 3 years ago

emma-klein commented 3 years ago

Hi,

While running fastLink, I have repeatedly gotten the same error with different parameter settings, and I don't understand how to debug it.


My code is:

    fs.out <- fastLink(
      dfA = dfA, dfB = dfB,
      varnames = c('FirstName', 'LastName', 'DOB_str', 'street_num', 'street_name', 'Gender'),
      stringdist.match = c('FirstName', 'LastName', 'DOB_str', 'street_num', 'street_name'),
      partial.match = c('FirstName', 'LastName', 'street_name'),
      gender.field = 'Gender',
      cut.a = 0.95, cut.p = 0.85,
      em.obj = fs.out.10$EM
    )

The EM object comes from running fastLink on a 10% sample of the data. That run finished without errors, which adds to my confusion. As suggested in a related closed issue, I've tried adjusting the cut.a and cut.p parameters several times, but it hasn't helped.

This is the output I get, ending in the error:

    If you set return.all to FALSE, you will not be able to calculate a confusion table as a summary statistic.
    Calculating matches for each variable.
    Getting counts for parameter estimation.
    Parallelizing calculation using OpenMP. 7 threads out of 8 are used.
    Imputing matching probabilities using provided EM object.
    Error in p.gamma.k.m[[i]] : subscript out of bounds

Any assistance would be appreciated. Thank you!

emma-klein commented 3 years ago

I was using blocking in this run, but the EM object had been estimated on unblocked data. Figured it out, sorry!