seunglee98 / fedmatch


Dealing with missing values in matching #4

Closed tedmoorman closed 3 years ago

tedmoorman commented 3 years ago

It would be helpful to have a convenient way to deal with missing values in the match columns. Otherwise, I get an error message telling me there are too many matches. If the solution involves filter and filter.args, could you please provide some examples? The documentation and vignettes aren't providing much guidance here.

c0webster commented 3 years ago

Hi Ted, thanks for the comment. Right now, there isn't anything built in to handle this. filter and filter.args are for filtering after a match, and you're right that this part isn't well documented. I'll work on some code to optionally filter out NAs beforehand. I actually had no idea that merge.data.table returned NA-NA matches.
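For readers hitting the same problem, here is a minimal sketch of the NA-NA behavior mentioned above and of a manual workaround that drops missing keys before merging. It uses only data.table; the tables and the column names (id_x, name_x, id_y, name_y) are made up for illustration and are not taken from fedmatch.

```r
library(data.table)

# Toy inputs (hypothetical): two rows of x and one row of y have a missing key.
x <- data.table(id_x = 1:4, name_x = c("acme", NA, "globex", NA))
y <- data.table(id_y = 1:2, name_y = c("acme", NA))

# A plain merge treats NA as a matchable key value, so every NA row in x
# pairs with every NA row in y.
with_na <- merge(x, y, by.x = "name_x", by.y = "name_y")
na_pairs <- with_na[is.na(name_x)]

# Workaround: drop rows with a missing key from both sides before merging.
x_ok <- x[!is.na(name_x)]
y_ok <- y[!is.na(name_y)]
without_na <- merge(x_ok, y_ok, by.x = "name_x", by.y = "name_y")

print(with_na)
print(without_na)
```

With many missing keys on both sides, the number of NA-NA pairs grows as the product of the two NA counts, which is one plausible route to a "too many matches" error.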

c0webster commented 3 years ago

This issue has been fixed. The new behavior is to remove NAs in the by.x and by.y columns during exact and fuzzy matches, while telling the user how many observations are removed. These observations are re-inserted in later tiers if necessary. I'll keep this in the development version for now, and push it to CRAN after testing it at work for a month or so.
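The behavior described above can be approximated by hand. The sketch below is not fedmatch's implementation: split_na_keys is a hypothetical helper, and the tables and column names are invented. It sets aside rows with a missing key, reports how many were set aside, and lets those rows take part again in a second tier that matches on a different column.

```r
library(data.table)

# Hypothetical helper (not part of fedmatch): separate rows whose key is NA
# and report how many were set aside.
split_na_keys <- function(dt, key) {
  is_missing <- is.na(dt[[key]])
  message(sum(is_missing), " observation(s) with NA in '", key, "' set aside")
  list(kept = dt[!is_missing], held_back = dt[is_missing])
}

x <- data.table(id_x = 1:3,
                name_x = c("acme", NA, "globex"),
                city_x = c("NYC", "Boston", "LA"))
y <- data.table(id_y = 1:2,
                name_y = c("acme", NA),
                city_y = c("NYC", "Boston"))

# Tier 1: match on name, using only rows with a non-missing key.
x_split <- split_na_keys(x, "name_x")
y_split <- split_na_keys(y, "name_y")
tier1 <- merge(x_split$kept, y_split$kept, by.x = "name_x", by.y = "name_y")

# Tier 2: every row not matched in tier 1, including the rows set aside
# above, is eligible again and is matched on a different column.
x_left <- x[!id_x %in% tier1$id_x]
y_left <- y[!id_y %in% tier1$id_y]
tier2 <- merge(x_left, y_left, by.x = "city_x", by.y = "city_y")

print(tier1)
print(tier2)
```

Filtering from the original tables in tier 2, rather than from the non-NA subsets, is what brings the set-aside rows back into play.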