markvanderloo / simputation

Making imputation easy
GNU General Public License v3.0

Error: Subscript `ina` is a matrix #32

Open mbac opened 3 years ago

mbac commented 3 years ago

Hi,

I’m getting the following error, and I wonder if you could help me understand its cause:

library(dplyr)
library(simputation)

kdata <- tribble(
    ~age, ~ct, ~pfratio, ~bmi,
    56,   86,   130,   30,
    58,   NA,   110,   NA,
    78,   NA,   NA,    28,
    54,   NA,   NA,    NA,
    45,   45,   230,   28,
    54,   45,   NA,    29
)

impute_knn(
    kdata,
    bmi ~ .,
    pool = "univariate"
)
#> Warning: Requested k = 5 while 4 donors present. Using k = 4.
#> Error: Subscript `ina` is a matrix, the data `donors[ina]` must have size 1.

Created on 2021-05-30 by the reprex package (v2.0.0)

The same happens if I add more variables to the formula’s left-hand side (e.g., bmi + ct + pfratio ~ .).

I understand the warning that appears in this reprex. My actual data has hundreds of observations, but it also has its fair share of NAs, occasionally up to 3 per row. Is the error related to NAs in the predictor variables?

Thanks!

ltd-pa commented 3 years ago

This issue seems to be related to how impute_knn() handles tibbles. The call runs if you convert the tibble to a plain data.frame first:

library(dplyr)
library(simputation)

kdata <- tibble::tribble(
    ~age, ~ct, ~pfratio, ~bmi,
    56,   86,   130,   30,
    58,   NA,   110,   NA,
    78,   NA,   NA,    28,
    54,   NA,   NA,    NA,
    45,   45,   230,   28,
    54,   45,   NA,    29
) %>% as.data.frame()

simputation::impute_knn(
    kdata,
    bmi ~ .,
    pool = "univariate"
)
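
If the rest of a pipeline expects a tibble, the same workaround can be wrapped so the result is converted back afterwards. This is only a minimal sketch: impute_knn_df() is a hypothetical helper name, not part of simputation, and it assumes the error is specific to tibble input, as the reprex above suggests.

```r
library(simputation)
library(tibble)

# Hypothetical helper (not part of simputation): run impute_knn() on a plain
# data.frame and return a tibble again if a tibble was passed in.
impute_knn_df <- function(data, formula, ...) {
  was_tibble <- inherits(data, "tbl_df")
  out <- impute_knn(as.data.frame(data), formula, ...)
  if (was_tibble) as_tibble(out) else out
}

# Same toy data as in the reprex above.
kdata <- tribble(
    ~age, ~ct, ~pfratio, ~bmi,
    56,   86,   130,   30,
    58,   NA,   110,   NA,
    78,   NA,   NA,    28,
    54,   NA,   NA,    NA,
    45,   45,   230,   28,
    54,   45,   NA,    29
)

# The k = 5 warning from the reprex is still expected with this tiny data set.
imputed <- impute_knn_df(kdata, bmi ~ ., pool = "univariate")
```

The conversion only changes the class of the container, so column names and types should be unchanged on the way back.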