Closed Mahmoudhallal closed 6 months ago
Indeed, in the implementation, the data are split into two subsets based on randna. The min value used in the mnar subset is the minimum value in the naset[!fData(naset)$randna, ] subset, and not the min from the whole data.
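To make the difference concrete, here is a minimal base R sketch of the split described above. The matrix `x`, its values, and the flag vector `randna` are toy data made up for illustration (the two minima are chosen to echo the numbers in the question); this is not the MSnbase code itself, only the subsetting logic as I understand it.

```r
# Toy expression matrix: rows are features, columns are samples.
# All values are invented for illustration.
x <- matrix(c(0.014, 0.20, 0.30,
              0.029, NA,   0.50,
              0.40,  0.60, NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("f1", "f2", "f3"), c("s1", "s2", "s3")))

# Hypothetical flags: TRUE = missing at random (MAR), FALSE = MNAR.
randna <- c(TRUE, FALSE, TRUE)

# Minimum over the whole data set.
global_min <- min(x, na.rm = TRUE)

# Minimum over the MNAR subset only, i.e. the rows where randna is FALSE.
mnar_min <- min(x[!randna, , drop = FALSE], na.rm = TRUE)

# "min" imputation applied to the MNAR subset uses the subset minimum.
x_mnar <- x[!randna, , drop = FALSE]
x_mnar[is.na(x_mnar)] <- mnar_min

global_min
mnar_min
```

In this toy case the smallest value overall sits in a row flagged TRUE, so the MNAR subset never sees it and fills its missing values with its own, larger, minimum.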
The randna argument is expected to have length equal to nrow(naset) and is used to split the data as described above, and I can indeed reproduce your example. There was no specific reason to set the features without missing values to TRUE, and I did not anticipate this specific issue with MinProb (it's not clear whether there's a reason for this or whether it's a bug in imputeLCMD::impute.MinProb()). Setting randna to FALSE for the features without missing values doesn't trigger the buggy line in imputeLCMD::impute.MinProb().
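A minimal sketch of that workaround, in base R on toy data: features without any missing value get their flag forced to FALSE, and all other flags are kept. The matrix `x` and the names `has_na` and `randna_fixed` are hypothetical, introduced only for this illustration.

```r
# Toy expression matrix (invented values): f1 is complete, f2 and f3 have NAs.
x <- matrix(c(1.0, 2.0, 3.0,
              NA,  2.5, 3.5,
              1.5, NA,  NA),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("f1", "f2", "f3"), c("s1", "s2", "s3")))

# Hypothetical flags mimicking naset: the complete feature f1 is TRUE.
randna <- c(TRUE, TRUE, FALSE)

# Which features actually contain at least one missing value?
has_na <- unname(rowSums(is.na(x)) > 0)

# Keep TRUE only where the feature has missing values; complete features
# become FALSE and therefore end up in the mnar subset.
randna_fixed <- randna & has_na

randna_fixed
```

On the real data the same idea would presumably read `fData(naset)$randna & rowSums(is.na(exprs(naset))) > 0`, with the result passed as the randna argument of impute(); I have only sketched that here, not run it against naset.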
As your example indicates, the randna value for these proteins without missing values is relevant, and one could argue whether they should be considered in either, both or none of the mar and mnar subsets. I am open to comments.
I followed the mixed imputation example on naset and on my own dataset, and I have two questions (or possible bugs):
1) In the following example:
x <- impute(naset, method = "mixed", randna = fData(naset)$randna, mar = "knn", mnar = "min")
the MNAR values are replaced by 0.029, which is not the dataset minimum (0.014). The value 0.029 is the minimum of the subset of rows flagged as MNAR only. This was not clear to me; could you please elaborate?
2) In the naset fData, randna is a logical vector that flags, for each row, whether its missing values are MAR (TRUE) or MNAR (FALSE). Rows that need no imputation (no missing values) are also set to TRUE. Why are the rows with no missing values included as TRUE? If you try imputing the MNAR values with MinProb, it fails with "[1] NA There were 16 warnings (use warnings() to see them)" and NaNs are introduced. Setting the rows without missing values to FALSE solves the problem. What is the logic for including these rows as either MNAR or MAR?
Thank you.