ibecav / CGPfunctions

Powell Miscellaneous Functions for Teaching and Learning Statistics

Complete cases in chaid_table #43

Open FedericoTrifoglio opened 3 years ago

FedericoTrifoglio commented 3 years ago

Hi, chaid_table.R has been really helpful. However, when node_table is created, filtering to complete cases only can leave the outcome levels with 0 frequencies (this happens if each case has at least one missing value across the independent variables). Those 0 frequencies later cause chisq.test to raise an error, because at least one entry of x must be positive. I removed both occurrences of filter(complete.cases(.)) %>% and it works fine for me. I also noticed that you used tidyr::pivot_wider in row 76 but not in row 87.
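The chisq.test failure described above can be reproduced in isolation. This is a minimal sketch, assuming an all-zero contingency table like the one complete-case filtering could leave behind; the matrix is made up for illustration and is not taken from chaid_table.R.

```r
# Illustrative all-zero table of counts (an assumption, not data from the package):
# 2 groups by 3 outcome levels, with no observations left after filtering.
empty_counts <- matrix(0, nrow = 2, ncol = 3)

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  {
    chisq.test(empty_counts, correct = FALSE)
    NA_character_  # only reached if no error is raised
  },
  error = function(e) conditionMessage(e)
)

print(msg)
```

The captured message should match the error reported at the end of this thread.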

ibecav commented 3 years ago

I'm not sure I fully understand. The underlying CHAID::chaid function only operates on the complete case dataset. If you could share a reproducible example (see, for example, https://stackoverflow.com/questions/48874304/creating-reproducible-example-using-reprex-package-in-r-where-a-local-file-is-be), I'd be happy to look.

FedericoTrifoglio commented 3 years ago

I see. I thought I could set na.action to na.pass in CHAID::chaid to make it work without having complete cases. Would that cause the CHAID algorithm to give bad results? Should I convert the NA_real_'s to NA_character_'s, so that "NA" is a factor level? Anyway, an example is below.

```r
library(CHAID)
library(CGPfunctions)
set.seed(290875)
USvoteS <- USvote[sample(1:nrow(USvote), 1000),]
random_NA <- function(x) {
  x[sample(2:6, 1)] <- NA_real_
  x
}
USvoteSNA <- as.data.frame(t(apply(USvoteS, 1, random_NA)))
USvoteSNA <- as.data.frame(lapply(USvoteSNA, factor))
ctrl <- chaid_control(minsplit = 200, minprob = 0.1)
chaidUS <- chaid(vote3 ~ ., na.action = na.pass, data = USvoteSNA, control = ctrl)
chaid_table(chaidUS)
# Error in chisq.test(., correct = FALSE) : 
#   at least one entry of 'x' must be positive
```
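On the question of making "NA" a factor level: the mechanics could look roughly like the sketch below. It uses base R only; na_to_level and the "(Missing)" label are hypothetical names introduced here, not part of CHAID or CGPfunctions, and the sketch says nothing about whether treating missingness as its own category is statistically appropriate for CHAID.

```r
# Hypothetical helper (not part of CHAID or CGPfunctions): turn missing values
# in a factor into an explicit level so they are no longer treated as NA.
na_to_level <- function(f, label = "(Missing)") {
  f <- as.factor(f)
  if (anyNA(f)) {
    levels(f) <- c(levels(f), label)  # register the new level first
    f[is.na(f)] <- label              # then assign it to the missing entries
  }
  f
}

# Small made-up factor with two missing values.
x <- factor(c("male", NA, "female", NA))
y <- na_to_level(x)

print(levels(y))
print(table(y))
```

Applied to a whole data frame, this could be used as `as.data.frame(lapply(df, na_to_level))`, so that a complete-case filter would no longer drop those rows.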