Open eashwarsiddharth opened 7 years ago
@Rajhan here is a look at the pattern in missing values.
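# Assumes ggplot2, magrittr (for %>%) and reshape (for melt) are installed.
library(ggplot2)
library(magrittr)
library(reshape)  # reshape::melt names the matrix dimensions X1/X2; reshape2 uses Var1/Var2
# Plot a raster of missing (TRUE) vs present (FALSE) cells.
ggplot_missing <- function(x) {
  x %>% is.na %>% melt %>%
    ggplot(data = ., aes(x = X1, y = X2)) +
    geom_raster(aes(fill = value)) +
    scale_fill_grey(name = "", labels = c("Present", "Missing")) +
    labs(y = "Variables in Dataset", x = "Rows / observations")
}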
ggplot_missing(avocop)
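As a numeric companion to the plot, here is a minimal base-R sketch that counts missing values per column. The data frame `toy` and its column names are made up for illustration; they are not the real `avocop` columns.

```r
# Hypothetical toy data; column names are illustrative only.
toy <- data.frame(
  claim_count = c(12, NA, 30, 15),
  total_cost  = c(100.5, NA, NA, 80),
  drug        = c("A", "B", "A", "B")
)

# Number and share of NAs in each column.
na_count <- colSums(is.na(toy))
na_share <- colMeans(is.na(toy))

print(data.frame(na_count, na_share))
```

Running the same two calls on `avocop` should show which attributes carry the bulk of the NAs.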
2430 is a good number. Go with it.
@Rajhan Any suggestions on how to substitute NAs in the following attributes:
The NAs correspond to the suppress_flag, but the flag's description only covers claim_count, not the cost/supply-related attributes.
bene_count_ge65_suppress_flag from prescriber_summary gives a clue for claim_count:
How do I go about substituting NAs in the cost/supply-related attributes?
Despite this roadblock, I have 75% complete cases (2430 records) for the Avonex vs Copaxone scenario.
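For anyone wanting to reproduce a complete-case figure like this, a minimal sketch using base R's `complete.cases()` follows. The data frame `toy` is hypothetical and only stands in for the real table.

```r
# Hypothetical toy data; not the real avocop table.
toy <- data.frame(
  claim_count = c(12, NA, 30, 15),
  total_cost  = c(100.5, NA, 55, 80)
)

ok <- complete.cases(toy)          # TRUE where a row has no NA
n_complete <- sum(ok)              # number of complete rows
share_complete <- mean(ok)         # fraction of complete rows

# Keep only the complete rows for downstream analysis.
toy_complete <- toy[ok, , drop = FALSE]
```

`na.omit(toy)` would give the same rows; `complete.cases()` is used here because it also exposes the logical mask for counting.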