sagitechls / SSN_SACE_2017_Jan

NAs in Cost related attributes. #30

Open eashwarsiddharth opened 7 years ago

eashwarsiddharth commented 7 years ago

@Rajhan Any suggestions on how to substitute NAs in the following attributes:

The NAs correspond to the suppress_flag, but the flag's description only mentions claim_count, not the cost/supply-related attributes.

bene_count_ge65_suppress_flag from prescriber_summary gives a clue for claim_count:

bene_count_ge65 = 5 iff bene_count_ge65_suppress_flag = '*'.
Thus, for all ge65_suppress_flag = '#', total_claim_count_ge65 = total_claim_count - 5.
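
A minimal sketch of how the rule above might be applied in R. The data frame `ps` and its values are made up for illustration; only the column names come from the text above, and the rule itself is my reading of the flag, not something the data dictionary confirms.

```r
# Toy data; values are invented for illustration only.
ps <- data.frame(
  total_claim_count      = c(120, 80, 45),
  total_claim_count_ge65 = c(NA, 60, NA),
  ge65_suppress_flag     = c("#", "", "*"),
  stringsAsFactors = FALSE
)

# Rows where the ge65 claim count is missing and the flag is '#'.
fill <- is.na(ps$total_claim_count_ge65) & ps$ge65_suppress_flag == "#"

# Apply the rule as stated above: total_claim_count_ge65 = total_claim_count - 5.
ps$total_claim_count_ge65[fill] <- ps$total_claim_count[fill] - 5
```

Rows flagged '*' are left as NA here, since the rule above only covers the '#' case.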

How do I go about substituting NAs in the cost/supply-related attributes?

In spite of this roadblock, I have 75% complete cases (2430 records) for the Avonex vs Copaxone scenario.
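
For reference, a rough sketch of how the complete-case count and share can be computed with base R's `complete.cases()`. The data frame `toy` and its column names are hypothetical stand-ins, not the actual Avonex vs Copaxone subset.

```r
# Toy stand-in for the real subset; column names and values are invented.
toy <- data.frame(
  total_claim_count     = c(120, 80, 45, 30),
  total_drug_cost_ge65  = c(5000, NA, 1200, 900),
  total_day_supply_ge65 = c(300, NA, 90, 60)
)

ok <- complete.cases(toy)    # TRUE for rows with no NA in any column
n_complete     <- sum(ok)    # number of complete rows
share_complete <- mean(ok)   # fraction of complete rows
toy_complete   <- toy[ok, ]  # keep only the complete rows
```

Running the same three lines on the real data frame is how a figure like "75% (2430 records)" would be obtained.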

eashwarsiddharth commented 7 years ago

@Rajhan here is a look at the pattern in missing values.

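[Screenshot: missing-value pattern plot]

```r
library(magrittr)  # %>%
library(reshape)   # melt(); names matrix dimensions X1/X2 (reshape2 uses Var1/Var2)
library(ggplot2)

# Plot a raster of missing vs present values: rows on x, variables on y.
ggplot_missing <- function(x){
  x %>% is.na %>% melt %>%
    ggplot(data = ., aes(x = X1, y = X2)) +
    geom_raster(aes(fill = value)) +
    scale_fill_grey(name = "", labels = c("Present","Missing")) +
    labs(y = "Variables in Dataset", x = "Rows / observations")
}
ggplot_missing(avocop)
```

Rajhan commented 7 years ago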

2430 is a good number. Go with it.