sagitechls / SSN_SACE_2017_Jan


NA Treatment in puf_pres_conso #4

Closed: karthikeyann12 closed this issue 7 years ago

karthikeyann12 commented 7 years ago

Should I impute the NAs in the diagnostic/other/therapeutic columns with the mean/median? If I do, is it okay to treat the corresponding percentage columns the same way?

NA counts: diagnostic 10314, other 9149, therapeutic 16.

eashwarsiddharth commented 7 years ago

In several cases:

services_performed = (diagnostic + other + therapeutic)

If this equality does not hold for a record, that record could be an anomaly. You might also want to check the corresponding percentage columns to identify anomalies.
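A minimal sketch of that consistency check, in Python. It assumes each record is a dict with None standing in for NA; the column names services_performed, diagnostic, other and therapeutic come from the thread, but the row layout and the helper names is_consistent/flag_anomalies are hypothetical. NA service values are counted as 0 in the sum, following the reading that an NA means no service of that type was rendered.

```python
from typing import Dict, List, Optional

# Hypothetical row layout: column name -> value, with None standing in for NA.
Row = Dict[str, Optional[float]]

SERVICE_COLUMNS = ("diagnostic", "other", "therapeutic")


def is_consistent(row: Row, tol: float = 1e-9) -> bool:
    """Return True if services_performed equals the sum of the service columns.

    NA (None) service values count as 0 in the sum (assumption: NA means
    no service of that type was performed).
    """
    total = row.get("services_performed")
    if total is None:
        return False  # nothing to check against, so treat the row as suspect
    parts = sum(row.get(col) or 0 for col in SERVICE_COLUMNS)
    return abs(total - parts) <= tol


def flag_anomalies(rows: List[Row]) -> List[int]:
    """Return the indices of rows whose service counts do not add up."""
    return [i for i, row in enumerate(rows) if not is_consistent(row)]


if __name__ == "__main__":
    rows = [
        {"services_performed": 10, "diagnostic": None, "other": 4, "therapeutic": 6},
        {"services_performed": 9, "diagnostic": 3, "other": None, "therapeutic": 4},
    ]
    print(flag_anomalies(rows))  # -> [1]
```

Flagging indices rather than dropping rows keeps the decision about what to do with anomalies separate from detecting them.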

So I'm guessing the NAs in those columns actually carry meaning.

For example, an NA in the diagnostic column would mean that the doc_id performed no diagnostic services in that (city, state) combination, even though it did render therapeutic/other services there.

You could replace the NAs with 0 if the record isn't an anomaly.
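A sketch of that conditional zero-fill, in Python, under the same assumptions as a plain list-of-dicts table with None for NA (the row layout and the function name fill_na_with_zero are hypothetical; the column names come from the thread). NAs become 0 only when the known service counts already add up to services_performed; other rows are left as-is for review.

```python
from typing import Dict, List, Optional

# Hypothetical row layout: column name -> value, with None standing in for NA.
Row = Dict[str, Optional[float]]

SERVICE_COLUMNS = ("diagnostic", "other", "therapeutic")


def fill_na_with_zero(rows: List[Row]) -> List[Row]:
    """Return copies of the rows with NA service counts replaced by 0.

    A row is filled only if its non-NA service counts already sum to
    services_performed (i.e. it is not an anomaly). Rows that don't add up
    are copied unchanged so they can be reviewed separately.
    Exact equality is used because the counts are assumed to be integers.
    """
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        total = row.get("services_performed")
        known = sum(row.get(col) or 0 for col in SERVICE_COLUMNS)
        if total is not None and total == known:
            for col in SERVICE_COLUMNS:
                if new_row.get(col) is None:
                    new_row[col] = 0
        result.append(new_row)
    return result


if __name__ == "__main__":
    rows = [
        {"services_performed": 10, "diagnostic": None, "other": 4, "therapeutic": 6},
        {"services_performed": 9, "diagnostic": 3, "other": None, "therapeutic": 4},
    ]
    filled = fill_na_with_zero(rows)
    print(filled[0]["diagnostic"], filled[1]["other"])  # -> 0 None
```

The second row keeps its NA because 3 + 4 does not equal 9, so it stays visible as a possible anomaly instead of being silently "fixed".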

My 2 cents...

Rajhan commented 7 years ago

@karthikeyann12 The answer to replacing NAs in the individual service columns lies in the total services column and how it is used to calculate the therapeutic, diagnostic, and other service columns. @eashwarsiddharth is correct.

karthikeyann12 commented 7 years ago

@Rajhan There are certain anomalies in these calculations (total drug cost / total claim count); removing these anomalies would eliminate large portions of the data.