The cross-validated model has an accuracy of ~95%!
Attributes from the Prescriber files seem to be used more, and more effectively, than attributes from the PUF files.
Custom Features:
12 out of 15 features (highlighted in red) seem to contribute effectively towards building the model.
Usage is at 100% for mean_cost_per_day_per_claim and generic_usage_score_top_2, and at ~85% for generic_usage_score_top_3.
mean_cost_per_day_per_claim is derived from the Prescriber_Detailed file.
The generic_usage_score features are also derived from the Prescriber_Detailed file.
Each score is binary and is based on the 'top_drug_n' attributes.
Usage is at ~25% for Switch_Likelihood.
It reflects whether a doc_id prescribes different drug_names that fall under the same generic_name, and the proportion of all drugs that doc_id prescribes that meet this criterion.
Usage is between 10% and 15% for top_hcpcs_code_1 and top_hcpcs_code_1_received_per_submitted_charge.
Both are derived from the PUF_Detailed file, in a fashion similar to top_drug_n and top_drug_n_cost.
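For anyone trying to reproduce Switch_Likelihood, here is a rough sketch of how I read the description above. It is in Python rather than the R used for the actual work, and it is not the code from the repository. The function name switch_likelihood, the (doc_id, drug_name, generic_name) row layout, and the choice to count distinct drugs rather than claims are all assumptions made for illustration.

```python
from collections import defaultdict


def switch_likelihood(rows):
    """Return {doc_id: share of distinct drugs that have a sibling brand}.

    rows: iterable of (doc_id, drug_name, generic_name) tuples.
    This layout is an assumption, not the real Prescriber_Detailed schema.
    """
    # doc_id -> generic_name -> set of distinct drug_names prescribed
    by_doc = defaultdict(lambda: defaultdict(set))
    for doc_id, drug_name, generic_name in rows:
        by_doc[doc_id][generic_name].add(drug_name)

    scores = {}
    for doc_id, generics in by_doc.items():
        # All distinct (generic_name, drug_name) pairs for this prescriber
        total = sum(len(names) for names in generics.values())
        # Pairs whose generic_name is covered by more than one drug_name
        switched = sum(len(names) for names in generics.values() if len(names) > 1)
        scores[doc_id] = switched / total
    return scores


# Made-up example rows, only to show the shape of the input
example_rows = [
    ("D1", "LIPITOR", "ATORVASTATIN CALCIUM"),
    ("D1", "ATORVASTATIN CALCIUM", "ATORVASTATIN CALCIUM"),
    ("D1", "ZOCOR", "SIMVASTATIN"),
    ("D1", "NEXIUM", "ESOMEPRAZOLE"),
    ("D2", "ZOCOR", "SIMVASTATIN"),
]
example_scores = switch_likelihood(example_rows)
```

In this toy input, D1 prescribes two drug_names under one generic_name out of four distinct drugs, so its score is 0.5, while D2 scores 0.0. Weighting by claim counts instead of distinct drugs would be an equally plausible reading of the description.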
@Rajhan, I have shared all the relevant code and RData files in my GitHub repository. Please check it out and share your feedback.
P.S.:
I have invited you as a collaborator on my GitHub repository.
Did you derive top_drug_n and its associated cost from the Prescriber_Detailed file?