Open Aravind-Parthiban opened 7 years ago
Each record in the summary files is unique. Records in the detailed files are not, so aggregate them at the drug level and the procedure level. The consolidated table is derived from the other four files; use it as an example to build a dataset for each problem statement.
Problem statement: Utilization & Prescriber Data
Classification
For example, for prescribers of Avonex vs. Copaxone, take only the doctors who have prescribed Avonex and Copaxone, and consolidate all other information for them. Reminder: when applying any classification algorithm, do not use doc_id as a feature.
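A minimal sketch of that filtering step, in plain Python. Everything in it is an assumption made for illustration: the column names (`drug_name`, `specialty`, `total_claims`), the sample rows, and the helper `build_dataset` are hypothetical and do not come from the actual files. It also assumes that "Avonex and Copaxone" means rows for either drug, with the drug name used as the class label.

```python
TARGET_DRUGS = {"AVONEX", "COPAXONE"}

# Hypothetical merged rows; column names and values are made up.
rows = [
    {"doc_id": "D1", "drug_name": "AVONEX", "specialty": "Neurology", "total_claims": 12},
    {"doc_id": "D2", "drug_name": "COPAXONE", "specialty": "Neurology", "total_claims": 20},
    {"doc_id": "D3", "drug_name": "LIPITOR", "specialty": "Cardiology", "total_claims": 50},
    {"doc_id": "D4", "drug_name": "AVONEX", "specialty": "Internal Medicine", "total_claims": 7},
]


def build_dataset(rows, target_drugs, id_col="doc_id", label_col="drug_name"):
    """Keep rows for the target drugs, then split them into features and labels.

    The identifier column is dropped from the features so that a classifier
    cannot simply memorize individual doctors.
    """
    kept = [r for r in rows if r[label_col] in target_drugs]
    features = [
        {k: v for k, v in r.items() if k not in (id_col, label_col)}
        for r in kept
    ]
    labels = [r[label_col] for r in kept]
    return features, labels


features, labels = build_dataset(rows, TARGET_DRUGS)
```

Here `features` holds only the non-identifying columns and `labels` holds the drug name for each kept row, so either can be handed to whatever classifier is chosen later.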
When merging all the files together, each record in the aggregated files (i.e. prescriber.summary, PUF.summary) is replicated as many times as its doc_id appears in the detailed files (prescriber.detailed, PUF.detailed).
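To make the replication concrete, here is a small sketch in plain Python. It is not taken from the real files: the column names, the sample values, and the helpers `naive_merge` and `merge_aggregated` are all hypothetical. It contrasts a direct merge on doc_id with one possible alternative, which is to aggregate the detailed rows to one row per doc_id (drug-level totals) before merging.

```python
from collections import defaultdict

# Hypothetical stand-ins for a summary file and a detailed file.
summary = [
    {"doc_id": "D1", "specialty": "Neurology"},
    {"doc_id": "D2", "specialty": "Cardiology"},
]
detailed = [
    {"doc_id": "D1", "drug_name": "AVONEX", "claim_count": 12},
    {"doc_id": "D1", "drug_name": "COPAXONE", "claim_count": 3},
    {"doc_id": "D1", "drug_name": "AVONEX", "claim_count": 5},
    {"doc_id": "D2", "drug_name": "LIPITOR", "claim_count": 50},
]


def naive_merge(summary, detailed):
    """Join on doc_id: each summary row repeats once per matching detailed row."""
    by_id = {s["doc_id"]: s for s in summary}
    return [{**by_id[d["doc_id"]], **d} for d in detailed if d["doc_id"] in by_id]


def merge_aggregated(summary, detailed):
    """Sum claims per (doc_id, drug) first, then attach one wide row per doctor."""
    totals = defaultdict(lambda: defaultdict(int))
    for d in detailed:
        totals[d["doc_id"]][d["drug_name"]] += d["claim_count"]
    drugs = sorted({d["drug_name"] for d in detailed})
    merged = []
    for s in summary:
        row = dict(s)
        for drug in drugs:
            # Doctors with no claims for a drug get 0 for that column.
            row[f"claims_{drug}"] = totals.get(s["doc_id"], {}).get(drug, 0)
        merged.append(row)
    return merged


replicated = naive_merge(summary, detailed)   # D1 appears three times
one_per_doc = merge_aggregated(summary, detailed)  # one row per doc_id
```

With the aggregate-first variant the summary values stay unique per doctor, at the cost of one extra column per drug. Whether that trade-off suits each problem statement is a separate question.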
The issue here is
Which is an appropriate approach?
@Rajhan How do I go about it?