Merge files - Githubissues

When all the files are merged together, each record in the aggregated files (prescriber.summary, PUF.summary) is replicated once for every time its doc_id appears in the detailed files (prescriber.detailed, PUF.detailed).
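A minimal sketch of the replication, assuming pandas and toy data. Only doc_id comes from the actual files; the other column names (total_claims, drug_name, drug_claims) and all values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-ins for a summary file and a detailed file.
# Every column except doc_id is an assumed name, not taken from the real data.
summary = pd.DataFrame({"doc_id": [1, 2], "total_claims": [100, 50]})
detailed = pd.DataFrame({
    "doc_id": [1, 1, 1, 2],
    "drug_name": ["AVONEX", "COPAXONE", "GILENYA", "COPAXONE"],
    "drug_claims": [40, 35, 25, 50],
})

# Joining one summary row to many detailed rows repeats the summary values.
merged = summary.merge(detailed, on="doc_id", how="left")

# Number of times each doc_id now appears in the merged table.
rows_per_doc = merged.groupby("doc_id").size()

# Summing a summary column after the merge counts it once per detailed row.
true_total = summary["total_claims"].sum()
inflated_total = merged["total_claims"].sum()

print(rows_per_doc)
print(true_total, inflated_total)
```

Here doc_id 1 has three detailed rows, so its total_claims value is counted three times in the merged sum.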

The issues are:

Once all files are merged, the replicated values get aggregated during EDA or any calculation, which gives wrong results.
To avoid the replication, if I merge only the aggregated files (prescriber.summary, PUF.summary), I lose granularity on drug details and a few other variables.
Or is it advisable to proceed only with the PUF.pres_ consolidated. file (with its existing variables), which already exists?

Which is an appropriate approach?

@Rajhan How should I go about it?

Each record in the summary files is unique. Records in the detailed files are not, so do drug-level and procedure-level aggregation on them first. The consolidated table is derived from the other four files; use it as an example to build a dataset for each problem statement.
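One way this could look in pandas, as a rough sketch: aggregate the detailed rows to one row per doc_id, then join onto the summary file. The column names other than doc_id and the claims_ prefix are assumptions, and this is not necessarily how the consolidated table was built.

```python
import pandas as pd

# Hypothetical toy data; only doc_id is a real column name from the thread.
summary = pd.DataFrame({"doc_id": [1, 2], "total_claims": [100, 50]})
detailed = pd.DataFrame({
    "doc_id": [1, 1, 1, 2],
    "drug_name": ["AVONEX", "COPAXONE", "AVONEX", "COPAXONE"],
    "drug_claims": [40, 35, 25, 50],
})

# Drug-level aggregation: one row per doc_id, one column per drug.
drug_wide = detailed.pivot_table(
    index="doc_id",
    columns="drug_name",
    values="drug_claims",
    aggfunc="sum",
    fill_value=0,
).add_prefix("claims_").reset_index()
drug_wide.columns.name = None  # drop the leftover "drug_name" axis label

# validate="one_to_one" raises if either side still has duplicate doc_id values.
dataset = summary.merge(drug_wide, on="doc_id", how="left", validate="one_to_one")

print(dataset)
```

Because both sides have exactly one row per doc_id, the summary values are not repeated and the drug detail is kept as columns. The same pattern would apply to procedure-level aggregation of the other detailed file.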

Problem statement: Utilization & Prescriber Data

Classification

Prescribers of Avonex vs Copaxone
Prescribers of Gilenya vs Copaxone
Prescribers of Branded vs Generics Prediction
Prescribers of Namenda
Prescribers of Generics

For example, for the Avonex vs Copaxone problem, take only the docs who have prescribed Avonex and Copaxone, and consolidate all their other information. Reminder: when applying any classification algorithm, do not include doc_id as a feature.
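A rough sketch of building such a dataset, assuming a one-row-per-doc table already exists. Several things here are assumptions rather than anything stated in this thread: the column names, reading "Avonex and Copaxone" as docs who prescribed at least one of the two, the labeling rule (1 if a doc has more Avonex claims than Copaxone claims), and dropping the two drug columns from the features to avoid leaking the label.

```python
import pandas as pd

# Hypothetical one-row-per-doc table; all names except doc_id are assumed.
dataset = pd.DataFrame({
    "doc_id": [1, 2, 3, 4],
    "total_claims": [100, 50, 80, 20],
    "claims_AVONEX": [65, 0, 10, 0],
    "claims_COPAXONE": [35, 50, 30, 0],
})

# Keep only docs who prescribed at least one of the two drugs (assumed reading).
prescribed_either = (dataset["claims_AVONEX"] > 0) | (dataset["claims_COPAXONE"] > 0)
subset = dataset[prescribed_either].copy()

# Assumed labeling rule: 1 = mostly Avonex, 0 = otherwise.
subset["label"] = (subset["claims_AVONEX"] > subset["claims_COPAXONE"]).astype(int)

# doc_id is an identifier, not a predictor, so it is left out of the features.
# The two drug columns define the label, so they are dropped here as well.
X = subset.drop(columns=["doc_id", "label", "claims_AVONEX", "claims_COPAXONE"])
y = subset["label"]

print(X)
print(y)
```

X and y can then be passed to whichever classifier is chosen for the problem statement.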

sagitechls / SSN_SACE_2017_Jan

Merge files #10