mp2893 / medgan

Generative adversarial network for generating electronic health records.
BSD 3-Clause "New" or "Revised" License

Dimensions of input data #9

Open arielpeterson opened 6 years ago

arielpeterson commented 6 years ago

Hi Edward,

I did not use the MIMIC-III dataset; instead I used my own dataset of patients and their diagnosis codes. As in process_mimic.py, I constructed a binary matrix as my input, where each row is a patient and each column is an ICD9/ICD10 code. I have 1,064 unique ICD9/ICD10 codes, and looking at medgan.py, it looks like you set inputDim=615. Is this the number of unique codes in dataset A?
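For reference, here is a minimal sketch of the kind of patient-by-code binary matrix described above. It is not the code from process_mimic.py; the function name `build_binary_matrix` and the dictionary input format are assumptions made for illustration.

```python
import numpy as np

def build_binary_matrix(patient_codes):
    """Build a (patients x codes) 0/1 matrix.

    patient_codes is a hypothetical input format: a dict mapping a
    patient id to an iterable of diagnosis code strings.
    """
    # One column per unique code, in sorted order for reproducibility.
    codes = sorted({code for cs in patient_codes.values() for code in cs})
    code_index = {code: i for i, code in enumerate(codes)}
    patients = sorted(patient_codes)

    matrix = np.zeros((len(patients), len(codes)), dtype=np.float32)
    for row, pid in enumerate(patients):
        for code in patient_codes[pid]:
            matrix[row, code_index[code]] = 1.0
    return matrix, patients, codes

# Toy example with made-up patients and codes.
example = {
    "p1": ["250.00", "401.9"],
    "p2": ["401.9"],
    "p3": ["E11.9", "I10", "401.9"],
}
matrix, patients, codes = build_binary_matrix(example)
print(matrix.shape)  # (number of patients, number of unique codes)
```

The number of columns equals the number of unique codes that actually appear in the data, so that is the value to compare against inputDim.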

I adjusted "inputDim" to 1064 in my case, but the resulting NumPy array has dimensions (10000, 186). I expected the dimensions to be (10000, 1064). Are there other adjustments I need to make to the model?

Thank you!

mp2893 commented 6 years ago

Hi Ariel,

Yes, inputDim specifies how many medical codes (e.g. diagnosis codes, medication codes) there are in your dataset. However, my code determines inputDim automatically from the data you provide as input (the "data_file").

I'm not sure where 186 is coming from. Maybe your data is actually a matrix of shape (number of patients x 186)? Also, I just want to make sure: did you train medGAN before you tried to generate synthetic records? Another possibility is that a NumPy matrix that is too large cannot be saved, so maybe that is the reason.
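One way to narrow this down is to compare the column count of the training matrix with that of the generated matrix and with the number of codes you expect. The sketch below assumes both matrices are already loaded as 2-D arrays; the helper name `check_dimensions` is hypothetical and not part of medGAN.

```python
import numpy as np

def check_dimensions(training_matrix, generated_matrix, expected_codes=None):
    """Return a list of human-readable dimension mismatches (empty if none)."""
    train = np.asarray(training_matrix)
    gen = np.asarray(generated_matrix)
    if train.ndim != 2 or gen.ndim != 2:
        raise ValueError("both matrices must be 2-D (patients x codes)")

    problems = []
    # Does the training data have as many columns as there are unique codes?
    if expected_codes is not None and train.shape[1] != expected_codes:
        problems.append(
            "training matrix has %d columns, expected %d"
            % (train.shape[1], expected_codes)
        )
    # Does the generated data match the width of the training data?
    if gen.shape[1] != train.shape[1]:
        problems.append(
            "generated matrix has %d columns, training matrix has %d"
            % (gen.shape[1], train.shape[1])
        )
    return problems

# Toy stand-ins for the real matrices (shapes taken from this thread).
train = np.zeros((50, 186), dtype=np.float32)
generated = np.zeros((10000, 186), dtype=np.float32)
for problem in check_dimensions(train, generated, expected_codes=1064):
    print(problem)
```

If the training matrix itself turns out to have 186 columns, the issue is in how the input file was built rather than in the model settings.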