ratschlab / RGAN

Recurrent (conditional) generative adversarial networks for generating real-valued time series data.
https://arxiv.org/abs/1706.02633
MIT License
640 stars 181 forks

Reproducing the eICU experiment from section 5 of the manuscript #11

Open ghost opened 7 years ago

ghost commented 7 years ago

Hi,

I'm trying to train the RGAN to reproduce your results from section 5, table 2 of the manuscript. I've got the eICU dataset, but I'm not sure how you pre-processed/reshaped it. To be honest, I'm kind of lost.

Would it be possible to share the Linux command-line instructions and the directory structure needed to get the code training? Pretty please with cherries on top.

Fan Q

corcra commented 7 years ago

Hi there!

The processing pipeline is pretty much this:

Regarding the directory structure, for this part of the code there's nothing very specific about it - there's an eICU_dir variable which gives the location of the h5 files (pat_df = pd.read_hdf(eICU_dir + '/vitalPeriodic.h5'), etc.), and otherwise the other functions will save the intermediate files in the folder they're run from, I think.

If you hit a particular snag/error I can try to give you more specific information, otherwise I'll just end up describing the whole script unfortunately.
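
As a rough illustration of the layout described above, here is a minimal sketch that checks whether a directory contains the h5 files the code will look for. The helper name missing_eicu_files is hypothetical, and only 'vitalPeriodic.h5' is confirmed by the comment above; 'patient.h5' is an assumption based on the patient table mentioned later in this thread.

```python
from pathlib import Path

# 'vitalPeriodic.h5' is quoted in the comment above; 'patient.h5' is an
# assumption based on the patient table discussed later in the thread.
EXPECTED_FILES = ("vitalPeriodic.h5", "patient.h5")


def missing_eicu_files(eicu_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in eicu_dir."""
    root = Path(eicu_dir)
    return [name for name in expected if not (root / name).is_file()]
```

Calling missing_eicu_files(eICU_dir) before running the pipeline should return an empty list once the conversion step has produced every file.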

ghost commented 7 years ago

Hey Hello!

and Fanks for the great help - I think I get what data_utils.py does.

So it looks like you only need to convert the patient.csv and vitalPeriodic.csv to hdf5. That's actually the bit that usually trips me up, as my data conversion kung fu is out of practice.

I guess I should post here the steps in case anyone else wants to try doing the eICU experiment?

Here's what I've got so far,

import numpy as np
import pandas as pd
import h5py

patients_filename_hdf5 = '/home/ajay/PythonProjects/eicu-code-master/Data/patient.h5'
patients_filename_csv  = '/home/ajay/PythonProjects/eicu-code-master/Data/patient.csv'

# Load csv into memory as a pandas Dataframe
patients = pd.read_csv(patients_filename_csv)

# Have a look at the columns of the Dataframe
patients.head(5)

# convert it to a dictionary
patients_dict = patients.to_dict()

# have a look at the keys
patients_dict.keys()

# should be the same as the column names of the Dataframe
patients.columns.values

# This creates the HDF5 file object we use to work on the file, in write ('w') mode.
h5f = h5py.File(patients_filename_hdf5, 'w')

# Now we add each of the arrays in the dictionary to the hdf file 
# see - https://stackoverflow.com/questions/37214482/saving-with-h5py-arrays-of-different-sizes

for k,v in patients_dict.items():
    print(k)
    h5f.create_dataset(k,data=v)

Which returns my old friend,

patientunitstayid
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
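
The error most likely arises because patients.to_dict() returns a dict of dicts ({column: {row index: value}}), so each v handed to create_dataset is a Python dict, which NumPy can only represent as an object array. Below is a minimal sketch of one way around it, assuming the goal is a plain h5py file: pass each column as a NumPy array and encode text columns as fixed-length byte strings. The helper names column_to_h5_array and write_frame_with_h5py are hypothetical. Note also that a file written this way is not in the PyTables layout that pd.read_hdf expects, so it may not work with the read_hdf call quoted earlier in this thread.

```python
import numpy as np
import pandas as pd


def column_to_h5_array(series: pd.Series) -> np.ndarray:
    """Return a NumPy array whose dtype h5py can store natively."""
    values = series.to_numpy()
    if values.dtype.kind != "O":
        # Numeric and boolean columns already map onto HDF5 types.
        return values
    # Object columns (typically text): replace missing values with "",
    # then encode to fixed-length byte strings (dtype kind 'S').
    text = series.where(series.notna(), "").astype(str)
    return np.array(text.str.encode("utf-8").tolist(), dtype="S")


def write_frame_with_h5py(frame: pd.DataFrame, path: str) -> None:
    """Write one HDF5 dataset per DataFrame column (hypothetical helper)."""
    import h5py  # imported here so column_to_h5_array works without h5py

    with h5py.File(path, "w") as h5f:
        for name in frame.columns:
            h5f.create_dataset(name, data=column_to_h5_array(frame[name]))
```

With this approach the loop in the script above would iterate over patients.columns rather than patients_dict.items().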

I've also tried using this script I found online - csv_to_hdf5.py - but I got a Segmentation fault (core dumped) when I tried to convert the patient table, and ran out of memory for the vitalPeriodic table.

Any ideas @XinruiLyu ?

As you said,

"I think you can probably skip this step and load from the CSVs directly"

So I'll try that... bad idea... not enough memory.
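
For the memory problem, one option is to read the CSV in chunks and append each chunk to an HDF5 table with pandas, so the whole table never has to fit in RAM. The following is a minimal sketch under several assumptions: the function names, the default chunk size, and the key are all hypothetical; to_hdf requires the PyTables package; and the authors' actual conversion procedure is not shown in this thread.

```python
import pandas as pd


def iter_chunks(csv_path, chunksize=1_000_000, **read_csv_kwargs):
    """Yield DataFrames of at most `chunksize` rows from a CSV file."""
    with pd.read_csv(csv_path, chunksize=chunksize, **read_csv_kwargs) as reader:
        yield from reader


def csv_to_hdf(csv_path, h5_path, key="data", chunksize=1_000_000, **read_csv_kwargs):
    """Append a CSV to a pandas-readable HDF5 table, chunk by chunk.

    Requires PyTables. The key name is arbitrary (an assumption).
    Returns the number of rows written.
    """
    rows = 0
    for i, chunk in enumerate(iter_chunks(csv_path, chunksize, **read_csv_kwargs)):
        # The first chunk creates (or overwrites) the file; later chunks append.
        chunk.to_hdf(h5_path, key=key, mode="w" if i == 0 else "a",
                     format="table", append=True)
        rows += len(chunk)
    return rows
```

Two caveats: appending can fail if a column's inferred dtype differs between chunks or if a later chunk contains longer strings than the first, so passing an explicit dtype to read_csv (forwarded through read_csv_kwargs) is probably worthwhile; and restricting the read with usecols to the columns actually needed should reduce memory further.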

sbagchi12 commented 7 years ago

Could you please share the eICU data files? I am not able to locate them in the repository. Also, how much time does it take to train on the MNIST data on a CPU?

ratsch commented 7 years ago

We cannot share the eICU data. It’s not permitted by the data use agreement. Data access can be obtained here: http://eicu-crd.mit.edu/gettingstarted/access/


kazemSafari commented 6 years ago

@AjayTalati I downloaded and unzipped the eICU dataset. Then I tried using your script and got the same error. @AjayTalati @corcra @ratsch and @XinruiLyu is there a fix for this issue? Thank you in advance.

data-boss commented 6 years ago

Hi,

I'm also trying to train the RGAN to reproduce the results on the eICU data. I don't have the eICU dataset yet. Could you give me a brief introduction to the structure of the dataset?
What are the 7 labels in the manuscript?
Thank you.

Contact me: zhangxuewen2018@gmail.com