Open ghost opened 7 years ago
Hi there!
The processing pipeline is pretty much this:

First, you need a list of patient IDs (`pids`) - we do this because we access the hdf5 using the pid as a key, and have to iterate through that. If you're loading from CSV and pulling the whole thing into memory, you don't really need it, but it's probably simplest to just do it anyway. To grab that list of patient IDs, I'd recommend using something like

```shell
sed '1d' vitalPeriodic.csv | cut -f 1 -d ',' | sort -u > pids.txt
```

but whatever works!

In `data_utils.py`, you've got `resampled_eICU`, which should do most of the heavy lifting. It calls these main functions:

- `generate_eICU_resampled_patients`: resample patients to measurements every 15 minutes (by default - this is an option), in the variables of interest
- `get_cohort_of_complete_downsampled_patients`: subset the output of the above to patients missing no data (not sure why I made these separate functions!)

Regarding the directory structure, for this part of the code there's nothing very specific about it - there's an `eICU_dir` variable which gives the location of the h5 files (`pat_df = pd.read_hdf(eICU_dir + '/vitalPeriodic.h5')` etc.), and otherwise the other functions will save the intermediate files in the folder they're run from, I think.
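In case it helps, here's a minimal sketch of what 15-minute resampling can look like in pandas - the toy data and column names (`patientunitstayid`, `observationoffset`, `heartrate`) follow the eICU schema, but the exact logic inside `generate_eICU_resampled_patients` may differ:

```python
import pandas as pd

# Toy stand-in for vitalPeriodic: one patient, observations at irregular
# minute offsets (eICU stores offsets in minutes from unit admission).
df = pd.DataFrame({
    'patientunitstayid': [141765] * 5,
    'observationoffset': [0, 7, 16, 31, 44],   # minutes
    'heartrate': [80, 82, 85, 90, 88],
})

# Turn the minute offset into a TimedeltaIndex so resample() can bin it
df = df.set_index(pd.to_timedelta(df['observationoffset'], unit='m'))

# Mean measurement per 15-minute bin, per patient
resampled = (df.groupby('patientunitstayid')['heartrate']
               .resample('15min').mean())
print(resampled)
```

The bins here are `[0, 15)`, `[15, 30)`, `[30, 45)` minutes, so the output is the per-bin means 81.0, 85.0 and 89.0.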
If you hit a particular snag/error I can try to give you more specific information, otherwise I'll just end up describing the whole script unfortunately.
Hey, hello!

Thanks for the great help - I think I get what `data_utils.py` does.
So it looks like you only need to convert `patient.csv` and `vitalPeriodic.csv` to hdf5. That's actually the bit that usually trips me up, as my data conversion kung fu is out of practice.
I guess I should post here the steps in case anyone else wants to try doing the eICU experiment?
Here's what I've got so far,
```python
import numpy as np
import pandas as pd
import h5py

patients_filename_hdf5 = '/home/ajay/PythonProjects/eicu-code-master/Data/patient.h5'
patients_filename_csv = '/home/ajay/PythonProjects/eicu-code-master/Data/patient.csv'

# Load the csv into memory as a pandas DataFrame
patients = pd.read_csv(patients_filename_csv)

# Have a look at the first rows of the DataFrame
patients.head(5)

# Convert it to a dictionary, one entry per column
patients_dict = patients.to_dict()

# The keys should be the same as the column names of the DataFrame
patients_dict.keys()
patients.columns.values

# Create the HDF5 file object we use to work on the file, in write ('w') mode
h5f = h5py.File(patients_filename_hdf5, 'w')

# Now add each of the arrays in the dictionary to the hdf file
# see https://stackoverflow.com/questions/37214482/saving-with-h5py-arrays-of-different-sizes
for k, v in patients_dict.items():
    print(k)
    h5f.create_dataset(k, data=v)
```
Which returns my old friend,

```
patientunitstayid
TypeError: Object dtype dtype('O') has no native HDF5 equivalent
```
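(Not the authors' conversion script, but possibly useful: that `TypeError` is h5py refusing object-dtype columns, i.e. strings. Since the repo reads the files back with `pd.read_hdf`, one way around it is to let pandas do the serialisation itself via `to_hdf` - a minimal sketch, assuming the PyTables package is installed; the file name and toy columns below are made up:)

```python
import pandas as pd

# Toy stand-in for patient.csv: eICU mixes numeric and string (object-dtype)
# columns, which is exactly what plain h5py can't map to a native HDF5 type.
patients = pd.DataFrame({
    'patientunitstayid': [141765, 141766],
    'gender': ['Female', 'Male'],   # object dtype -> the TypeError above
    'age': ['70', '> 89'],          # eICU ages are strings too
})

# pandas handles the object columns itself (this needs PyTables), and
# pd.read_hdf - which is what data_utils.py calls - can read it back.
patients.to_hdf('patient_demo.h5', key='patients', mode='w')
roundtrip = pd.read_hdf('patient_demo.h5', 'patients')
```

The round-trip DataFrame comes back identical to the original, dtypes included.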
I've also tried using this script I found online - csv_to_hdf5.py - but I got a `Segmentation fault (core dumped)` when I tried to convert the `patient` table, and ran out of memory on the `vitalPeriodic` table.
Any ideas @XinruiLyu ?
As you said,

> I think you can probably skip this step and load from the CSVs directly

So I'll try that... bad idea... not enough memory.
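(For the memory problem, reading the big CSV in chunks and keeping only the columns you actually need usually helps - a rough sketch, with a tiny generated stand-in file so it's self-contained; the demo file name and exact column choice are made up:)

```python
import pandas as pd

# Write a tiny stand-in CSV; in practice this would be the multi-GB
# vitalPeriodic.csv from eICU.
with open('vitalPeriodic_demo.csv', 'w') as f:
    f.write('patientunitstayid,observationoffset,heartrate,sao2\n')
    for i in range(10):
        f.write(f'{141765 + i % 2},{i * 5},{80 + i},97\n')

# Stream the file in chunks, keeping only the columns of interest, so peak
# memory stays at roughly one chunk rather than the whole table.
usecols = ['patientunitstayid', 'observationoffset', 'heartrate']
chunks = pd.read_csv('vitalPeriodic_demo.csv', usecols=usecols, chunksize=4)
vitals = pd.concat(chunks, ignore_index=True)
print(len(vitals))  # 10 rows, 3 columns
```

Instead of concatenating, each chunk could also be filtered or aggregated on the fly (e.g. per-pid), which is what keeps memory bounded on the real table.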
Could you please share the eICU data files? I am not able to locate them in the repository. Also, how much time does it take to train on the MNIST data on a CPU?
We cannot share the eICU data. It’s not permitted by the data use agreement. Data access can be obtained here: http://eicu-crd.mit.edu/gettingstarted/access/
> On Oct 16, 2017, at 5:51 AM, sbagchi12 notifications@github.com wrote:
>
> Could you please share the eICU data files? I am not able to locate them in the repository. Also, how much time does it take to train on the MNIST data on a CPU?
@AjayTalati I downloaded and unzipped the eICU dataset. Then I tried using your script and got the same error. @AjayTalati @corcra @ratsch and @XinruiLyu is there a fix for this issue? Thank you in advance.
Hi,
I'm also trying to train the RGAN to reproduce the results on the eICU data. I haven't got the eICU dataset yet - could you give me a brief introduction to the structure of the dataset? What are the 7 labels in the manuscript?

Thank you.
contact me : zhangxuewen2018@gmail.com
Hi,
I'm trying to train the RGAN to reproduce your results from Section 5, Table 2 of the manuscript. I've got the eICU dataset, but I'm not sure how you pre-processed/reshaped it - to be honest, I'm kind of lost.

Would it be possible to share the linux command-line instructions, and the directory structure you need, in order to get the code training - pretty please with cherries on top?
Fan Q