jamesmullenbach / caml-mimic

multilabel classification of EHR notes
MIT License
278 stars 125 forks

No such file or directory: '../mimicdata/mimic3/train_full_hadm_ids.csv' #1

Closed: jeremija closed 6 years ago

jeremija commented 6 years ago

Hi @jamesmullenbach,

I'm getting an error while running the dataproc_mimic_III notebook:

dataproc/concat_and_split.pyc in split_data(labeledfile, base_name)
     61     for splt in ['train', 'dev', 'test']:
     62         hadm_ids[splt] = set()
---> 63         with open('%s/%s_full_hadm_ids.csv' % (MIMIC_3_DIR, splt), 'r') as f:
     64             for line in f:
     65                 hadm_ids[splt].add(line.rstrip())
IOError: [Errno 2] No such file or directory: '../mimicdata/mimic3/train_full_hadm_ids.csv'

The README.md states that these files are already in the repository:

|   |   *_hadm_ids.csv (already in repo)

However, it looks like they are not. Where can these files be found? Am I missing something?

jamesmullenbach commented 6 years ago

Sorry for the omission! I'm working on an update to this repository which I should push in the next couple of weeks. I'll include the files in the update. Thanks for the interest.

caolingyu commented 6 years ago

Hi @jamesmullenbach. I'm also very interested in this project and ran into the same problem. What do the '*_hadm_ids.csv' files do? Is it possible to generate them ourselves from the MIMIC data?

Thanks in advance.

jamesmullenbach commented 6 years ago

Hi all,

The hadm ID CSV files are lists of IDs that we used to create the train, validation, and test sets in our paper. I went ahead and added them to the repo.
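
For anyone wondering how such ID lists drive a split, here is a minimal sketch, not the repository's actual code. It assumes each ID file holds one admission ID per line (as the traceback's `line.rstrip()` loop suggests) and that the labeled file is a CSV with a `HADM_ID` column; that column name, the helper names `load_hadm_ids` and `split_rows`, and the IDs themselves are made up for illustration.

```python
import csv
import io


def load_hadm_ids(f):
    """Read one admission ID per line into a set of strings, skipping blanks."""
    return {line.strip() for line in f if line.strip()}


def split_rows(rows, ids_by_split, id_column="HADM_ID"):
    """Put each row into the split whose ID set contains its admission ID.

    Rows whose ID appears in no split are dropped.
    The column name "HADM_ID" is an assumption about the labeled file.
    """
    out = {name: [] for name in ids_by_split}
    for row in rows:
        for name, ids in ids_by_split.items():
            if row[id_column] in ids:
                out[name].append(row)
                break
    return out


# In-memory stand-ins for the real files; the IDs are invented.
id_files = {
    "train": io.StringIO("100\n101\n"),
    "dev": io.StringIO("102\n"),
    "test": io.StringIO("103\n"),
}
ids_by_split = {name: load_hadm_ids(f) for name, f in id_files.items()}

labeled = io.StringIO("HADM_ID,TEXT\n100,a\n102,b\n103,c\n101,d\n999,e\n")
splits = split_rows(csv.DictReader(labeled), ids_by_split)

for name, rows in splits.items():
    print(name, [r["HADM_ID"] for r in rows])
```

With files on disk, the `io.StringIO` objects would be replaced by handles from `open(...)` on paths like the one in the traceback above.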

Do keep an eye out for a code update soon, which should make the repo a bit cleaner overall and will include some pretrained models. (It will happen before the camera-ready deadline of NAACL, 4/16 :) )