yy6linda / synthetic_EHR_data


Define the dead and alive patient cohorts for medGAN #2

Open yy6linda opened 4 years ago

yy6linda commented 4 years ago

Create two feature sets: one for the dead patient cohort and one for the alive patient cohort.

yy6linda commented 4 years ago

I used Ivan's model, which generated a sparse matrix with 42549 features, and used the todense method to convert it to a dense matrix before feeding it into medGAN. I didn't use the full alive set: the whole death set is kept, and the alive set is randomly sampled at death:alive = 1:5.

Death: (8309, 42549)
Alive: (613577, 42549)

When training on death.npy, use replace=True for the batch sampling, because death.npy is small. With replace=False, training fails with:
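The subsampling and densifying steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `subsample_rows` is a hypothetical helper, the tiny random matrices stand in for the real cohorts, and `toarray()` is used in place of `todense()` because it returns a plain `ndarray` rather than an `np.matrix`.

```python
import numpy as np
from scipy import sparse


def subsample_rows(matrix, n_keep, rng):
    """Return n_keep rows of a sparse matrix, drawn at random without replacement."""
    keep = rng.choice(matrix.shape[0], size=n_keep, replace=False)
    keep.sort()  # preserve the original row order
    return matrix[keep]


rng = np.random.default_rng(0)

# Small stand-ins for the real cohorts (hypothetical sizes, not the issue's data).
death = sparse.random(20, 50, density=0.1, format="csr", random_state=0)
alive = sparse.random(300, 50, density=0.1, format="csr", random_state=1)

# Keep every death row and draw five alive rows per death row (death:alive = 1:5).
alive_sub = subsample_rows(alive, 5 * death.shape[0], rng)

# Densify only after subsampling, so the large alive matrix is never fully expanded.
death_dense = death.toarray().astype(np.float32)
alive_dense = alive_sub.toarray().astype(np.float32)
```

The dense arrays could then be written out with `np.save("death.npy", death_dense)` and `np.save("alive.npy", alive_dense)`. At the shapes quoted above, a float32 dense copy of the death matrix alone comes to roughly 1.4 GB, which is one reason to subsample before densifying.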

  File "train.py", line 96, in <module>
    saveMaxKeep=args.save_max_keep)
  File "/data/users/yanyao/myproj/synpuf/omop/app/1000-1000-death-SynthEHR/model.py", line 391, in train
    batchIdx = np.random.choice(idx, size=batchSize, replace=False)
  File "mtrand.pyx", line 948, in numpy.random.mtrand.RandomState.choice
ValueError: Cannot take a larger sample than population when 'replace=False'

yy6linda commented 4 years ago

nohup python train.py --data_file './death.npy' --n_pretrain_epoch 1000 --n_epoch 1000&
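The ValueError in the traceback earlier in this thread is what `np.random.choice` raises when `size` is larger than the number of candidate indices and `replace=False`. A minimal sketch of one way to avoid it, assuming it is acceptable to repeat rows only when the pool is smaller than the batch: `sample_batch_indices` is a hypothetical helper and not part of model.py, the row counts are made up for illustration, and the sketch uses NumPy's `Generator` API rather than the legacy `np.random.choice` call shown in the traceback.

```python
import numpy as np


def sample_batch_indices(n_rows, batch_size, rng):
    """Draw batch_size row indices, repeating rows only when the pool is too small."""
    need_replacement = batch_size > n_rows
    return rng.choice(n_rows, size=batch_size, replace=need_replacement)


rng = np.random.default_rng(0)

# Pool smaller than the batch: sampling falls back to replace=True.
small_batch = sample_batch_indices(n_rows=800, batch_size=1000, rng=rng)

# Pool larger than the batch: indices stay unique, as with replace=False.
large_batch = sample_batch_indices(n_rows=8309, batch_size=1000, rng=rng)
```

Switching to replacement only when needed keeps the behavior unchanged for large cohorts. Always passing `replace=True`, as described above, is the simpler change, but it allows duplicate rows within a batch.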

synthetic death.npy: /data/users/yanyao/myproj/synpuf/omop/app/1000-1000-death-SynthEHR
synthetic alive.npy: /data/users/yanyao/myproj/synpuf/omop/app/1000-1000-alive-SynthEHR

yy6linda commented 4 years ago

Now using only conditions: death (8309, 28664), alive (41545, 28664). The 28664 features break down into 28539 condition_source_value, 116 year of birth, 3 gender_concept_id, and 6 race_concept_id.
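The feature breakdown above can be checked, and turned into column ranges, with a short sketch like the one below. It assumes the four blocks are concatenated in the order listed, which the comment does not confirm. `column_slices` and the `year_of_birth` key are hypothetical names used only for illustration.

```python
# Block widths as listed in the comment above; the ordering is an assumption.
blocks = [
    ("condition_source_value", 28539),
    ("year_of_birth", 116),
    ("gender_concept_id", 3),
    ("race_concept_id", 6),
]


def column_slices(blocks):
    """Map each block name to its column slice and return the total width."""
    slices = {}
    start = 0
    for name, width in blocks:
        slices[name] = slice(start, start + width)
        start += width
    return slices, start


slices, n_features = column_slices(blocks)
# n_features is 28664, matching the second dimension of both matrices.
```

With these slices, something like `matrix[:, slices["gender_concept_id"]]` would pull out one block from either the real or the synthetic matrix, provided the assumed ordering holds.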