yy6linda / synthetic_EHR_data


OMOP data curation #1

Open yy6linda opened 4 years ago

yy6linda commented 4 years ago

This assignment involves:

  1. Unifying the terminology for each domain
  2. Removing duplicated records
  3. Splitting the dataset into training and evaluation sets
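
As a rough illustration of step 2, here is a minimal sketch of dropping exact duplicate records. It assumes each record is a plain dict with hashable values; the function name `drop_duplicate_records` and the sample rows are hypothetical and are not taken from the scripts in this repo.

```python
def drop_duplicate_records(records):
    """Return records with exact duplicates removed, keeping first occurrences in order.

    Assumption: each record is a dict whose values are hashable (str, int, None, ...).
    """
    seen = set()
    unique = []
    for rec in records:
        # Build an order-independent, hashable key from the record's fields.
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical sample rows using OMOP-style column names.
rows = [
    {"person_id": 1, "condition_concept_id": 201826},
    {"person_id": 1, "condition_concept_id": 201826},  # exact duplicate
    {"person_id": 2, "condition_concept_id": 201826},
]
deduplicated = drop_duplicate_records(rows)
```

Only exact duplicates are treated as duplicates here; whether near-duplicates should also be merged is a separate decision.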
yy6linda commented 4 years ago

Go through and document the splitting process, which keeps only the patients who have at least 10 visits.
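
For the documentation, the filtering and splitting logic described above might look roughly like the sketch below. It is an illustration only, not the actual contents of the split script. It assumes that visits are available as `(person_id, visit_occurrence_id)` pairs, that the split is done per patient, and that a hypothetical 20% of eligible patients go to evaluation. The function name, `eval_fraction`, and `seed` are all assumptions.

```python
import random
from collections import Counter


def split_patients(visits, min_visits=10, eval_fraction=0.2, seed=0):
    """Keep patients with at least `min_visits` distinct visits, then split them.

    visits: iterable of (person_id, visit_occurrence_id) pairs.
    Returns (train_ids, eval_ids) as two disjoint sets of person_id values.
    eval_fraction and seed are illustrative assumptions, not values from the repo.
    """
    # set() removes repeated rows so each visit is counted once per patient.
    counts = Counter(person_id for person_id, _ in set(visits))
    eligible = sorted(p for p, n in counts.items() if n >= min_visits)

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(eligible)

    n_eval = int(len(eligible) * eval_fraction)
    eval_ids = set(eligible[:n_eval])
    train_ids = set(eligible[n_eval:])
    return train_ids, eval_ids
```

Splitting by patient rather than by visit keeps all of a patient's records on the same side of the split.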

yy6linda commented 4 years ago

How to run data curation:

  1. nohup python query_data_sql.py &
  2. nohup python split_tran_eval.py -f "./data" -p "split_train_eval" &