orchid-initiative / synthetic-database-project

Generate Fresh Data with Improvements #97

Open TravisHaussler opened 5 months ago

TravisHaussler commented 5 months ago

Here is the latest run with the additions:

csv_HCAIPDD_03-29-2024_0047.csv

Let me know how it looks!

I have identified a few issues I still need to work on:

  1. Incorporate MSDRG 768 in a reasonable way
  2. Fix the procedures to start with procedure 1 instead of procedure 2 (I fixed the actual field names, but the way I unpack the list of other procedures still seems to use the wrong indexing)
  3. Understand why the emergency type procedure records include so many non-hospital records
  4. Continue to improve accuracy and completeness of data

TravisHaussler commented 5 months ago

I did 1 & 2 from the comment above and kicked off a run 5x the size of the one linked there. I will see if it makes it through overnight without errors.
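
For reference, the indexing problem in item 2 (other procedures landing in fields that start at 2 instead of 1) is the kind of off-by-one that the sketch below avoids. This is only an illustration, not the project's actual code: the function name `unpack_procedures`, the `other_procedure_N` field names, and the `max_fields` limit are all assumptions made for the example.

```python
def unpack_procedures(other_procedures, max_fields=3):
    """Map a list of procedure codes to numbered fields, starting at 1.

    Field names (other_procedure_1, ...) are hypothetical placeholders.
    """
    row = {}
    # enumerate(..., start=1) makes the first list element land in field 1,
    # so there is no manual "index + 1" (or "+ 2") arithmetic to get wrong.
    for i, code in enumerate(other_procedures[:max_fields], start=1):
        row[f"other_procedure_{i}"] = code
    # Pad the remaining fields so every record has the same columns.
    for i in range(len(other_procedures) + 1, max_fields + 1):
        row[f"other_procedure_{i}"] = ""
    return row
```

Letting `enumerate` own the numbering keeps the field index and the list position tied together in one place.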

TravisHaussler commented 5 months ago

Here is a larger run than before:

csv_HCAIPDD_03-30-2024_1713.csv

TravisHaussler commented 4 months ago

Here is the run broken down by year:

csv_HCAIPDD_04-09-2024_1143_2024.csv
csv_HCAIPDD_04-09-2024_1143_2023.csv
csv_HCAIPDD_04-09-2024_1143_2022.csv
csv_HCAIPDD_04-09-2024_1143_2021.csv
csv_HCAIPDD_04-09-2024_1143_2020.csv
csv_HCAIPDD_04-09-2024_1143_2019.csv
csv_HCAIPDD_04-09-2024_1143_2018.csv
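
A per-year breakdown like this can be produced from a single CSV with a short script. The sketch below is one possible approach using only the standard library, not necessarily how these files were generated. The `split_by_year` name, the `admission_date` column, and the MM/DD/YYYY date format are assumptions for illustration; only the `_<year>.csv` suffix follows the file names above.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_by_year(src, date_column="admission_date"):
    """Write one CSV per year next to `src` and return the output paths.

    Assumes `date_column` (a hypothetical name) holds MM/DD/YYYY strings.
    """
    src = Path(src)
    rows_by_year = defaultdict(list)
    with src.open(newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames
        for row in reader:
            year = row[date_column][-4:]  # last four characters are the year
            rows_by_year[year].append(row)

    outputs = []
    # Newest year first, matching the order of the files listed above.
    for year, rows in sorted(rows_by_year.items(), reverse=True):
        out = src.with_name(f"{src.stem}_{year}.csv")
        with out.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        outputs.append(out)
    return outputs
```

This version holds all rows in memory, which is fine for a sketch. A much larger run might instead keep one open writer per year and stream rows through.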