Open NickKramer87 opened 1 year ago
It's possible to override the hospitals list to include only Keck and then set the run location somewhere nearby. We did a test run with Pasadena, and it produced results.
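A minimal sketch of how a hospitals list could be trimmed to a single facility. The thread does not say which file was edited or how, so this assumes the list is a CSV with a `name` column; the function name `keep_matching_hospitals` and the column name are illustrative, not taken from the project.

```python
import csv
from typing import Iterable, TextIO


def keep_matching_hospitals(
    src_lines: Iterable[str],
    dest: TextIO,
    name_fragment: str,
    name_column: str = "name",  # assumed column name; adjust to the real file
) -> int:
    """Copy only the rows whose hospital name contains name_fragment.

    Returns the number of rows kept, so the caller can confirm that
    at least one hospital survived the filter.
    """
    reader = csv.DictReader(src_lines)
    fieldnames = reader.fieldnames or []
    if name_column not in fieldnames:
        raise KeyError(f"column {name_column!r} not found in {fieldnames}")

    writer = csv.DictWriter(dest, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()

    kept = 0
    for row in reader:
        # Case-insensitive substring match; short rows yield None, treated as "".
        if name_fragment.lower() in (row[name_column] or "").lower():
            writer.writerow(row)
            kept += 1
    return kept
```

Writing the filtered list to a separate file, rather than editing the original in place, keeps the full list available for later runs against other hospitals.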
Subtask notes:
Medicare did an analysis comparing Synthea's data to real Medicare claims data. This might be something of a model for the analysis. See pages 13-27 of this document.
Here is a run of Synthea with 500 male and 500 female patients in the Los Angeles area, with Keck as the only possible hospital.
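The exact commands behind this run are not given in the thread. As a sketch, a split like this could be produced with two invocations of Synthea's `run_synthea` script, using its `-p` (population size) and `-g` (gender) options followed by the state and city. The helper name `build_synthea_command` and the script path are assumptions.

```python
def build_synthea_command(
    population: int,
    gender: str,
    state: str,
    city: str,
    script: str = "./run_synthea",  # assumed path to the Synthea launcher
) -> list[str]:
    """Build the argument list for one single-gender Synthea run."""
    if gender not in ("M", "F"):
        raise ValueError("gender must be 'M' or 'F'")
    if population <= 0:
        raise ValueError("population must be positive")
    return [script, "-p", str(population), "-g", gender, state, city]


# One command per gender; each list can be passed to subprocess.run(...).
commands = [
    build_synthea_command(500, g, "California", "Los Angeles") for g in ("M", "F")
]
```

Keeping the arguments as a list avoids shell quoting problems with the two-word city name.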
Thanks, @TravisHaussler! lol They all still live in Massachusetts somehow.
Hm, the layout of this doesn't seem quite right and it looks like the diagnosis codes are still SNOMED. Could you upload the log and fixed-width output too? @TravisHaussler
I’ll take a look tomorrow morning. I’m surprised the patients themselves didn’t generate in the right location at least; that seems strange to me. I’ll check the diagnosis codes too, that’s odd (we expect the procedure codes to still be SNOMED, though).
Here is what I see for the first bunch of rows from that file (taking out the mass of extra blank diagnosis and procedure code fields):
The diagnosis codes are ICD, while the procedure codes are still SNOMED, and I see all the addresses in CA.
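A quick way to check which code system a column holds, rather than eyeballing rows, is to classify each value by its shape. This is a heuristic sketch only: it assumes the ICD codes are ICD-10-CM (the thread just says "ICD") and relies on SNOMED CT identifiers being purely numeric. The names `classify_code` and `summarize_codes` are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# SNOMED CT identifiers are all digits (6 to 18 of them).
SNOMED_LIKE = re.compile(r"\d{6,18}")
# ICD-10-CM codes: a letter, a digit, one more character, optional extension.
ICD10CM_LIKE = re.compile(r"[A-Z]\d[0-9A-Z](\.?[0-9A-Z]{1,4})?")


def classify_code(code: str) -> str:
    """Label a code by its shape; this does not validate it against a code set."""
    text = code.strip().upper()
    if SNOMED_LIKE.fullmatch(text):
        return "snomed-like"
    if ICD10CM_LIKE.fullmatch(text):
        return "icd10cm-like"
    return "unknown"


def summarize_codes(codes: Iterable[str]) -> Counter:
    """Count code shapes in a column, skipping blank fields."""
    return Counter(classify_code(c) for c in codes if c.strip())
```

Running `summarize_codes` separately over the diagnosis and procedure columns would show at a glance whether each column matches the expected system.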
@TravisHaussler You are totally right. I'm not even sure what I was looking at... I'm sorry about that!
@TravisHaussler I'll pick this up for the next two weeks. I plan to slice and dice the data you provided and provide a report at our next check-in comparing this dataset to the publicly available summary statistics.
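One simple form the comparison could take is a per-category difference between the synthetic proportions and the published ones. The sketch below assumes the public summary statistics are available as proportions per category (for example, share of patients by sex); the function names are illustrative and no real figures are included.

```python
from collections import Counter
from typing import Hashable, Iterable, Mapping


def proportions(values: Iterable[Hashable]) -> dict:
    """Return the share of each distinct value in the input."""
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no values to summarize")
    return {category: n / total for category, n in counts.items()}


def compare_to_reference(
    synthetic_values: Iterable[Hashable],
    reference_props: Mapping[Hashable, float],
) -> dict:
    """Synthetic proportion minus reference proportion, per category.

    Categories present on only one side are treated as 0 on the other,
    so missing or unexpected categories show up as large differences.
    """
    synth = proportions(synthetic_values)
    categories = sorted(set(synth) | set(reference_props), key=str)
    return {c: synth.get(c, 0.0) - reference_props.get(c, 0.0) for c in categories}
```

A positive value means the category is over-represented in the synthetic data relative to the reference, and a negative value means it is under-represented.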
OK, let me know if you want it run with larger numbers too. I hit programmatic errors at first when reading in the huge Synthea CSVs, but that’s a little better now that we only load the specific columns we need. I could also run it a handful of separate times and concatenate the results, but that may be slightly different statistically from a single large run.
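A sketch of the two ideas in this comment: keeping only the needed columns while reading, and concatenating several runs. The project's actual loader is not shown in the thread, so this uses only the standard library `csv` module; `read_columns`, `concat_runs`, and the added `run` field are assumptions.

```python
import csv
from typing import Iterable, Iterator


def read_columns(lines: Iterable[str], wanted: list[str]) -> Iterator[dict[str, str]]:
    """Stream rows one at a time, keeping only the wanted columns."""
    reader = csv.DictReader(lines)
    missing = [c for c in wanted if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    for row in reader:
        yield {c: row[c] for c in wanted}


def concat_runs(
    runs: dict[str, Iterable[str]], wanted: list[str]
) -> list[dict[str, str]]:
    """Combine rows from several runs, tagging each row with its run label."""
    combined = []
    for run_label, lines in runs.items():
        for row in read_columns(lines, wanted):
            row["run"] = run_label  # lets later analysis separate the batches
            combined.append(row)
    return combined
```

Tagging each row with its run makes it possible to check afterwards whether the batches differ from one another before treating them as one sample.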
As a database generator, I want to validate the accuracy of the synthetic database by comparing the summary statistics for a database of patients who would go to Keck Hospital against the actual summary statistics from Keck Hospital, in order to determine whether significant changes are needed to the database generation program.
Requirements:
Potential subtasks:
Acceptance Criteria: