synthetichealth / synthea

Synthetic Patient Population Simulator
https://synthetichealth.github.io/synthea
Apache License 2.0

Module for COVID-19 #679

Closed JLMoszkowicz closed 3 years ago

JLMoszkowicz commented 4 years ago

We're wondering how hard it would be to generate some mock data for patients that may have COVID-19. As I understand it, this would require a new module to be created using the Module Builder tool. Our team works on the AI side though; we aren't medical professionals. Is there any interest from project contributors who do have the necessary background to take an initial stab at creating this module?

jawalonoski commented 4 years ago

We don't think it would be very difficult, but there would be some limitations.

This idea was also mentioned here: https://chat.fhir.org/#narrow/stream/179160-social/topic/Pandemic.20Hackathon

My colleague @dehall had these thoughts:

  • we can model the clinical aspects: ~80% "mild symptoms", ~20% require medical attention, ~5% ICU, ~1% very intensive
  • we can set a specific start date for when things happen
  • we can't directly model social interactions and spread, though I'm not sure that matters
  • I don't think the Module Builder is capable of exponential growth. We can set a single static % chance of condition onset per day, or a fixed set of static percentages, but we can't set a continuously growing % like we'd need for exponential growth.

JLMoszkowicz commented 4 years ago

That sounds promising. I'm hearing that the number of people infected doubles about every week. Could you use that information to determine what the static % chance of condition onset would be for a series of weeks rather than letting it grow continuously?

jawalonoski commented 4 years ago

Yes, you could. You can use Dates in the conditional logic if you want specific weeks.

Number of infected doubles every 2 to 4 days depending on mitigation strategies. See https://medium.com/@tomaspueyo/coronavirus-act-today-or-people-will-die-f4d3d9cd99ca
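A doubling time can be turned into the static per-week percentages discussed above with a little arithmetic. The following is only a sketch: the function name, the starting infected fraction, and the choice to express each week's new infections as a share of the people not yet infected are illustrative assumptions, not part of the Synthea module.

```python
def weekly_onset_probabilities(initial_fraction, doubling_days, weeks):
    """Per-week chance that a not-yet-infected person becomes infected.

    Assumes the cumulative infected fraction grows by a factor of
    2 ** (7 / doubling_days) each week, capped at the whole population.
    """
    growth_per_week = 2 ** (7 / doubling_days)
    infected = initial_fraction
    probabilities = []
    for _ in range(weeks):
        target = min(1.0, infected * growth_per_week)
        susceptible = 1.0 - infected
        # Share of the remaining susceptible people who must be infected this week.
        probabilities.append((target - infected) / susceptible if susceptible > 0 else 0.0)
        infected = target
    return probabilities


# Hypothetical inputs: 0.1% infected at the start, doubling every 7 days.
for week, p in enumerate(weekly_onset_probabilities(0.001, 7, 4), start=1):
    print(f"week {week}: {p:.4%}")
```

Each returned value could then be used as the fixed onset chance for one date-bounded week. With a doubling time of 2 to 4 days the cap is reached quickly, so the starting fraction matters a great deal.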

awatson1978 commented 4 years ago

LOINC Codes - FHIR Observations https://chat.fhir.org/user_uploads/10155/tQQtv3GQZhC3DRmMgk59o7ly/ValueSet-covid-19-obs.json

SNOMED Codes - FHIR Conditions
https://confluence.ihtsdotools.org/display/snomed/SNOMED%2BCT%2BCoronavirus%2BContent

ICD-10 Codes https://www.cdc.gov/nchs/data/icd/ICD-10-CM-Official-Coding-Gudance-Interim-Advice-coronavirus-feb-20-2020.pdf

awatson1978 commented 4 years ago

A super minimal flow to get us started...

Pre-Hospitalization Flow

Hospitalization Workflow

Characteristics From the Coronavirus Disease 2019 (COVID-19) Outbreak in China
9 charts that explain the coronavirus pandemic

awatson1978 commented 4 years ago

Let's be careful about not letting the R0 and Dates functionality block the rest of the flow. If we start with a flat percentage of population who are infected and run the population retroactively, people can seed the flow with whichever severity of a pandemic outbreak they chose to model. There's both outpatient condition modeling and inpatient hospital flow that we want to model and generate records for; not just modeling pandemic spread.

jawalonoski commented 4 years ago

Work in progress, if anyone wants to take it from here...

covid19.json.txt

TODO:

The "Administer COVID-19 Test" diagnostic report needs work. I don't know what the values are supposed to look like.

The inpatient admission needs a lot of work, and probably a loop in there... right now that is being abstracted away by the 1 - 21 day "Stay" delay state.

raheelsayeed commented 4 years ago

Are you planning to simulate inpatient patient trajectories? Are the available aggregates good enough to emulate?

awatson1978 commented 4 years ago

I certainly think there would be benefit in doing so. Synthea can be used to generate both outpatient and inpatient populations, so I think there is benefit in both. The only limiting factor is the size of the workflow and understanding the model. Which is why COPD and Bronchitis are broken out into separate workflows, for example.

Having worked in an OR and being familiar with scrub protocols and being connected with Agiliti (formerly Universal Hospital Services, formerly ABC Oxygen Tent Rental Company), I'm a little worried that I may get pulled into reserve operations in an ICU or ward. So I've been brushing up on ventilator mechanics:

Merck Manual - Overview of Mechanical Ventilation

And while we don't need to model every step of the process, it might behoove us to include respiratory rate, arterial oxygen saturation, , and PaCO2, as conditions for being put onto a ventilator.

Similarly, I would tentatively recommend adding tocilizumab to the workflow with RxNorm code of 612865 to prevent cytokine storms.

RxNorm 612865 - tocilizumab

awatson1978 commented 4 years ago

For fever response, we want to use acetaminophen (RxNorm 161).

WHO Now Officially Recommends to Avoid Taking Ibuprofen For COVID-19 Symptoms

awatson1978 commented 4 years ago

"The most distinctive comorbidities of 32 non-survivors from a group of 52 intensive care unit patients with novel coronavirus disease 2019 (COVID-19) in the study by Xiaobo Yang and colleagues [1] were cerebrovascular diseases (22%) and diabetes (22%). Another study [2] included 1099 patients with confirmed COVID-19, of whom 173 had severe disease with comorbidities of hypertension (23·7%), diabetes mellitus (16·2%), coronary heart diseases (5·8%), and cerebrovascular disease (2·3%). In a third study [3], of 140 patients who were admitted to hospital with COVID-19, 30% had hypertension and 12% had diabetes."

Are patients with hypertension and diabetes mellitus at increased risk for COVID-19 infection?

And we want to add the following ICD10 codes somehow.

awatson1978 commented 4 years ago

People with blood type A may be more vulnerable to coronavirus

SNOMED-CT - Blood group A (finding)

882-1 - ABO and Rh group [Type] in Blood
LA21325-8 - A Pos
LA21326-6 - A Neg

awatson1978 commented 4 years ago

Hydroxychloroquine and azithromycin as a treatment of COVID-19
Azithromycin induces anti-viral effects in cultured bronchial epithelial cells from COPD patients

RxNorm 5521 - Hydroxychloroquine
RxNorm 1668237 - Azithromycin Injection

jawalonoski commented 4 years ago

I'll take a stab at updating the module today.

One reminder, Synthea does not support ICD10 codes... we use SNOMED, LOINC, and RxNorm.

awatson1978 commented 4 years ago

No ICD10???? How can that be? That's the oldest value set of them all! 400+ years old!

In all seriousness, consider this whole thread a feature request for ICD10 support. It's an essential coding system... much more so than DSM-V, HCPC, MESH, etc.

But we can do plenty with the SNOMED and LOINC codes in the meantime. And thank you for the help!

jawalonoski commented 4 years ago

ICD9 is public domain, so we could use that no trouble (with slight code change). We would have to look into ICD10 licensing from the WHO.

awatson1978 commented 4 years ago

Aaaaah. I see. They don't want people developing products with ICD10 without licensing it. As an interoperability product, can we create ICD9 and ICD10 containers as placeholders? Deliver the pipes, but not the content?

I'm going to try to track somebody down from WHO that can speak to this. I have a pretty good rolodex, and am going to see what I can do.

awatson1978 commented 4 years ago

Yeah, COVID19 response would certainly qualify for a non-commercial research license. I suppose the problem is SyntheticHealth also offers for-profit support of Synthea? Hmmm.

http://apps.who.int/classifications/apps/icd/ClassificationDownloadNR/license.htm

jawalonoski commented 4 years ago

We don't offer for-profit support.

Anyway, let's take the ICD10 question out of band, or another issue, so this thread can stay focused on COVID19.

awatson1978 commented 4 years ago

Right. I'll log a separate issue for ICD10 next week sometime. In the meantime, I'm wrapping up a deliverable, and am hoping to run the generator and load results into a HAPI server this weekend.

jawalonoski commented 4 years ago

covid19.json.txt

Updated module. Needs testing and review. Lots (read: all) of observations and measurements are missing. But there are diagnoses, procedures, and medications.

jawalonoski commented 4 years ago

covid19.json.txt

Another update. Still needs more work.

jawalonoski commented 4 years ago

I created a branch where I'll keep posting updates. It still needs a lot of work. Missing labs and observations, missing pathways, and probabilities that are just flat out wrong.

awatson1978 commented 4 years ago

So I'm relocated back to the family farm, and have my prior obligations completed and published. I'm focusing on COVID19 the rest of the week.

Was just able to compile the COVID19 module, and so far so good. Getting a ca.uhn.fhir.model.dstu2.composite.QuantityDt error of some type. See below.

978 -- Nestor901 Yundt842 (6 y/o M) Chicago, Illinois 
980 -- Jackson413 Jacobi462 (7 y/o M) Chicago, Illinois 
976 -- Kaley842 Emmerich580 (34 y/o F) Chicago, Illinois 
975 -- Dudley365 Roob72 (49 y/o M) Chicago, Illinois 
979 -- Fritz267 Mann644 (26 y/o M) Chicago, Illinois 
981 -- Verónica383 Tovar84 (40 y/o F) Chicago, Illinois 
977 -- Antwan357 Doyle959 (51 y/o M) Chicago, Illinois 
983 -- Rosamaria757 Kirlin939 (21 y/o F) Chicago, Illinois 
982 -- Isobel140 Casper496 (46 y/o F) Chicago, Illinois 
985 -- Clement78 Hagenes547 (26 y/o M) Chicago, Illinois 
java.lang.ClassCastException: ca.uhn.fhir.model.dstu2.composite.QuantityDt cannot be cast to ca.uhn.fhir.model.dstu2.composite.SimpleQuantityDt
        at org.mitre.synthea.export.FhirDstu2.medicationAdministration(FhirDstu2.java:1119)
        at org.mitre.synthea.export.FhirDstu2.medication(FhirDstu2.java:1081)
        at org.mitre.synthea.export.FhirDstu2.convertToFHIR(FhirDstu2.java:206)
        at org.mitre.synthea.export.FhirDstu2.convertToFHIRJson(FhirDstu2.java:243)
        at org.mitre.synthea.export.Exporter.exportRecord(Exporter.java:95)
        at org.mitre.synthea.export.Exporter.export(Exporter.java:52)
        at org.mitre.synthea.engine.Generator.generatePerson(Generator.java:396)
        at org.mitre.synthea.engine.Generator.lambda$run$2(Generator.java:239)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

jawalonoski commented 4 years ago

Huh. I've only been trying FHIR R4 and CSV, so I haven't seen that yet.

awatson1978 commented 4 years ago

I'll be checking R4 tomorrow. Just reporting things as I find them. :)

I just managed to upload a generated dataset into HAPI, and am confirming the following queries. Looking great so far!

# DSTU2 - Search for all testing encounters "Encounter for symptom (procedure)" ???
http://localhost:3100/baseDstu2/Encounter?type=185345009

# DSTU2 - Search for all conditions with "Cough (finding)"
http://localhost:3100/baseDstu2/Condition?code=49727002

# DSTU2 - Search for all conditions with "Dyspnea (finding)"
http://localhost:3100/baseDstu2/Condition?code=267036007

# DSTU2 - Search for all conditions with "Fever (finding)"
http://localhost:3100/baseDstu2/Condition?code=386661006

# DSTU2 - Search for all conditions with "Suspected COVID-19"
http://localhost:3100/baseDstu2/Condition?code=840544004

# DSTU2 - Search for all conditions with "Suspected COVID-19" in the month of March 2020
http://localhost:3100/baseDstu2/Condition?code=840544004&onset=ge2020-03-01&onset=le2020-03-31

# DSTU2 - Search for all conditions with "COVID-19"
http://localhost:3100/baseDstu2/Condition?code=840539006

# DSTU2 - Search for all conditions with "Pneumonia"
http://localhost:3100/baseDstu2/Condition?code=233604007

# DSTU2 - Search for medication administrations of "Hydroxychloroquine Sulfate 200 MG Oral Tablet"
http://localhost:3100/baseDstu2/MedicationAdministration?code=979092

# DSTU2 - Search for all procedures of type "Oxygen administration by mask (procedure)"
http://localhost:3100/baseDstu2/Procedure?code=371908008

#==================================================

# R4 - Search for all testing encounters "Encounter for symptom (procedure)" ???
http://localhost:3100/baseR4/Encounter?type=185345009

# R4 - Search for all conditions with "Cough (finding)"
http://localhost:3100/baseR4/Condition?code=49727002

# R4 - Search for all conditions with "Dyspnea (finding)"
http://localhost:3100/baseR4/Condition?code=267036007

# R4 - Search for all conditions with "Fever (finding)"
http://localhost:3100/baseR4/Condition?code=386661006

# R4 - Search for all conditions with "Suspected COVID-19"
http://localhost:3100/baseR4/Condition?code=840544004

# R4 - Search for all conditions with "Suspected COVID-19" in the month of March 2020
http://localhost:3100/baseR4/Condition?code=840544004&onset-date=ge2020-03-01&onset-date=le2020-03-31

# R4 - Search for all conditions with "COVID-19"
http://localhost:3100/baseR4/Condition?code=840539006

# R4 - Search for all conditions with "Pneumonia"
http://localhost:3100/baseR4/Condition?code=233604007

# R4 - Search for medication administrations of "Hydroxychloroquine Sulfate 200 MG Oral Tablet"
http://localhost:3100/baseR4/MedicationAdministration?code=979092

# R4 - Search for all procedures of type "Oxygen administration by mask (procedure)"
http://localhost:3100/baseR4/Procedure?code=371908008
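The date-bounded Condition queries above differ between DSTU2 and R4 only in the base path and the name of the onset parameter, so they can be generated from one helper. This is a rough sketch; `condition_search_url` and its arguments are hypothetical names, and the base URLs are simply the local HAPI endpoints quoted in the queries.

```python
from urllib.parse import urlencode


def condition_search_url(base_url, code, start=None, end=None, date_param="onset-date"):
    """Build a FHIR Condition search URL, optionally bounded by onset date."""
    params = [("code", code)]
    if start:
        params.append((date_param, f"ge{start}"))  # on or after start
    if end:
        params.append((date_param, f"le{end}"))  # on or before end
    return f"{base_url}/Condition?{urlencode(params)}"


# "Suspected COVID-19" in March 2020, for both servers used in this thread.
r4 = condition_search_url(
    "http://localhost:3100/baseR4", "840544004", "2020-03-01", "2020-03-31"
)
dstu2 = condition_search_url(
    "http://localhost:3100/baseDstu2", "840544004", "2020-03-01", "2020-03-31",
    date_param="onset",
)
print(r4)
print(dstu2)
```

Passing the parameters as a list of pairs rather than a dict is what allows the date parameter to appear twice, once for each bound.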

Update: Wasn't able to find any patients with "20 ML tocilizumab 20 MG/ML Injection". Looking through the raw files that were generated, it doesn't look like any MedicationAdministrations of tocilizumab were generated.

scivm commented 4 years ago

Different countries/states are testing different percentages of the population. Finland is only doing minimal testing for the most vulnerable. In the situation where a country or state does not do much testing, we see many J06.9 Acute upper respiratory infection, unspecified. There are 1056 suspected/positive test covid cases U07.1/U07.2 and 47,258 J06.9 (Snomed 35207929) in the last 60 days.

I would think the model could have some multiple ratio of U07.1 to J06.9 (Snomed 35207929) depending on percentage of population tested if that is possible. Weekends have lower diagnosis rates because of the slow onset of the disease. Here is something to show what I mean: https://share.geckoboard.com/dashboards/GFO7AU47LS4O4CJM/
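The ratio suggested above can be computed directly from the quoted counts and then applied to a synthetic cohort. This is a sketch only; both function names are made up for illustration, and how the ratio should vary with the share of the population tested is left open, as it is in the comment.

```python
def uri_per_covid_case(covid_cases, uri_cases):
    """Number of unspecified upper respiratory infections per recorded COVID-19 case."""
    return uri_cases / covid_cases


def split_diagnoses(total_patients, ratio):
    """Split a symptomatic cohort into (covid_coded, uri_coded) counts for a given ratio."""
    covid_coded = round(total_patients / (1 + ratio))
    return covid_coded, total_patients - covid_coded


# Counts quoted above: 1056 U07.1/U07.2 cases and 47,258 J06.9 cases in 60 days.
ratio = uri_per_covid_case(1056, 47258)
print(f"{ratio:.1f} J06.9 diagnoses per COVID-19 diagnosis")
print(split_diagnoses(10_000, ratio))
```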

scivm commented 4 years ago

@awatson1978 Is there some property setting to have the epidemic start in a particular month?

jeffeastman commented 4 years ago

Currently it’s baked into the module:

"states": {
  "Initial": {
    "type": "Initial",
    "direct_transition": "Wait Until Outbreak"
  },
  "Terminal": {
    "type": "Terminal"
  },
  "Wait Until Outbreak": {
    "type": "Guard",
    "allow": {
      "condition_type": "Date",
      "operator": "==",
      "year": 2020,
      "value": 0
    },
    "direct_transition": "Wait Until Exposure"
  },
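Since the start of the outbreak is hard-coded in the guard, changing it means editing the module itself. The sketch below rebuilds the quoted guard state as a Python dict and moves its year; `with_outbreak_year` is a hypothetical helper, and only the `year` field that appears in the fragment is touched, because the fragment does not show whether a month can be specified.

```python
import copy

# The guard state as quoted above, rebuilt as a Python dict.
module_states = {
    "Wait Until Outbreak": {
        "type": "Guard",
        "allow": {"condition_type": "Date", "operator": "==", "year": 2020, "value": 0},
        "direct_transition": "Wait Until Exposure",
    }
}


def with_outbreak_year(states, year, guard_name="Wait Until Outbreak"):
    """Return a copy of the states with the guard's Date condition moved to `year`."""
    patched = copy.deepcopy(states)
    patched[guard_name]["allow"]["year"] = year
    return patched


patched = with_outbreak_year(module_states, 2021)
print(patched["Wait Until Outbreak"]["allow"])
```

Working on a deep copy keeps the original module definition unchanged, which makes it easier to generate several variants from the same file.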

jawalonoski commented 4 years ago

The "Wait Until Exposure" state randomly waits 1 - 90 days (uniform distribution) for each patient before infecting them. You can modify it by changing the guard or the delay.
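To get a feel for what a uniform 1 to 90 day wait produces, the delay can be simulated outside Synthea. This is a sketch under stated assumptions: the function is hypothetical, and the outbreak start of 1 January 2020 is only a guess at when the year-2020 guard opens.

```python
import random
from datetime import date, timedelta


def sample_exposure_dates(n, outbreak_start, seed=0, min_days=1, max_days=90):
    """Draw n exposure dates, each a uniform whole-day delay after outbreak_start."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [
        outbreak_start + timedelta(days=rng.randint(min_days, max_days))
        for _ in range(n)
    ]


# Assumed outbreak start; the real module only says "year 2020".
dates = sample_exposure_dates(1000, date(2020, 1, 1))
print(min(dates), max(dates))
```

Because the delay is uniform, exposures are spread evenly over the window rather than accelerating, which is the limitation raised earlier in the thread about exponential growth.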

jawalonoski commented 4 years ago

I'm working on some changes now based on some NEJM and Lancet papers, and will try to post an update this afternoon.

Edit: posted an update. I'll be posting regular updates to the covid19 branch.

awatson1978 commented 4 years ago

Some updates and feedback:

jawalonoski commented 4 years ago

Hmm... not sure what you mean... unless something is wrong, Patient addresses should be generated on the Patient resource... Same thing for encounter types -- outpatient, inpatient, emergency, etc.... in the Encounter.class. e.g. "AMB" or "IMP" or "EMER"

scivm commented 4 years ago

@awatson1978 The address on the patient object is a fake generated address with the lat/lon of the postal code centroid. I wrote a Python script to post-process the patient CSV file and substitute real addresses. For Finland there is an open dataset of every residential address, and you might find similar data for your state at openaddresses.io or in government data. My script takes the patient's zip code, randomly selects a real address with the same zip code from that file, and replaces the generated one.

Script: https://github.com/science-automation/ETL-Synthea-Python/blob/master/python_etl/real_address_synthea.py

Data: https://dev.azure.com/shambergerm/SyntheticFinland/_git/fi_addresses

It is difficult to build this functionality into Synthea because of the large size of the address databases.
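The post-processing idea described above can be sketched in a few lines. This is not the linked script, only an illustration of the same approach; the function name and the `ZIP` and `ADDRESS` keys are assumptions about how the rows are represented once the CSV files have been read.

```python
import random
from collections import defaultdict


def replace_addresses(patients, real_addresses, seed=0):
    """Swap each patient's generated address for a real one from the same zip code."""
    rng = random.Random(seed)
    by_zip = defaultdict(list)
    for row in real_addresses:
        by_zip[row["ZIP"]].append(row["ADDRESS"])

    updated = []
    for patient in patients:
        new_patient = dict(patient)  # leave the input rows untouched
        candidates = by_zip.get(patient["ZIP"])
        if candidates:  # keep the generated address when no real one is known
            new_patient["ADDRESS"] = rng.choice(candidates)
        updated.append(new_patient)
    return updated


# Made-up rows for illustration only.
patients = [
    {"Id": "p1", "ADDRESS": "123 Fake Street", "ZIP": "00100"},
    {"Id": "p2", "ADDRESS": "456 Fake Avenue", "ZIP": "99999"},
]
real_addresses = [
    {"ADDRESS": "Esimerkkikatu 1", "ZIP": "00100"},
    {"ADDRESS": "Esimerkkikatu 2", "ZIP": "00100"},
]
print(replace_addresses(patients, real_addresses))
```

Indexing the address file by zip code once keeps the lookup cheap per patient, although the whole index still has to fit in memory, which is the size concern mentioned above.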

scivm commented 4 years ago

I used the COVID-19 data generator to make this video of a simulated epidemic in Finland's capital region. It uses kepler.gl, an open-source tool from Uber. https://youtu.be/CawkBcHWMT4

awatson1978 commented 4 years ago

Probably another item with the uploader. Am investigating this evening. Thanks and apologies for the noise.

awatson1978 commented 4 years ago

Looks like it might have been something with the HAPI server; possibly being overly restrictive with validation and/or me mis-loading DSTU2 data into an R4 server. The new https://covid19-under-fhir.smilecdr.com/baseR4 seems to be receiving the data fine, including the Encounter classes and the Patient addresses.

Confirming the following, which has a Synthea dataset from the March 20th schema:

// R4 Encounters - Has class 
https://covid19-under-fhir.smilecdr.com/baseR4/Encounter?type=185345009

// R4 Patient - Has Address
https://covid19-under-fhir.smilecdr.com/baseR4/Patient/1b0580b9-1ee3-4353-b555-64c797d57564

Back to disease modeling....

awatson1978 commented 4 years ago

Found an excellent resource for Clinical Pathways from the University of Chicago Medicine.

jawalonoski commented 4 years ago

@awatson1978 said:

Found an excellent resource for Clinical Pathways from the University of Chicago Medicine.

It looks like the current module actually follows these guidelines pretty well. Some minor discrepancies (e.g. Tamiflu for negative testers, 48 hours between negative tests rather than 24), but overall, it looks like we're modeling these fairly accurately.

I still intend to make further changes related to progression of complications for the most severe and critical patients.

awatson1978 commented 4 years ago

Yeah, we're on the right track.

FYI... I've had a couple of requests today already for synthetic data that includes the serology immunity, post exposure.

{
  "code" : "94503-0",
  "display" : "SARS-CoV-2 IgG+IgM Pnl SerPl IA"
},
{
  "code" : "94505-5",
  "display" : "SARS-CoV-2 IgG SerPl IA-aCnc"
},
{
  "code" : "94506-3",
  "display" : "SARS-CoV-2 IgM SerPl IA-aCnc"
},
{
  "code" : "94504-8",
  "display" : "SARS-CoV-2 IgG+IgM Pnl SerPl IA-aCnc"
},
{
  "code" : "94508-9",
  "display" : "SARS-CoV-2 IgM SerPl Ql IA"
},
{
  "code" : "94509-7",
  "display" : "SARS-CoV-2 E gene Ct XXX Qn NAA+probe"
},
{
  "code" : "94507-1",
  "display" : "SARS-CoV-2 IgG SerPl Ql IA"
}

I think an outpatient encounter with lab work at the end of the pipeline would suffice.

jawalonoski commented 4 years ago

You get one or two 94531-1 : SARS-CoV-2 RNA Pnl Resp NAA+probe DiagnosticReports at the end, if the patient was admitted and survived long enough.

The request is for all the other codes too? OK... can make that happen.

Are those part of a panel?

awatson1978 commented 4 years ago

I suspect all the SerPl codes are part of a panel. The E gene is a different kind of test, I think.

I asked the person who requested it to chime in and comment.
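The guess above, that the SerPl codes belong together while the E gene test stands apart, can be checked mechanically against the display names. The sketch below only groups the codes listed earlier by the specimen marker in their display text; it says nothing about how LOINC itself defines panel membership, and `split_by_specimen` is a hypothetical helper.

```python
# LOINC codes and display names as listed earlier in this thread.
LOINC_CODES = {
    "94503-0": "SARS-CoV-2 IgG+IgM Pnl SerPl IA",
    "94505-5": "SARS-CoV-2 IgG SerPl IA-aCnc",
    "94506-3": "SARS-CoV-2 IgM SerPl IA-aCnc",
    "94504-8": "SARS-CoV-2 IgG+IgM Pnl SerPl IA-aCnc",
    "94508-9": "SARS-CoV-2 IgM SerPl Ql IA",
    "94509-7": "SARS-CoV-2 E gene Ct XXX Qn NAA+probe",
    "94507-1": "SARS-CoV-2 IgG SerPl Ql IA",
}


def split_by_specimen(codes, marker="SerPl"):
    """Separate codes whose display mentions the marker from all the others."""
    matching = {code: text for code, text in codes.items() if marker in text}
    other = {code: text for code, text in codes.items() if marker not in text}
    return matching, other


serology, other = split_by_specimen(LOINC_CODES)
print(sorted(serology))
print(sorted(other))
```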

awatson1978 commented 4 years ago

Found this graphic, and it has references. Looks pretty legit to me.

Progression

jawalonoski commented 4 years ago

I'm planning on modeling the following diagram, taken from https://doi.org/10.1016/S0140-6736(20)30566-3

awatson1978 commented 4 years ago

SNOMED Codes - Device (Types?)
706172005 | Ventilator (physical object)
257463002 | Ventilator outlet (physical object) (if counting slots)

awatson1978 commented 4 years ago

Beyond Chloroquine: AI helps identify possible treatments for COVID-19

awatson1978 commented 4 years ago

Found a nice summary of treatments, and did the lookup of the RxNorm codes.

Emerging Treatments

RxNorm 284756 - Kaletra https://lnkd.in/gKeNV8F

RxNorm 1600705 - Prezcobix https://lnkd.in/g_xSfvU

RxNorm 2683 - Colchicine https://lnkd.in/gS65vpK

RxNorm 1923334 - Kevzara https://lnkd.in/gGrh_pD

RxNorm 895761 - Actemra https://lnkd.in/g_sdX4X

RxNorm 337521 - Avastin https://lnkd.in/gzBYfBV

RxNorm 1012896 - Gilenya https://lnkd.in/gucWKbK

RxNorm 52175 - Losartan https://lnkd.in/g3UfHJ9

RxNorm 1535243 - Sylvant https://lnkd.in/g_-EFaW

RxNorm 2393 - Chloroquine https://lnkd.in/gh2mCPX

vpl-profess commented 4 years ago

Hi, I'm currently exploring this very rich and complex pathway. Searching from the place where covid19_death is set, I arrived at determine_risk.json. As the Negative/Middle and Severe_Severity SetAttribute boxes are not connected to anything, I suspect that the module is a work in progress. Am I wrong? If so, how were these boxes intended to be connected?

Thanks very much for these clarifications. We intend to use this very impressive work for characterising pathways.

Best Regards

Vpl

Tracy-Mc commented 4 years ago

Should the Compassionate Care Medication Order (RxNorm: tocilizumab) have an optional arrow to the End Symptoms submodule (Covid19/End_symptoms), rather than 100% leading to Wait for End Death? This seems a slight misplacement in the pathway.

jawalonoski commented 4 years ago

@vpl-profess said:

As the Negative/Middle and Severe_Severity SetAttribute boxes are not connected to anything, I suspect that the module is a work in progress. Am I wrong? If so, how were these boxes intended to be connected?

The boxes are connected via the table based transition... there is just a bug in the module builder that does not connect the states. Sorry for the confusion. See the src/main/resources/modules/lookup_tables/covid*.csv files.