Open yaniv256 opened 4 years ago
Wear a mask, wash your hands, and don't go out to eat at a restaurant.
The module is not malfunctioning. It is possible the infection rates are too high. On the other hand, most people in Synthea are getting tested, which is not happening in the real world, so the results might be skewed. Also, not all of those people end up being admitted to the hospital or ICU. As we learn more about the actual results of the pandemic, we will go back and modify the infection rates.
If you want to modify the infection rates, feel free to edit this CSV file: https://github.com/synthetichealth/synthea/blob/master/src/main/resources/modules/lookup_tables/covid19_prob.csv
The `time` column is a range, where each value is time in milliseconds since January 1, 1970 (i.e. standard Java timestamp).
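For anyone editing those values by hand, here is a minimal sketch of converting between calendar dates and epoch milliseconds. The helper names are made up for illustration, and the sketch only covers the timestamp arithmetic. How the two endpoints of a range are written in the CSV is not shown here, so check the file itself.

```python
from datetime import datetime, timezone


def epoch_ms_to_utc(ms: int) -> datetime:
    """Convert milliseconds since 1970-01-01T00:00:00Z to an aware UTC datetime."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def utc_to_epoch_ms(dt: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return round(dt.timestamp() * 1000)


# Example: midnight UTC on March 1, 2020, expressed as a Java-style timestamp.
start = utc_to_epoch_ms(datetime(2020, 3, 1, tzinfo=timezone.utc))
print(start)                   # 1583020800000
print(epoch_ms_to_utc(start))  # 2020-03-01 00:00:00+00:00
```

Passing timezone-aware datetimes avoids silently picking up the local timezone of the machine (or Colab VM) doing the conversion.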
I'm running synthea on Google Colab (upstream from some deep learning) so I can't really build it every time. Any chance to fix the jar file to produce something more realistic?
I assume that the COVID-19 dataset consists of 10K patients who tested positive using nasal swab testing, etc. I see that roughly 6000 had antibody testing (SARS-CoV-2 RNA) with many converting from positive to negative. On further thought, I can't use the dataset to predict who developed COVID versus those who did not. Great dataset nevertheless and could be used for descriptive statistics.
I don't know anything about Google Colab... it looks similar to a hosted version of Jupyter notebooks... but even so, I have no idea of how these suggestions will work or if they are possible in your environment:
- If you have access to the JAR, you can just replace the `covid19_prob.csv` file inside the Synthea JAR, either locally or potentially using the `jar uf` command (see https://docs.oracle.com/javase/tutorial/deployment/jar/update.html).
- You could also try using a local set of lookup_tables using the `--generate.lookup_tables` command-line switch. You'll need to provide ALL the lookup tables though or you'll see a lot of exceptions.

      # Lookup Table Folder location
      generate.lookup_tables = modules/lookup_tables/

- If you are using a hosted version of Synthea that you do not have access to, then there is nothing you can really do.
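If the `jar` tool is not handy (for example in a notebook environment), the first suggestion can be approximated in Python, since a JAR is a ZIP archive. This is only a sketch: `replace_jar_member` is a made-up helper, and the entry path used in the commented usage is an assumption based on the repository layout. List the archive's entries first to confirm the real path.

```python
import zipfile


def replace_jar_member(jar_path: str, member: str, new_file: str, out_path: str) -> None:
    """Write a copy of jar_path to out_path with entry `member` replaced by new_file."""
    with zipfile.ZipFile(jar_path) as src:
        if member not in src.namelist():
            raise KeyError(f"{member} not found in {jar_path}")
        with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                if item.filename == member:
                    # Swap in the edited file under the same entry name.
                    dst.write(new_file, arcname=member)
                else:
                    # Copy every other entry through unchanged.
                    dst.writestr(item, src.read(item.filename))


# Hypothetical usage; file names and the entry path are assumptions:
# replace_jar_member(
#     "synthea-with-dependencies.jar",
#     "modules/lookup_tables/covid19_prob.csv",
#     "my_covid19_prob.csv",
#     "synthea-patched.jar",
# )
```

Writing to a new file rather than modifying the JAR in place keeps the original around in case the edited table turns out to be malformed.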
Any chance to fix the jar file to produce something more realistic?
Define realistic. If you'd like to provide different infection statistics, with the proper peer-reviewed citations, we're happy to take a pull request and update the table.
Exactly what kind of statistics are you looking for? I would be happy to do some research, I'm sure this data is available now. It would be very helpful if the output from this module was more accurate.
This table models the probability of infection over time. We could also add another column, such as State, to represent different geographies.
Also, it's difficult to use the module for developing apps if Covid infection rates are hard-coded at a low number. Gotta find the needles in the haystack. But having an adjustable input at the beginning of the pipeline would be nice. Sometimes we want to generate a population average, sometimes we want to generate a positive cohort, sometimes we want to generate a single patient and model disease progression.
It depends on what your definition of "hard-coded" is. It isn't compiled into the code, it is listed in a configuration file. See the link in the previous comment on October 9th.
I'm running `java -jar synthea-with-dependencies.jar -c synthea.properties -p 100 Minnesota` and I'm getting 77% of patients with a `Suspected COVID-19` and 75% of patients with a `COVID-19` condition. Maybe we're missing some probability parameter?
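For anyone wanting to reproduce percentages like these, here is a rough sketch of one way to compute them. It assumes CSV export is enabled and that the patient and condition files expose `Id`, `PATIENT`, and `DESCRIPTION` columns. Those column names are an assumption to check against your own output, and the rows below are made-up toy data, not Synthea results.

```python
import csv
import io


def condition_prevalence(patients_csv, conditions_csv, description: str) -> float:
    """Fraction of patients with at least one condition row matching description."""
    patient_ids = {row["Id"] for row in csv.DictReader(patients_csv)}
    affected = {
        row["PATIENT"]
        for row in csv.DictReader(conditions_csv)
        if row["DESCRIPTION"] == description
    }
    if not patient_ids:
        return 0.0
    return len(affected & patient_ids) / len(patient_ids)


# Toy stand-ins for the exported files (assumed column names).
patients = "Id\np1\np2\np3\np4\n"
conditions = (
    "PATIENT,DESCRIPTION\n"
    "p1,Suspected COVID-19\n"
    "p1,COVID-19\n"
    "p2,Suspected COVID-19\n"
    "p3,Fever\n"
)

print(condition_prevalence(io.StringIO(patients), io.StringIO(conditions), "COVID-19"))            # 0.25
print(condition_prevalence(io.StringIO(patients), io.StringIO(conditions), "Suspected COVID-19"))  # 0.5
```

With real output, pass open file handles instead of the `io.StringIO` objects. Counting distinct patients rather than condition rows avoids inflating the rate when a patient has more than one matching row.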