synthetichealth / synthea

Synthetic Patient Population Simulator
https://synthetichealth.github.io/synthea
Apache License 2.0
2.18k stars 653 forks source link

Patient height and weight values extremely low #1439

Closed malbertson3 closed 7 months ago

malbertson3 commented 7 months ago

What happened?

I extracted patient age, height, and weight from my Synthea generated Observation.ndjson file. All of the heights are <60cm and all of the weights are <6kg. I'm using LOINC code 8302-2 for body height and code 29463-7 for body weight. I also printed out patient ages which range from about 10 - 70 years old. I'm using ./run_synthea -p 1000 to generate the data. Attached is a zip containing Observation.ndjson and Patient.ndjson files. Observation.zip

I only modified the synthea.properties file and Module.java. Is there something I need to tweak in my properties file to bring the heights and weights back up to normal range?

Thanks in advance.

Contents of my synthea.properties file:

exporter.baseDirectory = ./output/
exporter.use_uuid_filenames = false
exporter.subfolders_by_id_substring = false
# exporters that use XML or JSON can enable or disable 'pretty printing'
exporter.pretty_print = true
# number of years of history to keep in exported records, anything older than this may be filtered out
# set years_of_history = 0 to skip filtering altogether and keep the entire history
exporter.years_of_history = 3
# split records allows patients to have one record per provider organization
exporter.split_records = false
exporter.split_records.duplicate_data = false
exporter.metadata.export = true
exporter.ccda.export = false
exporter.fhir.export = true
exporter.fhir_stu3.export = false
exporter.fhir_dstu2.export = false
exporter.fhir.use_shr_extensions = false
exporter.fhir.use_us_core_ig = true
exporter.fhir.us_core_version = 5.0.1
exporter.fhir.transaction_bundle = true
# using bulk_data=true will ignore exporter.pretty_print
exporter.fhir.bulk_data = true
# included_ and excluded_resources list out the resource types to include/exclude in the csv exporters.
# only one of these may be set at a time, if both are set then both will be ignored.
# if neither is set, then all resource types will be included.
# note the Patient and Encounter resources will always be included, even if specifically listed as excluded here
exporter.fhir.included_resources =
exporter.fhir.excluded_resources =
exporter.groups.fhir.export = false
exporter.hospital.fhir.export = false
exporter.hospital.fhir_stu3.export = false
exporter.hospital.fhir_dstu2.export = false
exporter.practitioner.fhir.export = false
exporter.practitioner.fhir_stu3.export = false
exporter.practitioner.fhir_dstu2.export = false
exporter.encoding = UTF-8
exporter.json.export = false
exporter.json.include_module_history = false
exporter.csv.export = false
# if exporter.csv.append_mode = true, then each run will add new data to any existing CSVs. if false, each run will clear out the files and start fresh
exporter.csv.append_mode = true
# if exporter.csv.folder_per_run = true, then each run will have CSVs placed into a unique subfolder. if false, each run will only use the top-level csv folder
exporter.csv.folder_per_run = false
# included_files and excluded_files list out the files to include/exclude in the csv exporter
# only one of these may be set at a time, if both are set then both will be ignored
# if neither is set, then all files will be included
# see list of files at: https://github.com/synthetichealth/synthea/wiki/CSV-File-Data-Dictionary
# include filenames separated with a comma, ex: patients.csv,procedures.csv,medications.csv
# NOTE: the csv exporter does not actively delete files, so if Run 1 you included a file, then Run 2 you exclude that file, the version from Run 1 will still be present
exporter.csv.included_files =
exporter.csv.excluded_files = patient_expenses.csv

exporter.cpcds.export = false
exporter.cpcds.append_mode = false
exporter.cpcds.folder_per_run = false
exporter.cpcds.single_payer = false

exporter.bfd.export = false
exporter.bfd.require_code_maps = true
exporter.bfd.export_missing_codes = true
exporter.bfd.bene_id_start = -1000000
exporter.bfd.clm_id_start = -100000000
exporter.bfd.clm_grp_id_start = -100000000
exporter.bfd.pde_id_start = -100000000
exporter.bfd.fi_doc_cntl_num_start = -100000000
exporter.bfd.carr_clm_cntl_num_start = -100000000
exporter.bfd.mbi_start = 1S00-E00-AA00
exporter.bfd.hicn_start = T01000000A
exporter.bfd.partc_contract_start = Y0001
exporter.bfd.partc_contract_count = 10
exporter.bfd.plan_benefit_package_start = 800
exporter.bfd.plan_benefit_package_count = 5
exporter.bfd.partd_contract_start = Z0001
exporter.bfd.partd_contract_count = 10
exporter.bfd.clia_labs_start = 00A0000000
exporter.bfd.clia_labs_count = 10
exporter.bfd.cutoff_date=20140529

exporter.cdw.export = false
exporter.text.export = false
exporter.text.per_encounter_export = false
exporter.clinical_note.export = false

# parameters for symptoms export
exporter.symptoms.csv.export = false
# selection mode of conditions or symptom export: 0 = conditions according to  exporter.years_of_history. other values = all conditions (entire history)
exporter.symptoms.mode = 0
# if exporter.symptoms.csv.append_mode = true, then each run will add new data to any existing CSVs. if false, each run will clear out the files and start fresh
exporter.symptoms.csv.append_mode = false
# if exporter.symptoms.csv.folder_per_run = true, then each run will have CSVs placed into a unique subfolder. if false, each run will only use the top-level csv folder
exporter.symptoms.csv.folder_per_run = false
exporter.symptoms.text.export = false

# enable searching for custom exporter implementations
exporter.enable_custom_exporters = true

# the number of patients to generate, by default
# this can be overridden by passing a different value to the Generator constructor
generate.default_population = 1

# the number of threads to use for the generator, set the value to -1 to match the number of
# available processors (as per Runtime.getRuntime().availableProcessors())
# defaults to -1 if not specified
generate.thread_pool_size = -1

generate.log_patients.detail = simple
# options are "none", "simple", or "detailed" (without quotes). defaults to simple if another value is used
# none = print nothing to the console during generation
# simple = print patient names once they are generated.
# detailed = print patient names, atributes, vital signs, etc..  May slow down processing

generate.timestep = 604800000
# time is in ms
# 1000 * 60 * 60 * 24 * 7 = 604800000

# default demographics is every city in the US
generate.demographics.default_file = geography/demographics.csv
generate.geography.zipcodes.default_file = geography/zipcodes.csv
generate.geography.country_code = US
generate.geography.timezones.default_file = geography/timezones.csv
generate.geography.foreign.birthplace.default_file = geography/foreign_birthplace.json
generate.geography.sdoh.default_file = geography/sdoh.csv

# Lookup Table Folder location
generate.lookup_tables = modules/lookup_tables/

# Set to true if you want every patient to be dead.
generate.only_dead_patients = false
# Set to true if you want every patient to be alive.
generate.only_alive_patients = false
# If both only_dead_patients and only_alive_patients are set to true,
# It they will both default back to false

# if criteria are provided, (for example, only_dead_patients, only_alive_patients, or a "patient keep module" with -k flag)
# this is the maximum number of times synthea will loop over a single slot attempting to produce a matching patient.
# after this many failed attempts, it will throw an exception.
# set this to 0 to allow for unlimited attempts (but be aware of the possibility that it will never complete!)
generate.max_attempts_to_keep_patient = 1000

# if true, tracks and prints out details of transition tables for each module upon completion
# note that this may significantly slow down processing, and is intended primarily for debugging
generate.track_detailed_transition_metrics = false

# If true, person names have numbers appended to them to make them more obviously fake
generate.append_numbers_to_person_names = false

# Probability of each person having a middle name. 0 is zero, 1.0 is 100% chance.
generate.middle_names = 0.80

# if true, the entire population will use veteran prevalence data
generate.veteran_population_override = false

# these should add up to 1.0
# weighting and categories are inspired by the following but there are no specific hard numbers to point to
# http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1694190/pdf/amjph00543-0042.pdf
# http://www.ncbi.nlm.nih.gov/pubmed/8122813
generate.demographics.socioeconomic.weights.income = 0.2
generate.demographics.socioeconomic.weights.education = 0.7
generate.demographics.socioeconomic.weights.occupation = 0.1

generate.demographics.socioeconomic.score.low = 0.0
generate.demographics.socioeconomic.score.middle = 0.25
generate.demographics.socioeconomic.score.high = 0.66

generate.demographics.socioeconomic.education.less_than_hs.min = 0.0
generate.demographics.socioeconomic.education.less_than_hs.max = 0.5
generate.demographics.socioeconomic.education.hs_degree.min = 0.1
generate.demographics.socioeconomic.education.hs_degree.max = 0.75
generate.demographics.socioeconomic.education.some_college.min = 0.3
generate.demographics.socioeconomic.education.some_college.max = 0.85
generate.demographics.socioeconomic.education.bs_degree.min = 0.5
generate.demographics.socioeconomic.education.bs_degree.max = 1.0

# The average family size in the US is 3.13. The 2010 FPL for a 3-person household is $18310. Tuned it to $17550 for realistic medicaid/ACA enrollments.
generate.demographics.socioeconomic.income.poverty = 17550
generate.demographics.socioeconomic.income.high = 75000

generate.birthweights.default_file = birthweights.csv
generate.birthweights.logging = false

# in Massachusetts, the individual insurance mandate became law in 2006
# in the US, the Affordable Care Act become law in 2010,
# and individual and employer mandates took effect in 2014.
# mandate.year will determine when individuals with an occupation score above mandate.occupation
# receive employer mandated insurance (aka "private" insurance).
# prior to mandate.year, anyone with income greater than the annual cost of an insurance plan
# will purchase the insurance.
generate.insurance.mandate.year = 2006
generate.insurance.mandate.occupation = 0.2

# Defines what percent of insurance premiums are covered by employers, when employer-covered.
# According to [https://www.kff.org/report-section/ehbs-2021-summary-of-findings/],
# the average employee premium contribution is 0.17 and employers pay 0.83.
generate.insurance.employer_coverage = 0.83

# Default Costs, to be used for pricing something that we don't have a specific price for
# -- $500 for procedures is completely invented
generate.costs.default_procedure_cost = 500.00
# -- $255 for medications - also invented
generate.costs.default_medication_cost = 255.00
# -- Encounters billed using avg prices from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3096340/
# -- Adjustments for initial or subsequent hospital visit and level/complexity/time of encounter
# -- not included. Assume initial, low complexity encounter (Tables 4 & 6)
generate.costs.default_encounter_cost = 125.00
# -- https://www.nytimes.com/2014/07/03/health/Vaccine-Costs-Soaring-Paying-Till-It-Hurts.html
# -- currently all vaccines cost $136.
generate.costs.default_immunization_cost = 136.00
generate.costs.default_lab_cost = 100.00
# -- assumes device costs are included in procedure cost, if not add to costs/devices.csv
generate.costs.default_device_cost = 0.00
# -- assumes supply costs are included in procedure cost, if not add to costs/supplies.csv
generate.costs.default_supply_cost = 0.00

# Providers
generate.providers.hospitals.default_file = providers/hospitals.csv
generate.providers.longterm.default_file = providers/longterm.csv
generate.providers.nursing.default_file = providers/nursing.csv
generate.providers.rehab.default_file = providers/rehab.csv
generate.providers.hospice.default_file = providers/hospice.csv
generate.providers.dialysis.default_file = providers/dialysis.csv
generate.providers.homehealth.default_file = providers/home_health_agencies.csv
generate.providers.veterans.default_file = providers/va_facilities.csv
generate.providers.urgentcare.default_file = providers/urgent_care_facilities.csv
generate.providers.primarycare.default_file = providers/primary_care_facilities.csv
generate.providers.ihs.hospitals.default_file = providers/ihs_facilities.csv
generate.providers.ihs.primarycare.default_file = providers/ihs_centers.csv

# Provider selection behavior
# How patients select a provider organization:
#  nearest - select the closest provider. See generate.providers.maximum_search_distance
#  random  - select randomly.
#  network - select a random provider in your insurance network. same as random except it changes every time the patient switches insurance provider.
#  medicare - select the nearest provider that can bill Medicare. If no Medicare provider is found, it defaults back to "nearest".
generate.providers.selection_behavior = nearest

# if a provider cannot be found for a certain type of service,
# this will default to the nearest hospital.
generate.providers.default_to_hospital_on_failure = true

# minimum number of providers linked per patient
# if this number is not met it re-runs the simulation
generate.providers.minimum = 1

# maximum distance to look for a provider for a given patient, in km
# set to 10 degrees lat/lon to support the model that veterans only seek care at VA facilities
generate.providers.maximum_search_distance = 1000

# Payers
generate.payers.insurance_companies.default_file = payers/insurance_companies.csv
generate.payers.insurance_plans.default_file = payers/insurance_plans.csv
generate.payers.insurance_plans.eligibilities_file = payers/insurance_eligibilities.csv
generate.payers.insurance_companies.medicare = Medicare
generate.payers.insurance_companies.medicaid = Medicaid
generate.payers.insurance_companies.dual_eligible = Dual Eligible
# The percentage of a person's income that they are willing to spend on health insurance premiums.
generate.payers.insurance_plans.income_premium_ratio = 0.034
# The chance of rejection
# Plan selection behavior
# How patients select a plan:
#  best_rates - select plans with best rates for person's existing conditions and medical needs
#  random  - select plans randomly.
#  priority  - select plans based on the priority level defined in the insurance plans file.
generate.payers.selection_behavior = priority

# Payer adjustment behavior
# How payers adjust claims:
#  none - the payer reimburses each claim by the full amount.
#  fixed - the payer adjusts each claim by a fixed rate (set by adjustment_rate)
#  random  - the payer adjusts each claim by a random rate (between zero and adjustment_rate).
generate.payers.adjustment_behavior = none
# Payer adjustment rate should be between zero and one (0.00 - 1.00), where 0.05 is 5%.
generate.payers.adjustment_rate = 0.10

# Experimental feature. Patients will miss care if true, but side-effects of missing that care
# are not handled. Additionally, the path the disease module might take may no longer make sense.
# It might assume things occurred that haven't actually happened it. Use with care.
generate.payers.loss_of_care = false

# Add a FHIR terminology service URL to enable the use of ValueSet URIs within code definitions.
# generate.terminology_service_url = https://r4.ontoserver.csiro.au/fhir

# Quit Smoking
lifecycle.quit_smoking.baseline = 0.01
lifecycle.quit_smoking.timestep_delta = -0.01
lifecycle.quit_smoking.smoking_duration_factor_per_year = 1.0

# Quit Alcoholism
lifecycle.quit_alcoholism.baseline = 0.001
lifecycle.quit_alcoholism.timestep_delta = -0.001
lifecycle.quit_alcoholism.alcoholism_duration_factor_per_year = 1.0

# Adherence
lifecycle.adherence.baseline = 0.05

# set this to true to enable randomized "death by natural causes"
# highly recommended if "only_dead_patients" is true
lifecycle.death_by_natural_causes = false

# set this to enable "death by loss of care" or missed care,
# e.g. not covered by insurance or otherwise unaffordable.
# only functional if "generate.payers.loss_of_care" is also true.
lifecycle.death_by_loss_of_care = false

# Use physiology simulations to generate some VitalSigns
physiology.generators.enabled = false

# Allow physiology module states to be executed
# If false, all Physiology state objects will immediately redirect to the state defined in
# the alt_direct_transition field
physiology.state.enabled = false

# set to true to introduce errors in height, weight and BMI observations for people
# under 20 years old
growtherrors = false

I also, commented out these modules in lines 71-76 in synthea\src\main\java\org\mitre\synthea\engine\Module.java

    //retVal.put("Lifecycle", new ModuleSupplier(new LifecycleModule()));
    //retVal.put("Health Insurance", new ModuleSupplier(new HealthInsuranceModule()));
    retVal.put("Cardiovascular Disease", new ModuleSupplier(new CardiovascularDiseaseModule()));
    retVal.put("Quality Of Life", new ModuleSupplier(new QualityOfLifeModule()));
    //retVal.put("Weight Loss", new ModuleSupplier(new WeightLossModule()));
    //retVal.put("COVID-19 Immunization Module", new ModuleSupplier(new C19ImmunizationModule()));

Environment

- OS:
- Java:

Relevant log output

No response

dehall commented 7 months ago

In general I would recommend not commenting out any of the java modules because things can be very intertwined (I'm actually a little surprised nothing crashed) but to address the specific case here, growth happens in the LifecycleModule. Uncomment that one and patients should get better heights and weights.

jawalonoski commented 7 months ago

Because you commented out the LifecycleModule, not only are they not growing, they aren't even aging. You have basically a population of unaging babies.

malbertson3 commented 7 months ago

Thank you. That fixed it. I knew I was doing something stupid!