starsimhub / starsim

Starsim disease modeling framework
http://starsim.org
MIT License

Can't use a dist in more than one state anymore - 'define_states' #692

Open menriquez-IDM opened 4 days ago

menriquez-IDM commented 4 days ago

(Context: I am porting tbsim to Starsim 2.0.) In the constructor of our Malnutrition disease module, we use distributions as default values for states. This used to work fine, but now fails:

    def __init__(self, pars=None, **kwargs):
        super().__init__(**kwargs)
        self.define_pars(
            beta = 1.0,         # Transmission rate  
            init_prev = 0.001,  # Initial prevalence 
        )
        self.update_pars(pars, **kwargs)

        anthro_path = os.path.join(DATADIR, 'anthropometry.csv')
        self.LMS_data = pd.read_csv(anthro_path).set_index('Sex')

        # Adding Malnutrition states to handle the Individual Properties related to this disease 
        self.define_states(
            # Hooks to the RATIONS trial
            ss.BoolArr('receiving_macro', default=False), # Determines weight trend
            ss.BoolArr('receiving_micro', default=False), # Determines micro trend

            # ----->>>>>>>>>>>>>>>>>>>>>> ISSUE TRIGGERED BY DEFAULT VALUES BELOW:
            ss.FloatArr('height_percentile', default=ss.uniform()), # Percentile, stays fixed
            ss.FloatArr('weight_percentile', default=ss.uniform()), # Percentile, increases when receiving micro, then declines?
            ss.FloatArr('micro', default=ss.uniform()), # Continuous? Normal distribution around zero. Z-score, sigmoid thing. Half-life.

        )
        self.dweight = ss.normal(loc=self.dweight_loc, scale=self.dweight_scale)
        return

The error is as follows:

(venvVer2) c:\git\tbsim\scripts\general>python run_malnutrition.py
Starsim 2.0.0 (2024-10-01) — © 2023-2024 by IDM
Initializing sim with 10000 agents
Traceback (most recent call last):
  File "c:\git\tbsim\scripts\general\run_malnutrition.py", line 32, in <module>
    sim_n.run()
  File "c:\git\tbsim\venvVer2\Lib\site-packages\starsim\sim.py", line 298, in run
    if not self.initialized: self.init()
                             ^^^^^^^^^^^
  File "c:\git\tbsim\venvVer2\Lib\site-packages\starsim\sim.py", line 161, in init
    self.init_dists() # Initialize distributions
    ^^^^^^^^^^^^^^^^^
  File "c:\git\tbsim\venvVer2\Lib\site-packages\starsim\sim.py", line 214, in init_dists
    self.dists.init(obj=self, base_seed=self.pars.rand_seed, force=True)
  File "c:\git\tbsim\venvVer2\Lib\site-packages\starsim\distributions.py", line 97, in init
    self.check_seeds()
  File "c:\git\tbsim\venvVer2\Lib\site-packages\starsim\distributions.py", line 107, in check_seeds
    raise DistSeedRepeatError(checked[seed], dist)
starsim.distributions.DistSeedRepeatError: A common seed was found between ss.uniform(people_states_malnutrition.height_percentile_default, pars={'low': 0.0, 'high': 1.0}) and ss.uniform(people_states_malnutrition.height_percentile_default_module_micro_default, pars={'low': 0.0, 'high': 1.0}). This is likely caused by incorrect initialization of the parent Dists object.

menriquez-IDM commented 3 days ago

This looks like a race condition during setup: the parent Dists object isn't being initialized properly, possibly because of concurrency problems. However, this is a very simple script that does not intentionally use parallelization.

menriquez-IDM commented 3 days ago

Branch and Disease module: Malnutrition.py init method

cliffckerr commented 2 days ago

Found the issue -- two different distributions are being assigned the same seed, because the integers computed from their traces have the same last 6 digits:

s1 = 'people_states_malnutrition.height_percentile_default_module_micro_default'

s2 = 'people_states_malnutrition.height_percentile_default'

int.from_bytes(s1.encode(), byteorder='big')
Out[25]: 27799005104623711454959060709994609883093710570832320243914707190666620671010000743368678782368611280387512301456005666920680284384960831028951297485891059307079559274961398900

int.from_bytes(s2.encode(), byteorder='big')
Out[26]: 74300199819311061043722186592658515386210291499026427434428867777338769916261806379905559986374403707116839290598318545398900

Will update to use a hash method for converting distribution trace strings.
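To illustrate why this kind of collision can happen, here is a minimal sketch, not Starsim's actual code. The helper names (`raw_int`, `naive_seed`, `hashed_seed`), the `10**6` modulus, and the choice of SHA-256 are all assumptions made for illustration. The point is that a big-endian byte-to-integer conversion puts the end of the string in the low-order part of the number, so two traces with a shared suffix such as `_default` agree in their low-order bits, and any reduction that keeps only the tail of the number is prone to collisions. Hashing first lets every character influence the result.

```python
import hashlib

def raw_int(trace):
    """Interpret the UTF-8 bytes of the trace as one big-endian integer."""
    return int.from_bytes(trace.encode(), byteorder='big')

def naive_seed(trace, modulo=10**6):
    """Hypothetical reduction: keep only the last six decimal digits."""
    return raw_int(trace) % modulo

def hashed_seed(trace, modulo=10**6):
    """Hypothetical alternative: hash first, so the whole string is mixed in."""
    digest = hashlib.sha256(trace.encode()).digest()
    return int.from_bytes(digest, byteorder='big') % modulo

s1 = 'people_states_malnutrition.height_percentile_default_module_micro_default'
s2 = 'people_states_malnutrition.height_percentile_default'

# Both traces end in '_default', so their lowest 8 bytes (64 bits) are identical
shared_low_bits = (raw_int(s1) - raw_int(s2)) % 2**64 == 0

print(shared_low_bits)
print(naive_seed(s1), naive_seed(s2))    # The thread reports matching last 6 digits
print(hashed_seed(s1), hashed_seed(s2))  # Deterministic, but not tied to the suffix
```

Unlike Python's built-in `hash()`, which is randomized per process for strings, a `hashlib` digest is stable across runs, which matters if seeds need to be reproducible.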

daniel-klein commented 2 days ago

Easy workaround seems to be to give the distributions names as follows:

            ss.FloatArr('height_percentile', default=ss.uniform(name='height_percentile')), # Percentile, stays fixed
            ss.FloatArr('weight_percentile', default=ss.uniform(name='weight_percentile')), # Percentile, increases when receiving micro, then declines?
            ss.FloatArr('micro', default=ss.uniform(name='micro')), # Continuous? Normal distribution around zero. Z-score, sigmoid thing. Half-life.

daniel-klein commented 2 days ago

Also noticing here that traces get a bit interesting due to distributions having a pointer to the parent module. Perhaps not wrong, but also could be more right?

Example:

'people_states_malnutrition.height_percentile_default' =
ss.uniform(people_states_malnutrition.height_percentile_default, pars={'low': 0.0, 'high': 1.0})
'people_states_malnutrition.height_percentile_default_module_weight_percentile_default' =
ss.uniform(<no trace>, pars={'low': 0.0, 'high': 1.0})
'people_states_malnutrition.height_percentile_default_module_micro_default' =
ss.uniform(<no trace>, pars={'low': 0.0, 'high': 1.0})

daniel-klein commented 2 days ago

Distribution traces are weird! Running with TB and malnutrition, the traces appear as follows:

'pars_pregnancy_dur_postpartum' =
ss.lognorm_ex(pars_pregnancy_dur_postpartum, pars={'mean': ss.years(0.5, unit=year, values=26.07142857142857), 'std': ss.years(0.5, unit=year, values=26.07142857142857)})
'pars_pregnancy_dur_postpartum_module_fecund_people_states_female_default' =
ss.bernoulli(pars_pregnancy_dur_postpartum_module_fecund_people_states_female_default, pars={'p': 0.5})
'pars_pregnancy_dur_postpartum_module_fecund_people_states_age_default' =
ss.uniform(pars_pregnancy_dur_postpartum_module_fecund_people_states_age_default, pars={'low': 0, 'high': 100})
'pars_pregnancy_dur_postpartum_module_fecund_people_states_SES_default' =
ss.bernoulli(pars_pregnancy_dur_postpartum_module_fecund_people_states_SES_default, pars={'p': 0.3})
'pars_pregnancy_dur_postpartum_module_fecund_people_states_malnutrition.height_percentile_default' =
ss.uniform(pars_pregnancy_dur_postpartum_module_fecund_people_states_malnutrition.height_percentile_default, pars={'low': 0.0, 'high': 1.0})
'pars_pregnancy_dur_postpartum_module_fecund_people_states_malnutrition.height_percentile_default_module_weight_percentile_default' =
ss.uniform(pars_pregnancy_dur_postpartum_module_fecund_people_states_malnutrition.height_percentile_default_module_weight_percentile_default, pars={'low': 0.0, 'high': 1.0})
'pars_pregnancy_dur_postpartum_module_fecund_people_states_malnutrition.height_percentile_default_module_micro_default' =
ss.uniform(pars_pregnancy_dur_postpartum_module_fecund_people_states_malnutrition.height_percentile_default_module_micro_default, pars={'low': 0.0, 'high': 1.0})
'pars_pregnancy_dur_postpartum_module_fecund_people_states_malnutrition.height_percentile_default_module_dweight' =
ss.normal(pars_pregnancy_dur_postpartum_module_fecund_people_states_malnutrition.height_percentile_default_module_dweight, pars={'loc': <function Malnutrition.dweight_loc at 0x16b8685e0>, 'scale': <function Malnutrition.dweight_scale at 0x16b868670>})

For whatever reason, these traces do not result in repeated seed errors, whereas running as above with only TB does result in repeats.

daniel-klein commented 1 day ago

Realizing now that the difference in paths is potentially a very serious issue for common random numbers (CRN). Each distribution has to get the exact same path in the two sims being compared, or the distributions will get different seeds - so no CRN!

cliffckerr commented 1 day ago

I think it's fine -- if two sims have different structures, then I think it makes sense for those sims to also have different random numbers. Of course, you can override the default trace by providing an explicit name to the distribution.
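The trade-off discussed above can be sketched in plain Python, without Starsim. This is only an illustration under stated assumptions: `seed_from_trace` and `draws` are hypothetical helpers, and the SHA-256-based seeding is a stand-in for whatever Starsim does internally. It shows that when a seed is derived from the trace, the same distribution reached via two different paths gets two different random streams, while an explicit name shared by both sims yields identical draws.

```python
import hashlib
import random

def seed_from_trace(trace):
    """Hypothetical: derive a reproducible integer seed from a trace string."""
    digest = hashlib.sha256(trace.encode()).digest()
    return int.from_bytes(digest[:8], byteorder='big')

def draws(trace, n=3):
    """Draw n uniform samples from a generator seeded by the trace."""
    rng = random.Random(seed_from_trace(trace))
    return [rng.random() for _ in range(n)]

# The same state default, reached via the two paths quoted in this thread
trace_sim_a = 'people_states_malnutrition.height_percentile_default'
trace_sim_b = ('pars_pregnancy_dur_postpartum_module_fecund_people_states_'
               'malnutrition.height_percentile_default')

default_a = draws(trace_sim_a)
default_b = draws(trace_sim_b)
print(default_a == default_b)  # Different traces, so different streams

# With an explicit name, both sims use the same string and the same stream
named_a = draws('height_percentile')
named_b = draws('height_percentile')
print(named_a == named_b)
```

Whether the first behavior is a bug or a feature depends on whether the two sims are meant to be compared draw for draw.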

daniel-klein commented 9 hours ago

I have a nice fix for this, I think... coming soon.