joezuntz / cosmosis

campaigns: yaml read-in can cause unexpected behavior from accidental duplicate keys #114

Closed jessmuir closed 7 months ago

jessmuir commented 7 months ago

(I initially submitted this as a standard library issue by mistake; copying it here, where it is more relevant!)

When setting up a new run in a campaign yaml file, I accidentally created a run entry that had the params settings split in two places, like this:

  - name: testlike
    parent: baseline
    params:
    - sampler = test
    - pipeline.fast_slow = F
    env:
      DATA_VECTOR : sim_alt_w.fits
    params:
    - DEFAULT.2PT_FILE = data_vectors/${DATA_VECTOR}
    values:
    - cosmological_parameters--w = -0.90

Here, the parent run was set up to use the polychord sampler, but I wanted to do a quick test run to make sure the likelihood matched the simulated datavector I'd just made (sim_alt_w.fits). When I ran this using cosmosis-campaign, it started the polychord sampler, which suggests that the second params: block may have overwritten the first one, or otherwise caused the first one (where I set sampler = test) to be ignored.
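For reference, the overwrite can be reproduced outside of cosmosis-campaign. This is a minimal sketch, assuming PyYAML is installed and using a trimmed-down version of the run entry above as a bare list (the real campaign file has more structure around it); the variable names are only illustrative:

```python
import yaml

# Trimmed copy of the run entry, with "params" appearing twice.
text = """
- name: testlike
  parent: baseline
  params:
  - sampler = test
  - pipeline.fast_slow = F
  params:
  - DEFAULT.2PT_FILE = data_vectors/${DATA_VECTOR}
"""

run = yaml.safe_load(text)[0]

# PyYAML keeps only the last value for a repeated key and does not warn,
# so the "sampler = test" line is silently dropped.
print(run["params"])
```

Only the second params list survives, so the sampler setting inherited from the parent stays in effect.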

While this mistake in my yaml file was pretty obvious (causing me to launch polychord instead of a test sampler), I could imagine scenarios where this could cause problems if not caught!

I think this is just the default behavior of how yaml.safe_load() handles duplicate keys: the last occurrence silently wins. I'm not sure what the best fix is, but if possible it'd be good to either have the yaml parser raise an error when a key is duplicated, or adjust the read-in so it can handle cases where the params (or env or values or pipeline) lists are split into two parts.
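One way the first option could look is sketched below. It assumes PyYAML, and `UniqueKeyLoader` is a hypothetical name, not something that exists in cosmosis or PyYAML; it is not necessarily how the eventual fix was implemented. The idea is to subclass `yaml.SafeLoader` and check each mapping's keys before the mapping is built:

```python
import yaml
from yaml.constructor import ConstructorError


class UniqueKeyLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that rejects repeated mapping keys."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            # Leave YAML merge keys ("<<") to the normal SafeLoader handling.
            if key_node.tag == "tag:yaml.org,2002:merge":
                continue
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key {key!r}", key_node.start_mark)
            seen.add(key)
        return super().construct_mapping(node, deep=deep)


def load_strict(text):
    """Parse YAML text, raising ConstructorError on any duplicated key."""
    return yaml.load(text, Loader=UniqueKeyLoader)
```

With this, `load_strict` on the run entry above would fail with a message pointing at the second params: line instead of quietly discarding the first. The sketch assumes keys are hashable scalars, which holds for the run entries shown here.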

joezuntz commented 7 months ago

Addressed in issue #117