joezuntz / cosmosis-standard-library


campaigns: yaml read-in can cause unexpected behavior from accidental duplicate keys #120

Closed jessmuir closed 9 months ago

jessmuir commented 9 months ago

When setting up a new run in a campaign yaml file, I accidentally created a run entry that had the params settings split in two places, like this:

  - name: testlike
    parent: baseline
    params:
    - sampler = test
    - pipeline.fast_slow = F
    env:
      DATA_VECTOR : sim_alt_w.fits
    params:
    - DEFAULT.2PT_FILE = data_vectors/${DATA_VECTOR}
    values:
    - cosmological_parameters--w = -0.90

Here, the parent run was set up to use the polychord sampler, but I wanted to do a quick test run to make sure the likelihood matched the simulated datavector I'd just made (sim_alt_w.fits). When I ran this using cosmosis-campaign, it started the polychord sampler. That suggests the second params: block overwrote the first one when the file was read in, so the first block, where I set sampler = test, was ignored.
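A minimal sketch of how this can be checked outside of cosmosis, assuming the file is read with PyYAML's yaml.safe_load(). The run entry is shortened from the one above, and the variable names are only for illustration. It shows PyYAML's behavior in isolation, not what cosmosis-campaign does with the result afterwards.

```python
import yaml

# Shortened version of the run entry above, with `params` given twice.
text = """
- name: testlike
  parent: baseline
  params:
  - sampler = test
  - pipeline.fast_slow = F
  params:
  - DEFAULT.2PT_FILE = data_vectors/${DATA_VECTOR}
"""

# safe_load raises no error for the repeated key.
run = yaml.safe_load(text)[0]

# Only the later `params` block is left; `sampler = test` is gone.
print(run["params"])
```

With PyYAML the later value replaces the earlier one for a repeated key, which is consistent with the parent's polychord sampler being used.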

The effect of this mistake was pretty obvious here (it launched polychord instead of the test sampler), but I could imagine scenarios where a silently dropped setting would be much harder to notice and could cause problems if not caught!

I think this is just the default behavior of yaml.safe_load(), which keeps only the last value when a key is duplicated and does not warn about it. I'm not sure what the best fix is, but if possible it'd be good to either have the yaml parser throw an error when a key is duplicated, or adjust the read-in so it can handle cases where the params (or env or values or pipeline) lists are split into two parts.
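For the first option, one possible approach is a loader subclass that refuses repeated keys. This is only a sketch, assuming PyYAML is the parser: UniqueKeyLoader is a hypothetical name, not something that exists in PyYAML or cosmosis, and the example text is a shortened version of the run entry above.

```python
import yaml


class UniqueKeyLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that rejects repeated mapping keys."""

    def construct_mapping(self, node, deep=False):
        seen = set()
        for key_node, _value_node in node.value:
            # Only check plain scalar keys, and skip YAML merge keys ("<<").
            if not isinstance(key_node, yaml.ScalarNode):
                continue
            if key_node.tag == "tag:yaml.org,2002:merge":
                continue
            key = self.construct_object(key_node, deep=deep)
            if key in seen:
                raise yaml.constructor.ConstructorError(
                    "while constructing a mapping", node.start_mark,
                    f"found duplicate key {key!r}", key_node.start_mark)
            seen.add(key)
        # Let the normal SafeLoader logic build the dict.
        return super().construct_mapping(node, deep=deep)


duplicated = """
- name: testlike
  params:
  - sampler = test
  params:
  - DEFAULT.2PT_FILE = data_vectors/${DATA_VECTOR}
"""

try:
    yaml.load(duplicated, Loader=UniqueKeyLoader)
except yaml.YAMLError as err:
    # The message includes the line and column of the repeated key.
    print(f"Refusing to load campaign file: {err}")
```

Raising an error seems safer than silently merging the two lists, since it makes the user decide which settings they meant. The second option (concatenating split lists) would need a different override and is not shown here.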

jessmuir commented 9 months ago

Meant to submit this as an issue for cosmosis, not csl. Closing this issue and opening it over on the main cosmosis repo.