gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
http://pycbc.org
GNU General Public License v3.0
307 stars 344 forks source link

Workflow-supplied options to be checked if already given in the config #4773

Closed GarethCabournDavies closed 2 weeks ago

GarethCabournDavies commented 1 month ago

Options given in the config and then from the workflow generator could clash. This is now being checked and an error is raised.

Some plumbing has gone in to the option-adding parts of this work to allow an option to be given multiple times, e.g. --statistic-files, for which some of the options come from the config, others from the workflow. I think this is the only example of that use case

Standard information about the request

This is a bug fix This change affects workflow generation, so all areas the live search.

This change follows style guidelines (See e.g. PEP8), has been proposed using the contribution guidelines

Motivation

Supplying the option in both the config and from the workflow generator will have unknown/unforeseen outcomes.

Contents

Links to any issues or associated PRs

1418

Testing performed

I ran the pycbc_make_offline_search_workflow code with gps-start-time given in the config overrides. This caused a failure. The workflow generator completed otherwise

GarethCabournDavies commented 1 month ago

I will make the change to put it into pegasus_workflow.

I'll profile the workflow generation, but my thoughts on this are:

spxiwh commented 2 weeks ago

Did you profile this as discussed for a "production scale" offline workflow?

GarethCabournDavies commented 2 weeks ago

Looking at the profile for an o4 chunk, both with this change: https://ldas-jobs.ligo.caltech.edu/~gareth.cabourndavies/testoutput/various_tests/workflow_config_clash/offline_wflow_o4_chunk1.png

and without it: https://ldas-jobs.ligo.caltech.edu/~gareth.cabourndavies/testoutput/various_tests/workflow_config_clash/offline_wflow_o4_chunk1_v242.png

We can see that the usage of add_input_opt and add_input_list_opt are practically unchanged

I think this is as add_input_list_opt doesn't run the comparison for the inputs. So though there is a [if x in list] check for each value, it isn't run for the input files, which would cause a problem for e.g. the hdf_trigger_merge