gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
http://pycbc.org
GNU General Public License v3.0

Problems with OSG running instructions #758

Closed duncan-brown closed 8 years ago

duncan-brown commented 8 years ago
  1. Remove this prerequisite as it is no longer needed:
An updated version of Java SSL proxies. Replace share/pegasus/java/ssl-proxies-2.1.0.jar with the one from http://gaul.isi.edu/pub/ssl-proxies-2.1.1-SNAPSHOT.jar
  1. Explain how to make the mixed executables file needed for OSG:
The bundled executables available on the submit machine

So far, @lppekows has been hand-building a mixed executables ini file that points to some of the executables on the code.pycbc.phy.syr.edu server and some in his home directory. See e.g.

/home/lppekows/projects/osg/config/mixed_executables.ini

In this file, the code that runs on OSG is downloaded from the server (at the moment this is just pycbc_inspiral, since that is the only job we run on OSG), and all the other codes come from ${which:pycbc_average_psd}. This also relies on the user making sure that PATH is set correctly before planning the workflow. I mounted the software repo as /opt/pycbc-software, so it should be possible to set up an ini file that has e.g.

average_psd = /opt/pycbc-software/v1.3.7/x86_64/composer_xe_2015.0.090/pycbc_average_psd
inspiral = http://code.pycbc.phy.syr.edu/pycbc-software/v1.3.7/x86_64/composer_xe_2015.0.090/pycbc_inspiral

Really, this means adding an extra step to the build bundle instructions to create a second executables.ini file called osg_executables.ini for each release. This also needs to contain the lines

[pegasus_profile-inspiral]
pycbc|installed = False
hints|execution.site = local

to tell Pegasus that pycbc_inspiral is not installed locally.

If this is done, the instructions can just say to point to osg_executables.ini from the http server for a given release, and the instructions about setting the PATH can be deleted.
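As a rough illustration of what such a build step could produce, here is a minimal Python sketch that writes the two kinds of entries plus the profile section quoted above. The `[executables]` section name, the helper name `build_osg_executables`, and the idea of generating the file with `configparser` are assumptions made for the example, not part of the existing build instructions; the release and path values are copied from the example lines above.

```python
import configparser
import io

# Values copied from the example above; a real build step would set these per release.
RELEASE = "v1.3.7"
BUILD = "x86_64/composer_xe_2015.0.090"
LOCAL_ROOT = "/opt/pycbc-software"
HTTP_ROOT = "http://code.pycbc.phy.syr.edu/pycbc-software"


def build_osg_executables(local_names, remote_names):
    """Return the text of a hypothetical osg_executables.ini.

    local_names: executables read from the mounted software repo.
    remote_names: executables downloaded over http (the jobs that run on OSG).
    """
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep option names exactly as written
    cfg["executables"] = {}  # assumed section name
    for name in local_names:
        cfg["executables"][name] = f"{LOCAL_ROOT}/{RELEASE}/{BUILD}/pycbc_{name}"
    for name in remote_names:
        cfg["executables"][name] = f"{HTTP_ROOT}/{RELEASE}/{BUILD}/pycbc_{name}"
        # Profile lines from the issue text: mark the executable as not installed.
        cfg[f"pegasus_profile-{name}"] = {
            "pycbc|installed": "False",
            "hints|execution.site": "local",
        }
    out = io.StringIO()
    cfg.write(out)
    return out.getvalue()


ini_text = build_osg_executables(["average_psd"], ["inspiral"])
print(ini_text)
```

Only the executables listed in `remote_names` get a `pegasus_profile-*` section, which matches the intent that everything else is treated as installed on the submit machine.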

  1. The instructions for creating a frame cache file are not robust. The cache that @stevereyes01 is using has
112596/H-H1_HOFT_C02-1125969920-4096.gwf root://srm.unl.edu//user/ligo/frames/ER8/H1_HOFT_C02/H/1125/H-H1_HOFT_C02-1125969920-4096.gwf pool="osg"
112596/H-H1_HOFT_C02-1125969920-4096.gwf file:///scratch/02750/stuart/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11259/H-H1_HOFT_C02-1125969920-4096.gwf pool="osg"

This is an XrootD URL and a file URL that is valid on Stampede.

Since these paths don't actually change, I think it makes sense to just check the C02 cache file into GitLab. Then change the instruction to tell the user to download the cache file from GitLab and use it instead.
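For anyone who wants to sanity-check a downloaded cache file before using it, the following Python sketch parses lines of the form shown above (logical file name, physical URL, `pool="..."`) into a mapping from each logical name to its replicas. The line format is inferred only from the two example entries, and `parse_cache` is a hypothetical helper, not something that ships with pycbc or Pegasus.

```python
import re
from collections import defaultdict

# One cache entry: <logical name> <physical URL> pool="<site>"
# (format inferred from the two example lines in this issue).
CACHE_LINE = re.compile(r'^(?P<lfn>\S+)\s+(?P<pfn>\S+)\s+pool="(?P<pool>[^"]+)"$')

SAMPLE_CACHE = """\
112596/H-H1_HOFT_C02-1125969920-4096.gwf root://srm.unl.edu//user/ligo/frames/ER8/H1_HOFT_C02/H/1125/H-H1_HOFT_C02-1125969920-4096.gwf pool="osg"
112596/H-H1_HOFT_C02-1125969920-4096.gwf file:///scratch/02750/stuart/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11259/H-H1_HOFT_C02-1125969920-4096.gwf pool="osg"
"""


def parse_cache(text):
    """Map each logical file name to a list of (physical URL, pool) pairs."""
    replicas = defaultdict(list)
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = CACHE_LINE.match(line)
        if match is None:
            raise ValueError(f"line {number} is not a valid cache entry: {line!r}")
        replicas[match["lfn"]].append((match["pfn"], match["pool"]))
    return dict(replicas)


replicas = parse_cache(SAMPLE_CACHE)
for lfn, entries in replicas.items():
    print(lfn, [pfn.split(":", 1)[0] for pfn, _ in entries])
```

A check like this would flag malformed lines early, which is the kind of fragility a checked-in cache file is meant to avoid.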

  1. Delete the text about
If a custom executables.ini is being used it will also be necessary to mark pycbc-inspiral as uninstalled by also adding 'pegasus_profile-inspiral:pycbc|installed:False' to the list of --config-overrides

as that is taken care of by using the new osg_executables.ini. However, make sure the build instructions say to add the correct lines to the ini file.

  1. Delete the option
--append-site-profile 'local:dagman|maxidle:5000' \

because this should be controlled by the admin of the submitting site in the condor configuration.

  1. Change the instructions from
--cache xroot-frames-c00.cache \

to say to use the cache file downloaded from GitLab.

  1. Change
Cache files for c01 and c02 are/will be available.

to contain URLs for the cache files.

  1. Make the comment about hostname -f a note and explain that this is setting --remote-staging-server.
  2. Remove the option
--append-pegasus-property 'pegasus.data.configuration=nonsharedfs'

as that is now set by default in https://github.com/ligo-cbc/pycbc/blob/master/bin/pycbc_submit_dax

  1. Remove
  --append-pegasus-property 'pegasus.catalog.replica.cache.asrc=true' \
  --append-pegasus-property 'pegasus.catalog.replica.dax.asrc=true' \

as they are now set by default in https://github.com/ligo-cbc/pycbc/blob/master/pycbc/workflow/pegasus_files/pegasus-properties.conf

  1. Add the instructions
  --append-pegasus-property 'pegasus.selector.replica Regex' \
  --append-pegasus-property 'pegasus.selector.replica.regex.rank.1 file:///scratch.*'  \
  --append-pegasus-property 'pegasus.selector.replica.regex.rank.2 root://srm.unl.edu.*' \
  --append-pegasus-property 'pegasus.selector.replica.regex.rank.3 .\*' \

to set up the PFN fall-back mechanism until issue https://github.com/ligo-cbc/pycbc/issues/756 is fixed.
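To make the intended fall-back order concrete, here is a small Python sketch of ranked-regex selection: try the rank 1 pattern against every available PFN, then rank 2, then rank 3. This is only an illustration of the idea, under the assumption that a lower rank number means higher preference; it does not reproduce how Pegasus implements its Regex replica selector, and `select_replica` is a hypothetical name.

```python
import re

# Patterns from the properties above, in rank order (rank 1 first).
RANKED_PATTERNS = [
    r"file:///scratch.*",
    r"root://srm.unl.edu.*",
    r".*",
]


def select_replica(pfns, patterns=RANKED_PATTERNS):
    """Return the first PFN matching the best-ranked pattern, or None."""
    for pattern in patterns:
        for pfn in pfns:
            if re.fullmatch(pattern, pfn):
                return pfn
    return None


xrootd = "root://srm.unl.edu//user/ligo/frames/ER8/H1_HOFT_C02/H/1125/H-H1_HOFT_C02-1125969920-4096.gwf"
scratch = "file:///scratch/02750/stuart/frames/ER8/hoft_C02/H1/H-H1_HOFT_C02-11259/H-H1_HOFT_C02-1125969920-4096.gwf"

print(select_replica([xrootd, scratch]))  # the local scratch copy is preferred
print(select_replica([xrootd]))           # falls back to XrootD when no scratch copy exists
```

The catch-all pattern at rank 3 means any other replica is still accepted when neither preferred location is present.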

  1. Since we need to perl the workflow to work around some lingering Pegasus issues, make sure that the instructions say to do
--no-submit

when running pycbc_submit_dax.

  1. Add an instruction saying to add
<profile namespace="condor" key="+DESIRED_XSEDE_SITES">&quot;Stampede,Comet&quot;</profile>

to the OSG site in output/site-catalog.xml until https://github.com/ligo-cbc/pycbc/issues/757 is fixed.
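Until that issue is fixed, the manual edit could be scripted. The Python sketch below adds the profile element to the site whose handle is `osg`. The sample catalog is a heavily simplified, hypothetical stand-in for output/site-catalog.xml, and both the `osg` handle and the helper name `add_desired_sites` are assumptions; check them against the real file before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for output/site-catalog.xml.
SAMPLE_CATALOG = """<sitecatalog>
  <site handle="local"/>
  <site handle="osg"/>
</sitecatalog>"""


def add_desired_sites(xml_text, site_handle="osg", sites="Stampede,Comet"):
    """Return the catalog text with the +DESIRED_XSEDE_SITES profile added."""
    root = ET.fromstring(xml_text)
    for elem in list(root.iter()):
        # Split "{namespace}site" into its parts; works with or without a namespace.
        prefix, brace, local = elem.tag.rpartition("}")
        if local == "site" and elem.get("handle") == site_handle:
            profile = ET.SubElement(elem, prefix + brace + "profile")
            profile.set("namespace", "condor")
            profile.set("key", "+DESIRED_XSEDE_SITES")
            profile.text = f'"{sites}"'  # the value itself includes the double quotes
            return ET.tostring(root, encoding="unicode")
    raise ValueError(f"no site with handle {site_handle!r} found")


updated = add_desired_sites(SAMPLE_CATALOG)
print(updated)
```

ElementTree writes the quotes in the element text literally rather than as `&quot;`, which is equivalent XML. It may also rewrite namespace prefixes when serializing a namespaced catalog, so it is safer to verify the result by parsing it than by comparing text.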

duncan-brown commented 8 years ago

@stevereyes01 please can you check that the instructions are up to date? I think all the relevant changes above have been made.

stevereyes01 commented 8 years ago

The instructions at http://ligo-cbc.github.io/pycbc/latest/html/workflow/pycbc_make_coinc_search_workflow.html#running-on-the-open-science-grid are currently up to date and carry most of the information here. I'm running a test workflow to make sure we can run on OSG and the local cluster using the same pycbc_submit_dax. If these tests all succeed, we can update the instructions to the most modern settings (no perl-ing).

Once I get this to work I'll make sure everything works stably on pycbc 1.5.X.

duncan-brown commented 8 years ago

Closing this ticket since most of it has been fixed. Will open a new ticket when @stevereyes01 tests against 1.5.x.