radical-experiments / AIMES-Experience

Experiments for the AIMES practice paper
MIT License

Document synapse deployment on OSG #6

Open andre-merzky opened 8 years ago

andre-merzky commented 8 years ago

I think it will boil down to this scheme:

We provide a small shell script, along the lines alluded to before:

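if ! test -e "$HOME/ve.synapse"
then
  # first CU on this pilot: create the VE and install the tools
  module load python
  virtualenv "$HOME/ve.synapse"
  . "$HOME/ve.synapse/bin/activate"
  pip install aimes.skeleton
  pip install radical.synapse
else
  # follow-up CUs: just load the existing VE
  . "$HOME/ve.synapse/bin/activate"
fi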

Both skeleton and synapse need new releases for this, or we can pull from a git branch instead, I guess. The script will have to be staged once with each pilot, and each CU runs it. Since all OSG pilots are single-core, only one CU will run the script at a time. The first CU will thus install the tools, and any follow-up CU will just load the existing VE.

For the mixed OSG/XSEDE experiments, we will need to manually pre-install the VE on the XSEDE hosts, so the script would do nothing in that case.

That is probably the simplest approach. It should work on all machines where RP works, and it has a one-time, fixed and measurable overhead per pilot on first CU execution (pre-exec timings are profiled separately from exe timings, per CU).

andre-merzky commented 8 years ago

Hi Ming,

the following should do the trick now:

cud = rp.ComputeUnitDescription()
cud.pre_exec = [
    'wget https://raw.githubusercontent.com/radical-cybertools/radical.synapse/feature/named_storage/bin/radical-synapse-setup.sh',
    '/bin/sh radical-synapse-setup.sh',
    '. $HOME/ve.synapse/bin/activate']
cud.executable = 'radical-synapse-version'

It is somewhat hackish, but also gives us quite some flexibility for experiments. This should work ok for 10-core pilots. For the XSEDE case, where multiple CUs run concurrently, the behavior is undefined, as concurrent virtualenv and pip install calls do not behave well. If you create $HOME/ve.synapse/ manually though (i.e. in advance), the above should be safe to use.
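For the mixed experiments, the two cases (VE installed on first use vs. VE created in advance) could be kept apart with a small helper that builds the pre_exec list. This is only a sketch: the function name `synapse_pre_exec` and the `ve_preinstalled` flag are made up for illustration and are not part of RP or synapse. Skipping the download when the VE already exists is an assumption on my side; the unchanged list above should also be safe in that case.

```python
# Hypothetical helper: builds the pre_exec list shown above.
SETUP_URL = ("https://raw.githubusercontent.com/radical-cybertools/"
             "radical.synapse/feature/named_storage/bin/"
             "radical-synapse-setup.sh")
VE_ACTIVATE = ". $HOME/ve.synapse/bin/activate"


def synapse_pre_exec(ve_preinstalled):
    """Return the pre_exec command list for one CU."""
    if ve_preinstalled:
        # VE was created manually in advance: only activate it
        return [VE_ACTIVATE]
    # otherwise fetch and run the setup script, then activate the VE
    return ["wget " + SETUP_URL,
            "/bin/sh radical-synapse-setup.sh",
            VE_ACTIVATE]
```

Usage would then be `cud.pre_exec = synapse_pre_exec(ve_preinstalled=True)` on hosts with a manually created VE, and `synapse_pre_exec(ve_preinstalled=False)` on single-core pilots.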

mturilli commented 8 years ago

Hi Andre,

This should work ok for 10-core pilots

I am not sure how to interpret that. Is 10-cores the limit of synapse concurrency? Can we use up to 2048 pilots with a single synapse CU or do we have to have 2048/10 CUs per pilot?

andre-merzky commented 8 years ago

Ah, sorry for the confusion, this was supposed to mean '1-core-pilots'... So, the script only reliably creates a VE if we guarantee that only one script instance is running. On a single-core pilot that is the case, only one unit will ever be active...
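If we ever need the setup to be safe on multi-core pilots, one option would be to serialize it with a file lock, so that only one instance creates the VE and all others wait and then reuse it. The following is a minimal sketch of that idea, not what radical-synapse-setup.sh does: `setup_once`, the `install` callable and the `.setup-complete` marker file are hypothetical names, and the approach assumes `flock` works on the file system that holds the lock file, which is not guaranteed on every shared file system.

```python
import fcntl
from pathlib import Path


def setup_once(ve_dir, install):
    """Run install(ve_dir) at most once, even if several callers race.

    Returns True for the caller that performed the setup, False otherwise.
    """
    ve_dir = Path(ve_dir)
    marker = ve_dir / ".setup-complete"               # hypothetical marker
    lock_path = ve_dir.with_name(ve_dir.name + ".lock")

    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)   # blocks until the holder is done
        try:
            if marker.exists():
                return False               # someone else finished the setup
            install(ve_dir)                # e.g. virtualenv + pip install
            ve_dir.mkdir(parents=True, exist_ok=True)
            marker.touch()                 # written only after success
            return True
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

The marker is written only after `install` returns, so a setup that fails halfway is retried by the next caller instead of being mistaken for a complete VE.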