radical-collaboration / extasy-bpti

Blue Waters RP 0.50.7 and 0.50.10 are failing #9

Open FranklinBetten opened 5 years ago

FranklinBetten commented 5 years ago

JetStream VE

(myenv) hal9000@js-17-39:~/Documents/feature_entk-0.7/bw_sync_01/bw_p14b02_left_d3_k12_1000_k34_1000_1ns_sims_02$ radical-stack 

  python               : 2.7.12
  pythonpath           : 
  virtualenv           : /home/hal9000/Documents/feature_entk-0.7/myenv

  radical.entk         : 0.7.6
  radical.pilot        : 0.50.7
  radical.utils        : 0.50.2
  saga                 : 0.50.0

RADICAL-Pilot (RP) versions 0.50.7 and 0.50.10 are both failing on Blue Waters (BW).

Please see attached log files re.session.js-17-39.jetstream-cloud.org.hal9000.017811.0015.zip

vivek-bala commented 5 years ago

I can confirm that I get this issue as well.

I see the following error msg from the agent (same as the error in the above logs):

state.pubsub.bridge.0000.log:2018-10-07 16:14:46,009: state.pubsub.bridge.0000: MainProcess                     : MainThread     : ERROR   : initialization failed (child state.pubsub.bridge.0000.child failed to come up [s])

andre-merzky commented 5 years ago

There was a recent change of the shell stack size limit. I can imagine that this is conflicting with Python's stack size and the way threads are accounted for. I'll investigate...
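For anyone who wants to check this hypothesis on their own login or compute node, here is a minimal diagnostic sketch. It is not part of RP. It only uses the standard-library `resource` and `threading` modules to print the stack size limit the process inherited from the shell next to the stack size Python will request for new threads. The helper names `describe_limit` and `stack_report` are made up for this example.

```python
import resource
import threading


def describe_limit(value):
    """Render an rlimit value (possibly RLIM_INFINITY) as readable text."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d KiB" % (value // 1024)


def stack_report():
    """Collect the process stack rlimit and Python's thread stack size."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return {
        "rlimit_soft": describe_limit(soft),
        "rlimit_hard": describe_limit(hard),
        # 0 means new threads use the platform default stack size
        "thread_stack_size": threading.stack_size(),
    }


if __name__ == "__main__":
    for key, value in sorted(stack_report().items()):
        print("%-18s: %s" % (key, value))
```

Running it once in the shell that launches the agent and once inside a batch job may show whether the two environments see different limits. It does not by itself confirm that the limit causes the bridge failure.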

andre-merzky commented 5 years ago

RP release 0.50.12 has been pushed out which fixes the BW problem (for now).