Open kevloui opened 5 years ago
Hey @andre-merzky just bringing this to your attention. Seems to be an error in radical-pilot-agent-statepush
Thanks @vivek-bala, I'll track this down! @Keverne, could you please attach agent_0.cfg from the pilot sandbox? Thank you!
Hi all, here is the agent_0.cfg file from the pilot sandbox!
Sorry, it took me a while to understand this error - thanks for the details provided! It seems like radical-pilot-agent-statepush is not correctly installed in the pilot sandbox. Can you please list the content of 0.50.21 /scratch/sciteam/louison/radical.pilot.sandbox/re.session.ip-172-31-21-178.ubuntu.017928.0004/pilot.0000/rp_install/bin/?
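A minimal sketch of the check requested above, in case it helps: it lists a bin directory and reports whether radical-pilot-agent-statepush is present and executable. The helper name check_script is hypothetical and not part of RP; the directory is passed on the command line, e.g. the rp_install/bin/ path from the pilot sandbox.

```python
import os
import sys


def check_script(bin_dir, script="radical-pilot-agent-statepush"):
    """Return (exists, executable, listing) for `script` inside `bin_dir`.

    Hypothetical helper for illustration only; not part of radical.pilot.
    """
    listing = sorted(os.listdir(bin_dir))         # everything installed in bin/
    path = os.path.join(bin_dir, script)
    exists = os.path.isfile(path)                 # is the script there at all?
    executable = exists and os.access(path, os.X_OK)  # and can it be run?
    return exists, executable, listing


if __name__ == "__main__" and len(sys.argv) > 1:
    found, runnable, names = check_script(sys.argv[1])
    print("\n".join(names))
    print("statepush present:", found, "| executable:", runnable)
```

Run it on the machine that hosts the sandbox, with the bin directory as the only argument.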
That error happens after all is said and done; it should not be fatal. I'll provide a fix to shield against it, but I would still like to understand how that happened. I don't see that issue on BW with the same RP release...
Thanks, Andre.
Please give the RP branch fix/bpti_17 a try. I'll release this as soon as it is confirmed to work. Thanks!
Hi @andre-merzky does this need to be installed locally or on blue waters?
The RP installation on your local submission host is transferred to BW on the fly and installed there as well, in the pilot sandbox. So the local install is what you are looking for.
Best, Andre.
When I install the fix/bpti_17 branch, I get the following error:
Traceback (most recent call last):
  File "runme.py", line 621, in <module>
    appman.resource_desc = res_dict
  File "/home/ubuntu/shared/ve/local/lib/python2.7/site-packages/radical/entk/appman/appmanager.py", line 195, in resource_desc
    from radical.entk.execman.rp import ResourceManager
  File "/home/ubuntu/shared/ve/local/lib/python2.7/site-packages/radical/entk/execman/rp/__init__.py", line 1, in <module>
    from resource_manager import ResourceManager
  File "/home/ubuntu/shared/ve/local/lib/python2.7/site-packages/radical/entk/execman/rp/resource_manager.py", line 6, in <module>
    import radical.pilot as rp
ImportError: No module named pilot
This looks like the installation did not actually succeed? Can you send the commands you used to install, and their output? Was the virtualenv active during installation?
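For reference, a small sketch of how one might check both points from inside the interpreter that runs runme.py: whether a virtualenv appears to be active, and whether radical.pilot and radical.entk can be located. It assumes a Python 3 interpreter (importlib.util is not available on the Python 2.7 shown in the traceback), and the function names are hypothetical, not part of RP or EnTK.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtualenv/venv."""
    # Legacy virtualenv sets sys.real_prefix; venv makes sys.prefix
    # differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


def module_available(name):
    """Return True if module `name` can be located on the current sys.path.

    For dotted names, parent packages are imported in the process.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package (e.g. `radical`) is itself missing.
        # ValueError: the module is loaded but carries no usable spec.
        return False


if __name__ == "__main__":
    print("interpreter :", sys.executable)
    print("virtualenv  :", in_virtualenv())
    for mod in ("radical.pilot", "radical.entk"):
        state = "found" if module_available(mod) else "MISSING"
        print("%-13s: %s" % (mod, state))
```

If radical.entk is found but radical.pilot is reported missing, that would be consistent with the ImportError above and with an incomplete RP install in that environment.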
ping - does the deployment problem persist?
I created a new virtual environment so there was no legacy code that could have caused issues. I then went through my install procedure, installing the latest RP version from pip. This worked (until #18), albeit giving me slightly different output. I don't know if something had changed recently or if it was more an issue on my end.
I'm cautious about saying that this deployment problem is no longer an issue, but I think it has been resolved!
Thanks, Chris. Glad this is not stalling you anymore, but the recent surge in deployment issues (not only this ticket) worries me... Anyway, we probably should close this ticket unless you or Vivek see a way to reproduce this?
Managed to get a job started on Blue Waters, but now we encounter a different error. See log below:
bootstrap_0.out.log