radical-collaboration / Glatard-Lab


Issues running 00_getting_started.py #1

Open ValHayot opened 4 years ago

ValHayot commented 4 years ago

Ok, good news! Following the issue reported here https://github.com/radical-cybertools/radical.pilot/issues/2035#issue-551034155, I figured out that RP was unable to gather the results because OpenMPI was not installed. However, I'm still having issues running the example application.

My current issue is that RP times out waiting for the pilot:

================================================================================
 Getting Started (RP version 1.0.0)                                             
================================================================================

new session: [rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002]    \
database   : [mongodb://localhost:27017/radicaldb]                            ok
read config                                                                   ok

--------------------------------------------------------------------------------
submit pilots                                                                   

create pilot manager                                                          ok
submit 1 pilot(s)
        [local.localhost:1]
                                                                              ok

--------------------------------------------------------------------------------
submit units                                                                    

create unit manager                                                           ok
add 1 pilot(s)                                                                ok
create 1 unit description(s)
        .                                                                     ok
submit 1 unit(s)
        .                                                                     ok

--------------------------------------------------------------------------------
gather results                                                                  

wait for 1 unit(s)
        +      1                                                              ok

--------------------------------------------------------------------------------
finalize                                                                        

closing session rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002   \
close unit manager                                                            ok
close pilot manager                                                            \
wait for 1 pilot(s)
              0                                                          timeout
                                                                              ok
+ rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002 (json)
+ pilot.0000 (profiles)
+ pilot.0000 (logfiles)
session lifetime: 35.3s                                                       ok
(venv) vhayot@js-156-107:~/venv/share/radical.pilot/examples$ radical-stack 

  python               : 3.5.2
  pythonpath           : 
  virtualenv           : /home/vhayot/venv

  radical.pilot        : 1.0.0
  radical.saga         : 1.0.0
  radical.utils        : 1.0.0

grep of the logs:

(venv) vhayot@js-156-107:~/venv/share/radical.pilot/examples$ grep -ri error  ~/radical.pilot.sandbox/rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/*/*.log
/home/vhayot/radical.pilot.sandbox/rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/pilot.0000/radical.saga.log:ImportError: No module named 'libcloud'
/home/vhayot/radical.pilot.sandbox/rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/pilot.0000/radical.saga.log:ImportError: No module named 'libcloud'
/home/vhayot/radical.pilot.sandbox/rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/pilot.0000/radical.saga.log:ImportError: No module named 'libcloud'
/home/vhayot/radical.pilot.sandbox/rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/pilot.0000/radical.saga.log:ImportError: No module named 'libcloud'
/home/vhayot/radical.pilot.sandbox/rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/pilot.0000/radical.saga.log:ImportError: No module named 'libcloud'
/home/vhayot/radical.pilot.sandbox/rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/pilot.0000/radical.saga.log:ImportError: No module named 'libcloud'

However, libcloud is installed in my virtual environment:

(venv) vhayot@js-156-107:~/venv/share/radical.pilot/examples$ pip freeze | grep libcloud
apache-libcloud==2.8.0

and

(venv) vhayot@js-156-107:~/venv/share/radical.pilot/examples$ python
Python 3.5.2 (default, Oct  8 2019, 13:06:37) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import libcloud
>>> 

I suspect the pilots are not using the virtual environment that I created. I'm checking whether there is an environment variable that I forgot to set so that RP pilots can use my virtual env.
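One way to test that suspicion is to run the same small check under each Python interpreter involved and compare the results. The sketch below is illustrative only and is not part of RP: `describe_env` is a hypothetical helper name, and it uses only the standard library. It avoids f-strings so that it also runs on Python 3.5.

```python
import importlib.util
import sys


def describe_env(module_name):
    """Report which interpreter is running and whether it can find a module.

    Hypothetical helper for illustration; module_name must be a top-level
    module name such as "libcloud".
    """
    return {
        "executable": sys.executable,   # path of the running interpreter
        "prefix": sys.prefix,           # root of the active (virtual) environment
        "importable": importlib.util.find_spec(module_name) is not None,
    }


# Usage: run this file with each interpreter you want to compare, e.g.
#   print(describe_env("libcloud"))
```

If the two interpreters report different `prefix` values and only one of them finds the module, the ImportError comes from an environment other than the one the script was launched from.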

iparask commented 4 years ago

Hello @ValHayot, I think the pilot ended correctly. You are right: the part of the pilot that executes units, the pilot's agent, does not use the same environment as the one you are running from. It creates its own on the fly.

In your home directory, there should be a folder named radical.pilot.sandbox. In there you will see a folder named ve.*, which is the virtual environment the agent is using. The other folders are the pilot sessions you have executed.
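The layout described above can be inspected with a few lines of Python. This is a minimal sketch, assuming only what is stated here: folders whose names start with `ve.` are agent virtual environments, and the remaining folders are sessions. `classify_sandbox` is a hypothetical helper name, not an RP function.

```python
from pathlib import Path


def classify_sandbox(sandbox):
    """Split the folders of an RP sandbox into agent virtualenvs and sessions.

    Assumption: folders named ve.* are virtual environments and every other
    folder is a session. Regular files are ignored.
    """
    sandbox = Path(sandbox).expanduser()
    venvs, sessions = [], []
    for entry in sorted(sandbox.iterdir()):
        if not entry.is_dir():
            continue
        if entry.name.startswith("ve."):
            venvs.append(entry.name)
        else:
            sessions.append(entry.name)
    return {"venvs": venvs, "sessions": sessions}


# Usage:
#   print(classify_sandbox("~/radical.pilot.sandbox"))
```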

The agent creates its own environment because it assumes that it is running on a different resource from the one you are launching your execution from.

Inside rp.session.js-156-107.jetstream-cloud.org.vhayot.018285.0002/pilot.0000 there are several log files with the extensions .log, .out, and .err. Would you mind uploading a zip file with them here?

iparask commented 4 years ago

Furthermore, in the pilot.0000 folder, you will see a folder named unit.000000. That is where your example application executed. If I remember correctly, some errors in the log files are not necessarily fatal. Can you check in the unit folder and let us know if the application executed correctly?
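A quick way to see what the unit left behind is to list the files in its folder with their sizes; an empty error file next to a non-empty output file is a good first sign. The sketch below makes no assumption about the file names RP writes, and `summarize_dir` is a hypothetical helper name.

```python
from pathlib import Path


def summarize_dir(unit_dir):
    """Return {file name: size in bytes} for the regular files in unit_dir."""
    unit_dir = Path(unit_dir).expanduser()
    return {
        p.name: p.stat().st_size
        for p in sorted(unit_dir.iterdir())
        if p.is_file()
    }


# Usage (pass the path of the unit.000000 folder inside pilot.0000):
#   for name, size in summarize_dir(unit_path).items():
#       print("{:>10d}  {}".format(size, name))
```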

ValHayot commented 4 years ago

Hey @iparask, sorry for the delay in replying! Here are the requested logs: pilot_logs.tar.gz

It seems like the application executed correctly (no fatal errors) and the correct output was produced. It just surprised me that the pilot timed out due to an ImportError. That is not the "normal" behaviour according to the output shown at https://radical-cybertools.github.io/radical-pilot/quick_start.html, and the only things installed in my venv are RADICAL-Pilot and its dependencies, so there shouldn't be any unrelated libraries.

andre-merzky commented 4 years ago

Hi @ValHayot: I think the timeout message is misleading. It seems to appear even when the pilot terminates successfully and on time. I opened a ticket in RP to get this fixed. For now, please ignore that message.
Thanks, Andre.

andre-merzky commented 4 years ago

Note that this is fixed in the RP devel branch. This will be in the next release (this week).