radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Trouble using a resource not in list of officially supported ones #2079

Closed Abdullah-Ghani closed 4 years ago

Abdullah-Ghani commented 4 years ago

I am trying to use RADICAL-Pilot to run experiments on the Rutgers Amarel cluster. I followed the documentation at https://radicalpilot.readthedocs.io/en/devel/machconf.html#writing-a-custom-resource-configuration-file to create my own config file.

I saved it in [aag193@amarel2 ~]$ vi .radical/pilot/configs/resource_rutgers.json

this is the file

# filename: rutgers.json
{
    "amarel":
    {
        "description"                 : "The Amarel HPC cluster at Rutgers.",
        "notes"                       : "Access only from registered IP addresses.",
        "schemas"                     : ["ssh", "gsissh"],
        "ssh"                         :
         {
            "job_manager_endpoint"    : "loadl+ssh://amarel.rutgers.edu/",
            "filesystem_endpoint"     : "sftp://amarel.rutgers.edu/"
        },
        "gsissh"                      :
        {
            "job_manager_endpoint"    : "loadl+gsissh://amarel.rutgers.edu:2222/",
            "filesystem_endpoint"     : "gsisftp://amarel.rutgers.edu:2222/"
        },
        "default_queue"               : "test",
        "resource_manager"            : "SLURM",
        "task_launch_method"          : "SSH",
        "mpi_launch_method"           : "MPIEXEC",
        "global_virtenv"              : "/home/hpc/pr87be/di29sut/pilotve",
        "pre_bootstrap_0"             : ["source /etc/profile",
                                         "source /etc/profile.d/modules.sh",
                                         "module load python/2.7.6",
                                         "module unload mpi.ibm", "module load mpi.intel",
                                         "source /home/hpc/pr87be/di29sut/pilotve/bin/activate"
                                        ],
        "valid_roots"                 : ["/home", "/gpfs/work", "/gpfs/scratch"],
        "python_dist"                 : "default"
        "agent_type"                  : "multicore",
        "agent_scheduler"             : "CONTINUOUS",
        "agent_spawner"               : "POPEN",
        "pilot_agent"                 : "radical-pilot-agent-multicore.py",
        "pilot_dist"                  : "default"
    }
}

But when I run the script on Jetstream with the following resource allocation, it gives an error.

The resource allocation in my script:

pd_init = {'resource'      : 'rutgers.amarel',
           'runtime'       : 60,
           'exit_on_error' : True,
           'cores'         : 2
          }

pdesc = rp.ComputePilotDescription(pd_init)

mturilli commented 4 years ago

What is the error?

Abdullah-Ghani commented 4 years ago

I suspect one of the main issues is that Amarel requires users to be on the Rutgers network or to use a VPN to access the cluster.

(ve) abdullahg@js-156-107:~/ve/share/radical.pilot/examples$ python 00_getting_started.py 

================================================================================
 Getting Started (RP version 1.1.0)                                             
================================================================================

new session: [rp.session.js-156-107.jetstream-cloud.org.abdullahg.018312.0000] \
database   : [mongodb://rct:rct_test@two.radical-project.org/rct_test]        ok
create pilot manager                                                          ok
create unit manager                                                           ok

--------------------------------------------------------------------------------
submit pilots                                                                   

submit 1 pilot(s)
        caught Exception: Resource domain 'rutgers' is unknown.
--------------
RADICAL Utils -- Stacktrace [11791] [MainThread]

abdulla+ 11791 11202  4 10:52 pts/18   00:00:00              \_ python 00_getting_started.py
Traceback (most recent call last):
File "00_getting_started.py", line 62, in <module>
pilot = pmgr.submit_pilots(pdesc)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/pilot_manager.py", line 551, in submit_pilots
pilot = ComputePilot(pmgr=self, descr=pd)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/compute_pilot.py", line 99, in __init__
= self._session._get_jsurl           (pilot)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/session.py", line 927, in _get_jsurl
rcfg    = self.get_resource_config(resrc, schema)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/session.py", line 648, in get_resource_config
raise RuntimeError("Resource domain '%s' is unknown." % domain)
RuntimeError: Resource domain 'rutgers' is unknown.

--------------

--------------------------------------------------------------------------------
finalize                                                                        

closing session rp.session.js-156-107.jetstream-cloud.org.abdullahg.018312.0000
        \
close unit manager                                                            ok
close pilot manager                                                            \
wait for 0 pilot(s)
              0                                                               ok
                                                                              ok
+ rp.session.js-156-107.jetstream-cloud.org.abdullahg.018312.0000 (json)
session lifetime: 21.4s                                                       ok
Traceback (most recent call last):
  File "00_getting_started.py", line 62, in <module>
    pilot = pmgr.submit_pilots(pdesc)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/pilot_manager.py", line 551, in submit_pilots
    pilot = ComputePilot(pmgr=self, descr=pd)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/compute_pilot.py", line 99, in __init__
    = self._session._get_jsurl           (pilot)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/session.py", line 927, in _get_jsurl
    rcfg    = self.get_resource_config(resrc, schema)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/session.py", line 648, in get_resource_config
    raise RuntimeError("Resource domain '%s' is unknown." % domain)
RuntimeError: Resource domain 'rutgers' is unknown.

andre-merzky commented 4 years ago

You can check your resource config with this command:

python -c 'import radical.pilot as rp; s=rp.Session(); print(s.get_resource_config("rutgers.amarel"))'

This will tell you that your resource config is missing a comma at the end of line 30 ("python_dist"). You should also have seen an ERROR or WARNING message about this in the log files?

At this point RP really just complains about the resource not being known; it does not yet attempt to connect to Amarel, so the VPN is not a problem (yet).
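
For a syntax check that depends only on the standard library, a minimal sketch could look like the following. It assumes that full-line `#` comments (such as the `# filename:` header in the config above) should be ignored, which the plain `json` module does not do by itself. `check_json` is a hypothetical helper, not part of RADICAL-Pilot or RADICAL-Utils, and the sample is a shortened stand-in for the real file.

```python
import json
import re


def check_json(text):
    """Return None if `text` parses, else a (line, column, message) tuple."""
    # Blank out full-line '#' comments but keep the line count intact, so
    # that reported line numbers still match the file on disk.
    cleaned = "\n".join("" if re.match(r"\s*#", line) else line
                        for line in text.split("\n"))
    try:
        json.loads(cleaned)
    except json.JSONDecodeError as e:
        return e.lineno, e.colno, e.msg
    return None


# Shortened stand-in for the config: the comma after "default" is missing.
sample = '''# filename: rutgers.json
{
    "amarel": {
        "python_dist" : "default"
        "agent_type"  : "multicore"
    }
}'''

print(check_json(sample))
```

Note that the parser reports the position where it expected the comma, that is, the start of the next key, so the reported line is the one after the line that is missing the comma.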

andre-merzky commented 4 years ago

BTW, just checking the json syntax might be quicker with

python -c 'import radical.utils as ru; ru.read_json("/home/merzky/.radical/pilot/configs/resource_rutgers.json")'

but using the RP session for a check also confirms that RP finds the file in the expected location.
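
Judging from this thread alone, the resource label 'rutgers.amarel' is split into a domain ('rutgers') and a resource name ('amarel'), and the user-level config for that domain lives in resource_rutgers.json under ~/.radical/pilot/configs/. The sketch below only illustrates that naming convention, so that the label, the file name, and the top-level JSON key can be checked against each other. `expected_config` is a hypothetical helper, and the actual lookup logic inside RP may differ.

```python
import os


def expected_config(label, home="~"):
    """Split 'domain.name' and return (domain, name, expected file path)."""
    domain, sep, name = label.partition(".")
    if not (domain and sep and name):
        raise ValueError("expected a label of the form 'domain.name': %r" % label)
    # Path layout assumed from the file location quoted in this thread.
    path = os.path.join(os.path.expanduser(home), ".radical", "pilot",
                        "configs", "resource_%s.json" % domain)
    return domain, name, path


domain, name, path = expected_config("rutgers.amarel")
print("domain  :", domain)
print("top key :", name)
print("file    :", path, "(exists)" if os.path.isfile(path) else "(missing)")
```

If the file is reported as missing on the machine that runs the script, the config was probably saved on a different host than the one where the session is created.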

Abdullah-Ghani commented 4 years ago

My apologies for that. Now that RP reads the config correctly, it gives the "Could not resolve hostname" error, which I expected.

================================================================================
 Getting Started (RP version 1.1.0)                                             
================================================================================

new session: [rp.session.js-156-107.jetstream-cloud.org.abdullahg.018312.0006] \
database   : [mongodb://rct:rct_test@two.radical-project.org/rct_test]        ok
create pilot manager                                                          ok
create unit manager                                                           ok

--------------------------------------------------------------------------------
submit pilots                                                                   

submit 1 pilot(s)
        caught Exception: read from process failed '[Errno 5] Input/output error' : (ssh: Could not resolve hostname amarel.rutgers.edu: Name or service not known
) ((ssh: Could not resolve hostname amarel.rutgers.edu: Name or service not known
)) (/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_exceptions.py +40 (translate_exception)  :  if   'could not resolve hostname' in lmsg: e = se.BadParameter (cmsg))
--------------
RADICAL Utils -- Stacktrace [8722] [MainThread]

abdulla+  8722  7146  4 16:10 pts/7    00:00:00              \_ python 00_getting_started.py
abdulla+  9102  8722  0 16:10 ?        00:00:00                  \_ [ssh] <defunct>
Traceback (most recent call last):
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py", line 608, in read
buf = os.read (f, readsize)
OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py", line 804, in find
data += self.read (timeout=_POLLDELAY)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py", line 688, in read
% (e, self.tail))
radical.saga.exceptions.NoSuccess: read from process failed '[Errno 5] Input/output error' : (ssh: Could not resolve hostname amarel.rutgers.edu: Name or service not known
) (/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py +688 (read)  :  % (e, self.tail)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "00_getting_started.py", line 63, in <module>
pilot = pmgr.submit_pilots(pdesc)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/pilot_manager.py", line 551, in submit_pilots
pilot = ComputePilot(pmgr=self, descr=pd)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/compute_pilot.py", line 100, in __init__
self._resource_sandbox = self._session._get_resource_sandbox(pilot)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/session.py", line 772, in _get_resource_sandbox
shell = self.get_js_shell(resource, schema)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/session.py", line 818, in get_js_shell
shell = rsup.PTYShell(js_url, self)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_shell.py", line 245, in __init__
interactive=self.interactive)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_shell_factory.py", line 207, in initialize
self._initialize_pty(info['pty'], info)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_shell_factory.py", line 428, in _initialize_pty
raise ptye.translate_exception (e)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_shell_factory.py", line 274, in _initialize_pty
n, match = pty_shell.find (prompt_patterns, delay)
File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py", line 807, in find
raise ptye.translate_exception (e, "(%s)" % data)
radical.saga.exceptions.BadParameter: read from process failed '[Errno 5] Input/output error' : (ssh: Could not resolve hostname amarel.rutgers.edu: Name or service not known
) ((ssh: Could not resolve hostname amarel.rutgers.edu: Name or service not known
)) (/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_exceptions.py +40 (translate_exception)  :  if   'could not resolve hostname' in lmsg: e = se.BadParameter (cmsg))

--------------

--------------------------------------------------------------------------------
finalize                                                                        

closing session rp.session.js-156-107.jetstream-cloud.org.abdullahg.018312.0006
        \
close unit manager                                                            ok
close pilot manager                                                            \
wait for 0 pilot(s)
              0                                                               ok
                                                                              ok
+ rp.session.js-156-107.jetstream-cloud.org.abdullahg.018312.0006 (json)
session lifetime: 20.1s                                                       ok
Traceback (most recent call last):
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py", line 608, in read
    buf = os.read (f, readsize)
OSError: [Errno 5] Input/output error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py", line 804, in find
    data += self.read (timeout=_POLLDELAY)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py", line 688, in read
    % (e, self.tail))
radical.saga.exceptions.NoSuccess: read from process failed '[Errno 5] Input/output error' : (ssh: Could not resolve hostname amarel.rutgers.edu: Name or service not known
) (/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py +688 (read)  :  % (e, self.tail)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "00_getting_started.py", line 63, in <module>
    pilot = pmgr.submit_pilots(pdesc)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/pilot_manager.py", line 551, in submit_pilots
    pilot = ComputePilot(pmgr=self, descr=pd)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/compute_pilot.py", line 100, in __init__
    self._resource_sandbox = self._session._get_resource_sandbox(pilot)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/session.py", line 772, in _get_resource_sandbox
    shell = self.get_js_shell(resource, schema)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/pilot/session.py", line 818, in get_js_shell
    shell = rsup.PTYShell(js_url, self)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_shell.py", line 245, in __init__
    interactive=self.interactive)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_shell_factory.py", line 207, in initialize
    self._initialize_pty(info['pty'], info)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_shell_factory.py", line 428, in _initialize_pty
    raise ptye.translate_exception (e)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_shell_factory.py", line 274, in _initialize_pty
    n, match = pty_shell.find (prompt_patterns, delay)
  File "/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_process.py", line 807, in find
    raise ptye.translate_exception (e, "(%s)" % data)
radical.saga.exceptions.BadParameter: read from process failed '[Errno 5] Input/output error' : (ssh: Could not resolve hostname amarel.rutgers.edu: Name or service not known
) ((ssh: Could not resolve hostname amarel.rutgers.edu: Name or service not known
)) (/home/abdullahg/ve/local/lib/python3.7/site-packages/radical/saga/utils/pty_exceptions.py +40 (translate_exception)  :  if   'could not resolve hostname' in lmsg: e = se.BadParameter (cmsg))

andre-merzky commented 4 years ago

my apologies for that.

No need to apologize - that is why we use issues :-)

Could not resolve hostname amarel.rutgers.edu

I am not sure how I can help with that. What is the normal way to access Amarel?

Abdullah-Ghani commented 4 years ago

SSH, but the machine that connects needs to be on the Rutgers network or be using a VPN.

andre-merzky commented 4 years ago

Are you on the Rutgers network or are you using a VPN? :-)

Abdullah-Ghani commented 4 years ago

I ran it from my local machine on the Rutgers network with passwordless SSH access. The process terminated by itself after a long wait.

 $ python 00_getting_started.py 

================================================================================
 Getting Started (RP version 1.1.1)                                             
================================================================================

new session: [rp.session.nbp-160-2.nbp.ruw.rutgers.edu.abdullahghani.018316.0001]
        \
database   : [mongodb://rct:rct_test@two.radical-project.org/rct_test]        ok
create pilot manager                                                          ok
create unit manager                                                           ok

--------------------------------------------------------------------------------
submit pilots                                                                   

submit 1 pilot(s)
        [rutgers.amarel:2]
                                                                              ok

--------------------------------------------------------------------------------
submit 1024 units                                                               

create: ########################################################################
submit: ########################################################################
wait  : Terminated: 15
$ grep ERROR *.*
pmgr.0000.log:1582566477.828 : pmgr.0000            : 80767 : 123145553555456 : ERROR    : [Callback]: pilot 'pilot.0000' failed - exit
pmgr.0000.log:1582566477.829 : pmgr.0000            : 80767 : 123145553555456 : ERROR    : listener died
pmgr_launching.0000.log:1582566477.827 : pmgr_launching.0000  : 80814 : 123145406685184 : ERROR    : bulk launch failed
$ radical-stack

  python               : 3.7.6
  pythonpath           : 
  virtualenv           : /Users/abdullahghani/venv

  radical.entk         : 1.0.0
  radical.pilot        : 1.1.1
  radical.saga         : 1.1.2
  radical.utils        : 1.1.1

andre-merzky commented 4 years ago

You found "ERROR : bulk launch failed", which is likely the culprit. You should look into that file and see whether the ERROR entry has more details that help to understand the problem.
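
Since `grep ERROR` only shows the first line of each entry, a small script that also keeps the lines following an ERROR entry (usually the traceback) may save a step. The sketch below assumes the log line layout seen in the lines quoted in this thread; `error_entries` and `report` are hypothetical helpers, not RADICAL tools.

```python
import glob
import re

# Layout inferred from the log lines quoted in this thread:
#   <timestamp> : <component> : <pid> : <tid> : <LEVEL> : <message>
ENTRY = re.compile(
    r"^\d+\.\d+\s+:\s+(\S+)\s+:\s+\d+\s+:\s+\d+\s+:\s+(\w+)\s+:\s+(.*)$")


def error_entries(lines):
    """Collect ERROR entries together with the lines that follow them."""
    entries, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        match = ENTRY.match(line)
        if match:
            # A new log entry starts; stop collecting for the previous one.
            current = None
            if match.group(2) == "ERROR":
                current = {"component": match.group(1),
                           "message": match.group(3),
                           "detail": []}
                entries.append(current)
        elif current is not None:
            # Continuation line (e.g. traceback) of the last ERROR entry.
            current["detail"].append(line)
    return entries


def report(pattern="*.log"):
    """Print every ERROR entry, with its detail lines, for matching files."""
    for path in sorted(glob.glob(pattern)):
        with open(path, errors="replace") as f:
            for entry in error_entries(f):
                print("%s: %s" % (path, entry["message"]))
                for detail in entry["detail"]:
                    print("    " + detail)


if __name__ == "__main__":
    report()
```

Run from within the session directory, this prints each ERROR message with the traceback lines that belong to it.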

Abdullah-Ghani commented 4 years ago

yes.

$ cat pmgr_launching.0000.log 
1582566477.827 : pmgr_launching.0000  : 80814 : 123145406685184 : ERROR    : bulk launch failed
Traceback (most recent call last):
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/pmgr/launching/default.py", line 553, in work
    self._start_pilot_bulk(resource, schema, pilots)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/pmgr/launching/default.py", line 680, in _start_pilot_bulk
    info = self._prepare_pilot(resource, rcfg, pilot, expand)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/pmgr/launching/default.py", line 994, in _prepare_pilot
    raise RuntimeError("'global_virtenv' is deprecated (%s)" % resource)
RuntimeError: 'global_virtenv' is deprecated (rutgers.amarel)

andre-merzky commented 4 years ago

RuntimeError: 'global_virtenv' is deprecated

You found the error :-) The documentation you used seems out of sync with the current resource configs. You may want to copy one of the existing resource configs (e.g., xsede.stampede2) instead and adapt it to the Amarel setup. I'll open a separate ticket for the documentation fix.
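
When adapting an existing config, it may help to compare the key sets of the two entries first. The sketch below is only an illustration: the key names are taken from the configs posted in this thread, the values are placeholders, and `compare_keys` is a hypothetical helper. A key that appears on only one side is not necessarily wrong, just worth double-checking.

```python
def compare_keys(custom, reference):
    """Return (keys only in custom, keys only in reference), both sorted."""
    only_custom = sorted(set(custom) - set(reference))
    only_reference = sorted(set(reference) - set(custom))
    return only_custom, only_reference


# Placeholder entries; in practice these would be the resource sections
# loaded from the custom file and from a shipped config.
custom = {
    "resource_manager": "SLURM",
    "global_virtenv": "/path/to/ve",
    "pilot_agent": "radical-pilot-agent-multicore.py",
}
reference = {
    "resource_manager": "SLURM",
    "virtenv_mode": "create",
    "rp_version": "local",
}

only_custom, only_reference = compare_keys(custom, reference)
print("only in custom config   :", only_custom)
print("only in reference config:", only_reference)
```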

Abdullah-Ghani commented 4 years ago

this is my new edited config file

# filename: rutgers.json
{
"amarel":
{
        "description"                 : "The Amarel HPC cluster at Rutgers.",
        "notes"                       : "Access from registered IP address.",
        "schemas"                     : ["ssh","local"],
        "ssh"                         :
        {
            "job_manager_endpoint"    : "slurm+ssh://aag193@amarel.rutgers.edu/",
            "filesystem_endpoint"     : "sftp://aag193@amarel.rutgers.edu/home/aag193/"
        },
        "local"                       :
        {
            "job_manager_endpoint"    : "slurm://amarel.rutgers.edu:/home/aag193/",
            "filesystem_endpoint"     : "file://amarel.rutgers.edu:/home/aag193/"
        },
        "default_queue"               : "normal",
        "resource_manager"            : "SLURM",
        "agent_scheduler"             : "CONTINUOUS",
        "agent_spawner"               : "POPEN",
        "agent_launch_method"         : "SSH",
        "task_launch_method"          : "SSH",
        "mpi_launch_method"           : "IBRUN",
        "pre_bootstrap_0"             :["module load intel/18.0.0",
                                         "module load python3/3.7.0"
                                        ],
        "default_remote_workdir"      : "$WORK",
        "valid_roots"                 : ["/scratch", "$SCRATCH", "/work", "$WORK"],
        "rp_version"                  : "local",
        "virtenv_mode"                : "create",
        "python_dist"                 : "default",
        "export_to_cu"                : ["LMOD_CMD",
                                         "LMOD_SYSTEM_DEFAULT_MODULES",
                                         "LD_LIBRARY_PATH"],
        "cu_pre_exec"                 : ["module restore"]
    }
}

Running the pilot example script gives me the following error:

$ cat pmgr_launching.0000.log
1582661910.540 : pmgr_launching.0000  : 98194 : 123145414258688 : ERROR    : bulk launch failed
Traceback (most recent call last):
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/pmgr/launching/default.py", line 553, in work
    self._start_pilot_bulk(resource, schema, pilots)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/pmgr/launching/default.py", line 755, in _start_pilot_bulk
    fs.copy(tar_url, tar_rem, flags=rsfs.CREATE_PARENTS)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/saga/namespace/directory.py", line 354, in copy
    if url_2: return self._adaptor.copy(url_1, url_2, flags, ttype=ttype)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/saga/adaptors/cpi/decorators.py", line 62, in wrap_function
    return sync_function (self, *args, **kwargs)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/saga/adaptors/shell/shell_file.py", line 539, in copy
    self._create_parent(cwdurl, tgt)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/saga/adaptors/shell/shell_file.py", line 242, in _create_parent
    % (dirname, ret, out))
radical.saga.exceptions.NoSuccess: failed at mkdir '/radical.pilot.sandbox/rp.session.nbp-161-35.nbp.ruw.rutgers.edu.abdullahghani.018317.0005/': (1) (mkdir: cannot create directory ‘/radical.pilot.sandbox’: Permission denied

I added the filesystem endpoint pointing to my partition in the home directory, but that didn't work. How do I change it?

andre-merzky commented 4 years ago

It seems that "default_remote_workdir" : "$WORK" is not valid. I would assume that $WORK is not set on Amarel, so it expands to "", and RP tries to create the sandbox base in '%s/radical.pilot.sandbox' % cfg.default_remote_workdir, which then resolves to '/radical.pilot.sandbox'.
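
The effect can be reproduced with a few lines of Python. This is a minimal sketch that mimics how a shell expands an unset variable to an empty string; `shell_expand` and `sandbox_base` are hypothetical helpers and not the RP implementation, and the '/work/aag193' value is made up for illustration. (Python's own `os.path.expandvars` would leave an unset `$WORK` unchanged, so it is not used here.)

```python
import re


def shell_expand(template, env):
    """Expand $NAME and ${NAME}; unset names become '' as in a shell."""
    return re.sub(r"\$\{?(\w+)\}?",
                  lambda m: env.get(m.group(1), ""),
                  template)


def sandbox_base(workdir, env):
    """Build the sandbox base the way the format string above suggests."""
    return "%s/radical.pilot.sandbox" % shell_expand(workdir, env)


# $WORK unset: the sandbox ends up directly under the filesystem root.
print(sandbox_base("$WORK", {}))                        # /radical.pilot.sandbox
# $WORK set (hypothetical value): the sandbox is created below it.
print(sandbox_base("$WORK", {"WORK": "/work/aag193"}))  # /work/aag193/radical.pilot.sandbox
# An absolute path needs no expansion at all.
print(sandbox_base("/home/aag193", {}))                 # /home/aag193/radical.pilot.sandbox
```

The first result matches the directory in the mkdir error above, which an unprivileged user cannot create.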

Abdullah-Ghani commented 4 years ago

This is my edited config file

# filename: rutgers.json
{
"amarel":
{
        "description"                 : "The Amarel HPC cluster at Rutgers.",
        "notes"                       : "Access from registered IP address.",
        "schemas"                     : ["ssh","local"],
        "ssh"                         :
        {
            "job_manager_endpoint"    : "slurm+ssh://aag193@amarel.rutgers.edu/",
            "filesystem_endpoint"     : "sftp://aag193@amarel.rutgers.edu/home/aag193/"
        },
        "local"                       :
        {
            "job_manager_endpoint"    : "slurm://amarel.rutgers.edu:/home/aag193/",
            "filesystem_endpoint"     : "file://amarel.rutgers.edu:/home/aag193/"
        },
        "default_queue"               : "main",
        "resource_manager"            : "SLURM",
        "agent_scheduler"             : "CONTINUOUS",
        "agent_spawner"               : "POPEN",
        "agent_launch_method"         : "SSH",
        "task_launch_method"          : "SSH",
        "mpi_launch_method"           : "srun",
        "pre_bootstrap_0"             :["module load intel/17.0.4",
                                         "module load python3/3.7.0"
                                        ],
        "default_remote_workdir"      : "/home/aag193",
        "valid_roots"                 : ["/scratch", "$SCRATCH", "/home", "$HOME"],
        "rp_version"                  : "local",
        "virtenv_mode"                : "create",
        "python_dist"                 : "default",
        "export_to_cu"                : ["LMOD_CMD",
                                         "LMOD_SYSTEM_DEFAULT_MODULES",
                                         "LD_LIBRARY_PATH"],
        "cu_pre_exec"                 : ["module restore"]
    }
}

I am running into some minor errors when running the script, and I am not sure what they mean.

$ grep ERROR *.*
grep: pilot.0000: Is a directory
pmgr.0000.log:1582738077.054 : pmgr.0000            : 8468  : 123145539309568 : ERROR    : [Callback]: pilot 'pilot.0000' failed - exit
pmgr.0000.log:1582738077.054 : pmgr.0000            : 8468  : 123145539309568 : ERROR    : listener died
rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0002.log:1582738227.192 : rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0002 : 8468  : 4637433280 : ERROR    : failed to fet profile for pilot.0000
$ cat pmgr.0000.log 
1582738077.054 : pmgr.0000            : 8468  : 123145539309568 : ERROR    : [Callback]: pilot 'pilot.0000' failed - exit
1582738077.054 : pmgr.0000            : 8468  : 123145539309568 : ERROR    : listener died
Traceback (most recent call last):
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/utils/zmq/pubsub.py", line 277, in _listener
    cb(t, m)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/pilot_manager.py", line 312, in _state_sub_cb
    if not self._update_pilot(thing, publish=False):
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/pilot_manager.py", line 353, in _update_pilot
    self._pilots[pid]._update(pilot_dict)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/compute_pilot.py", line 205, in _update
    else      : cb(self, self.state)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/compute_pilot.py", line 148, in _default_state_cb
    ru.cancel_main_thread('int')
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/utils/threads.py", line 136, in cancel_main_thread
    if signame: signum = get_signal_by_name(signame)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/utils/threads.py", line 228, in get_signal_by_name
    'cld'     : signal.SIGCLD,
AttributeError: module 'signal' has no attribute 'SIGCLD'
(venv) nbp-160-2:rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0002 abdullahghani$ cat rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0002.log 
1582738227.192 : rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0002 : 8468  : 4637433280 : ERROR    : failed to fet profile for pilot.0000
Traceback (most recent call last):
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/pilot/utils/session.py", line 172, in fetch_profiles
    profiles = sandbox.list('*.prof')
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/saga/namespace/directory.py", line 243, in list
    return self._adaptor.list (pattern, flags, ttype=ttype)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/saga/adaptors/cpi/decorators.py", line 62, in wrap_function
    return sync_function (self, *args, **kwargs)
  File "/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/saga/adaptors/shell/shell_file.py", line 467, in list
    % (ret, out))
radical.saga.exceptions.NoSuccess: failed to list(): (2)(/bin/ls: cannot access *.prof: No such file or directory
) (/Users/abdullahghani/venv/lib/python3.7/site-packages/radical/saga/adaptors/shell/shell_file.py +467 (list)  :  % (ret, out)))

andre-merzky commented 4 years ago

AttributeError: module 'signal' has no attribute 'SIGCLD'

Alas, the minor errors are not minor :-P First, it seems Python deprecated a signal name we are still using (passively). I pushed an RU branch, hotfix/sigcld_deprecation, which gets rid of this error. Your test will still die (but with more style), because it seems your pilot got into the FAILED state. So either you have an error in the launcher again, or you need to check the pilot sandbox you now use on Amarel to see why it failed.
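
A lookup that tolerates platform-specific signal names could be sketched as follows. Python's `signal` module documents `SIGCLD` as an alias of `SIGCHLD` that is not available on macOS, which would be consistent with the /Users/... paths in the traceback above. `lookup_signal` is a hypothetical helper written for illustration; it is not the code from the RU hotfix branch.

```python
import signal

# Short names that may be missing on some platforms, mapped to a
# commonly available equivalent (SIGCLD is an alias of SIGCHLD).
FALLBACKS = {"cld": "chld"}


def lookup_signal(short_name):
    """Map a short name such as 'int' or 'cld' to a signal number."""
    key = short_name.lower()
    sig = getattr(signal, "SIG" + key.upper(), None)
    if sig is None and key in FALLBACKS:
        # Name not defined on this platform: try the equivalent name.
        sig = getattr(signal, "SIG" + FALLBACKS[key].upper(), None)
    if sig is None:
        raise ValueError("unknown signal name: %r" % short_name)
    return sig


print(lookup_signal("int"))
print(lookup_signal("cld"))
```

Resolving names lazily with `getattr` avoids building a table that references every signal attribute up front, which is where the AttributeError in the traceback was raised.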

andre-merzky commented 4 years ago

Hi @mtitov: I added you to this ticket so you can tag along if you want. Abdullah is going through the procedure of getting the stack to work on a new resource, Amarel, and it clearly shows some of the pain points and deficiencies we have in that context.

Abdullah-Ghani commented 4 years ago

Mistake on my end: the modules being loaded were different versions. The script now runs on my side without errors.

This is the output in the RADICAL-Pilot sandbox on Amarel.

Some modules/installations failed.

[aag193@amarel1 pilot.0000]$ cat bootstrap_0.out 
bootstrap_0 stderr redirected to stdout
---------------------------------------------------------------------
bootstrap_0 running on host: slepner034.amarel.rutgers.edu.
bootstrap_0 started as     : '/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/bootstrap_0.sh -d radical.utils-1.1.1.tar.gz:radical.saga-1.1.2.tar.gz:radical.pilot-1.1.1.tar.gz -p pilot.0000 -s rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003 -m create -r local -b default -g default -v /home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1 -y 60 -e module load intel/17.0.4 -e module load python/3.5.2'
Environment of bootstrap_0 process:
BASH_ENV=/opt/sw/admin/lmod/lmod/init/bash
BASH_FUNC_ml()=() {  eval $($LMOD_DIR/ml_cmd "$@")
BASH_FUNC_module()=() {  eval $($LMOD_CMD bash "$@") && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
ENVIRONMENT=BATCH
HISTCONTROL=ignoredups
HISTSIZE=1000
HOME=/home/aag193
HOSTNAME=slepner034.amarel.rutgers.edu
LANG=en_US.UTF-8
LESSOPEN=||/usr/bin/lesspipe.sh %s
LMOD_CMD=/opt/sw/admin/lmod/lmod/libexec/lmod
LMOD_DIR=/opt/sw/admin/lmod/lmod/libexec
LMOD_PKG=/opt/sw/admin/lmod/lmod
LMOD_SETTARG_FULL_SUPPORT=no
LMOD_sys=Linux
LMOD_VERSION=8.1.6
LOGNAME=aag193
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=01;05;37;41:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=01;36:*.au=01;36:*.flac=01;36:*.mid=01;36:*.midi=01;36:*.mka=01;36:*.mp3=01;36:*.mpc=01;36:*.ogg=01;36:*.ra=01;36:*.wav=01;36:*.axa=01;36:*.oga=01;36:*.spx=01;36:*.xspf=01;36:
MAIL=/var/spool/mail/aag193
MAKEFLAGS=-j
MANPATH=/opt/sw/admin/lmod/lmod/share/man:/usr/lpp/mmfs/share/man:
MODULEPATH=/opt/sw/modulefiles/Linux:/opt/sw/modulefiles/Core:/opt/sw/admin/lmod/lmod/modulefiles/Core
MODULEPATH_ROOT=/opt/sw/modulefiles
MODULESHOME=/opt/sw/admin/lmod/lmod
PATH=/home/aag193/anaconda3/bin:/home/aag193/anaconda3/bin:/usr/lpp/mmfs/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/aag193/.local/bin:/home/aag193/bin:/home/aag193/.local/bin:/home/aag193/bin
PS1=#
PWD=/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000
PYTHONNOUSERSITE=True
RADICAL_PROFILE=TRUE
RP_BOOTSTRAP_0_REDIR=True
SHELL=/bin/bash
SHLVL=3
SLURM_CHECKPOINT_IMAGE_DIR=/var/lib/slurm/checkpoint
SLURM_CLUSTER_NAME=amarel
SLURM_CPUS_ON_NODE=2
SLURM_CPUS_PER_TASK=1
SLURMD_NODENAME=slepner034
SLURM_GTIDS=0
SLURM_JOB_ACCOUNT=general
SLURM_JOB_CPUS_PER_NODE=2
SLURM_JOB_GID=140535
SLURM_JOB_ID=95175965
SLURM_JOBID=95175965
SLURM_JOB_NAME=pilot.0000
SLURM_JOB_NODELIST=slepner034
SLURM_JOB_NUM_NODES=1
SLURM_JOB_PARTITION=main
SLURM_JOB_QOS=normal
SLURM_JOB_UID=140535
SLURM_JOB_USER=aag193
SLURM_LOCALID=0
SLURM_MEM_PER_CPU=4096
SLURM_NNODES=1
SLURM_NODE_ALIASES=(null)
SLURM_NODEID=0
SLURM_NODELIST=slepner034
SLURM_NPROCS=2
SLURM_NTASKS=2
SLURM_PRIO_PROCESS=0
SLURM_PROCID=0
SLURM_SUBMIT_DIR=/cache/home/aag193
SLURM_SUBMIT_HOST=amarel1.amarel.rutgers.edu
SLURM_TASK_PID=37851
SLURM_TASKS_PER_NODE=2
SLURM_TOPOLOGY_ADDR=fdrc[0-1].edge2.slepner034
SLURM_TOPOLOGY_ADDR_PATTERN=switch.switch.node
SLURM_WORKING_CLUSTER=amarel:saul:6808:8448
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SSH_CLIENT=172.31.142.101 63876 22
SSH_CONNECTION=172.31.142.101 63876 172.16.94.35 22
SSH_TTY=/dev/pts/182
TERM=vt100
TMPDIR=/tmp
USER=aag193
_=/usr/bin/env
XDG_DATA_DIRS=/home/aag193/.local/share/flatpak/exports/share:/var/lib/flatpak/exports/share:/usr/local/share:/usr/share
XDG_RUNTIME_DIR=/run/user/140535
XDG_SESSION_ID=52411
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# Running pre_bootstrap_0 command
# cmd: module load intel/17.0.4
#
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# Running pre_bootstrap_0 command
# cmd: module load python/3.5.2
#
#
# SUCCESS
#
# -------------------------------------------------------------------
# -------------------------------------------------------------------
# Touching output tarballs
# -------------------------------------------------------------------
create gtod
++ ./gtod
+ TIME_ZERO=1582740496.708426
+ export TIME_ZERO
+ set +x
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=bootstrap_0_start
+ msg=
++ ./gtod
+ epoch=1582740496.759094
++ awk 'BEGIN{print(1582740496.759094 - 1582740496.708426)}'
+ now=0.050668
+ test -f bootstrap_0.prof
+ echo '#time,name,uid,state,event,msg'
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 0.050668 bootstrap_0_start bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
0.0507,bootstrap_0_start,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
VIRTENV : /home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1
VIRTENV : /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1 (normalized)
PYTHON: /opt/sw/packages/gcc-4.8/python/3.5.2/bin/python
PIP   : /opt/sw/packages/gcc-4.8/python/3.5.2/bin/pip
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=ve_setup_start
+ msg=
++ ./gtod
+ epoch=1582740496.845515
++ awk 'BEGIN{print(1582740496.845515 - 1582740496.708426)}'
+ now=0.137089
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 0.137089 ve_setup_start bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
0.1371,ve_setup_start,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
virtenv_create   : TRUE
virtenv_update   : FALSE
rp install sources:  radical.utils-1.1.1/ radical.saga-1.1.2/ radical.pilot-1.1.1/
rp install target : SANDBOX
rp install lock   : FALSE
rp lock for ve create
obtained lock /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1.lock
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=ve_create_start
+ msg=
++ ./gtod
+ epoch=1582740498.491163
++ awk 'BEGIN{print(1582740498.491163 - 1582740496.708426)}'
+ now=1.78274
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 1.78274 ve_create_start bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
1.7827,ve_create_start,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x

# -------------------------------------------------------------------
#
# Download virtualenv tgz
# cmd: curl -1 -k -L -O 'https://files.pythonhosted.org/packages/66/f0/6867af06d2e2f511e4e1d7094ff663acdebc4f15d4a0cb0fed1007395124/virtualenv-16.7.5.tar.gz'
#
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4992k  100 4992k    0     0  10.0M      0 --:--:-- --:--:-- --:--:-- 10.0M
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# unpacking virtualenv tgz
# cmd: tar zxmf 'virtualenv-16.7.5.tar.gz'
#
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# Create virtualenv
# cmd: /opt/sw/packages/gcc-4.8/python/3.5.2/bin/python virtualenv-16.7.5/virtualenv.py -p python3 /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1
#
Already using interpreter /opt/sw/packages/gcc-4.8/python/3.5.2/bin/python3
Using base prefix '/opt/sw/packages/gcc-4.8/python/3.5.2'
New python executable in /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python3
Also creating executable in /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python
Installing setuptools, pip, wheel...
done.
Running virtualenv with interpreter /opt/sw/packages/gcc-4.8/python/3.5.2/bin/python3
#
# SUCCESS
#
# -------------------------------------------------------------------
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=ve_activate_start
+ msg=
++ ./gtod
+ epoch=1582740508.094744
++ awk 'BEGIN{print(1582740508.094744 - 1582740496.708426)}'
+ now=11.3863
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 11.3863 ve_activate_start bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
11.3863,ve_activate_start,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
PYTHON: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python
PIP   : /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip
PYTHON INTERPRETER: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python
PYTHON_VERSION    : 3.5
VE_MOD_PREFIX     : /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/lib/python3.5/site-packages
PIP installer     : /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip
PIP version       : pip 20.0.2 from /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/lib/python3.5/site-packages/pip (python 3.5)
/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/bootstrap_0.sh: line 878: export: `/cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin:/cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin:/opt/sw/packages/gcc-4.8/python/3.5.2/bin:/opt/sw/packages/intel/17.0.4/compilers_and_libraries/linux/bin/intel64:/opt/sw/packages/intel/17.0.4/compilers_and_libraries/linux/mpi/intel64/bin:/opt/sw/packages/intel/17.0.4/debugger_2017/gdb/intel64_mic/bin:/home/aag193/anaconda3/bin:/usr/lpp/mmfs/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/home/aag193/.local/bin:/home/aag193/bin': not a valid identifier
activated virtenv
VIRTENV      : /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1
VE_PYTHONPATH: 
VE_MOD_PREFIX: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/lib/python3.5/site-packages
RP_MOD_PREFIX: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/rp_install/lib/python3.5/site-packages
RP_PATH      : /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=ve_activate_stop
+ msg=
++ ./gtod
+ epoch=1582740508.599676
++ awk 'BEGIN{print(1582740508.599676 - 1582740496.708426)}'
+ now=11.8912
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 11.8912 ve_activate_stop bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
11.8912,ve_activate_stop,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
PYTHON: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python
PIP   : /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip

# -------------------------------------------------------------------
#
# install pymongo
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation pymongo
#
Collecting pymongo
  Downloading pymongo-3.10.1-cp35-cp35m-manylinux2014_x86_64.whl (459 kB)
Installing collected packages: pymongo
Successfully installed pymongo-3.10.1
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# install colorama
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation colorama
#
Collecting colorama
  Downloading colorama-0.4.3-py2.py3-none-any.whl (15 kB)
Installing collected packages: colorama
Successfully installed colorama-0.4.3
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# install python-hostlist
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation python-hostlist
#
Collecting python-hostlist
  Downloading python-hostlist-1.20.tar.gz (35 kB)
Building wheels for collected packages: python-hostlist
  Building wheel for python-hostlist (setup.py): started
  Building wheel for python-hostlist (setup.py): finished with status 'done'
  Created wheel for python-hostlist: filename=python_hostlist-1.20-py3-none-any.whl size=38932 sha256=1ecaa08dddd6a69d6154408f2d5da4b84cbb947f8ba3bcf36e3d472fc855789f
  Stored in directory: /tmp/pip-ephem-wheel-cache-r_8obzq6/wheels/aa/d8/53/ace748a3ae3fd460e6aaca884296ac3312d767f4d424a56ece
Successfully built python-hostlist
Installing collected packages: python-hostlist
Successfully installed python-hostlist-1.20
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# install ntplib
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation ntplib
#
Collecting ntplib
  Downloading ntplib-0.3.3.tar.gz (6.8 kB)
Building wheels for collected packages: ntplib
  Building wheel for ntplib (setup.py): started
  Building wheel for ntplib (setup.py): finished with status 'done'
  Created wheel for ntplib: filename=ntplib-0.3.3-py3-none-any.whl size=5909 sha256=4bfc3b58aea1c2f19245663807be6f56e77677a57d0385399821a588a3a2d512
  Stored in directory: /tmp/pip-ephem-wheel-cache-775bwt7s/wheels/d7/72/2e/5d5aa67dd62f46f1d017c1f285fcf3dda579f3aefd37108d6e
Successfully built ntplib
Installing collected packages: ntplib
Successfully installed ntplib-0.3.3
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# install pyzmq
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation pyzmq
#
Collecting pyzmq
  Downloading pyzmq-19.0.0-cp35-cp35m-manylinux1_x86_64.whl (1.1 MB)
Installing collected packages: pyzmq
Successfully installed pyzmq-19.0.0
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# install netifaces
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation netifaces
#
Collecting netifaces
  Downloading netifaces-0.10.9-cp35-cp35m-manylinux1_x86_64.whl (32 kB)
Installing collected packages: netifaces
Successfully installed netifaces-0.10.9
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# install setproctitle
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation setproctitle
#
Collecting setproctitle
  Downloading setproctitle-1.1.10.tar.gz (24 kB)
Building wheels for collected packages: setproctitle
  Building wheel for setproctitle (setup.py): started
  Building wheel for setproctitle (setup.py): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-5wo5b6s9/setproctitle/setup.py'"'"'; __file__='"'"'/tmp/pip-install-5wo5b6s9/setproctitle/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-mc2xbenb
       cwd: /tmp/pip-install-5wo5b6s9/setproctitle/
  Complete output (10 lines):
  running bdist_wheel
  running build
  running build_ext
  building 'setproctitle' extension
  creating build
  creating build/temp.linux-x86_64-3.5
  creating build/temp.linux-x86_64-3.5/src
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DHAVE_SYS_PRCTL_H=1 -DSPT_VERSION=1.1.10 -I/opt/sw/packages/gcc-4.8/python/3.5.2/include/python3.5m -c src/setproctitle.c -o build/temp.linux-x86_64-3.5/src/setproctitle.o
  unable to execute 'gcc': No such file or directory
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for setproctitle
  Running setup.py clean for setproctitle
Failed to build setproctitle
Installing collected packages: setproctitle
    Running setup.py install for setproctitle: started
    Running setup.py install for setproctitle: finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-5wo5b6s9/setproctitle/setup.py'"'"'; __file__='"'"'/tmp/pip-install-5wo5b6s9/setproctitle/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-sembtny_/install-record.txt --single-version-externally-managed --compile --install-headers /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/include/site/python3.5/setproctitle
         cwd: /tmp/pip-install-5wo5b6s9/setproctitle/
    Complete output (10 lines):
    running install
    running build
    running build_ext
    building 'setproctitle' extension
    creating build
    creating build/temp.linux-x86_64-3.5
    creating build/temp.linux-x86_64-3.5/src
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -DSPT_VERSION=1.1.10 -DHAVE_SYS_PRCTL_H=1 -I/opt/sw/packages/gcc-4.8/python/3.5.2/include/python3.5m -c src/setproctitle.c -o build/temp.linux-x86_64-3.5/src/setproctitle.o
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-5wo5b6s9/setproctitle/setup.py'"'"'; __file__='"'"'/tmp/pip-install-5wo5b6s9/setproctitle/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-sembtny_/install-record.txt --single-version-externally-managed --compile --install-headers /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/include/site/python3.5/setproctitle Check the logs for full command output.
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install setproctitle! Lets see how far we get ...

# -------------------------------------------------------------------
#
# install msgpack
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation msgpack
#
Collecting msgpack
  Downloading msgpack-1.0.0-cp35-cp35m-manylinux1_x86_64.whl (270 kB)
Installing collected packages: msgpack
Successfully installed msgpack-1.0.0
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# install future
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation future
#
Collecting future
  Downloading future-0.18.2.tar.gz (829 kB)
Building wheels for collected packages: future
  Building wheel for future (setup.py): started
  Building wheel for future (setup.py): finished with status 'done'
  Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491058 sha256=569ddd1263b98a1195ade42307033c5968f997554a4cd0be2d56e2029d5bc006
  Stored in directory: /tmp/pip-ephem-wheel-cache-w0n6hm0d/wheels/c4/f0/ae/d4689c4532d1f111462ed6a884a7767d502e511ee65f0d8e1b
Successfully built future
Installing collected packages: future
Successfully installed future-0.18.2
#
# SUCCESS
#
# -------------------------------------------------------------------

# -------------------------------------------------------------------
#
# install regex
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation regex
#
Collecting regex
  Downloading regex-2020.2.20.tar.gz (681 kB)
Building wheels for collected packages: regex
  Building wheel for regex (setup.py): started
  Building wheel for regex (setup.py): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-h4u34czw/regex/setup.py'"'"'; __file__='"'"'/tmp/pip-install-h4u34czw/regex/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-v95siaj6
       cwd: /tmp/pip-install-h4u34czw/regex/
  Complete output (17 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.5
  creating build/lib.linux-x86_64-3.5/regex
  copying regex_3/__init__.py -> build/lib.linux-x86_64-3.5/regex
  copying regex_3/regex.py -> build/lib.linux-x86_64-3.5/regex
  copying regex_3/_regex_core.py -> build/lib.linux-x86_64-3.5/regex
  copying regex_3/test_regex.py -> build/lib.linux-x86_64-3.5/regex
  running build_ext
  building 'regex._regex' extension
  creating build/temp.linux-x86_64-3.5
  creating build/temp.linux-x86_64-3.5/regex_3
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/sw/packages/gcc-4.8/python/3.5.2/include/python3.5m -c regex_3/_regex.c -o build/temp.linux-x86_64-3.5/regex_3/_regex.o
  unable to execute 'gcc': No such file or directory
  error: command 'gcc' failed with exit status 1
  ----------------------------------------
  ERROR: Failed building wheel for regex
  Running setup.py clean for regex
Failed to build regex
Installing collected packages: regex
    Running setup.py install for regex: started
    Running setup.py install for regex: finished with status 'error'
    ERROR: Command errored out with exit status 1:
     command: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-h4u34czw/regex/setup.py'"'"'; __file__='"'"'/tmp/pip-install-h4u34czw/regex/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-sw34taeq/install-record.txt --single-version-externally-managed --compile --install-headers /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/include/site/python3.5/regex
         cwd: /tmp/pip-install-h4u34czw/regex/
    Complete output (17 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.5
    creating build/lib.linux-x86_64-3.5/regex
    copying regex_3/__init__.py -> build/lib.linux-x86_64-3.5/regex
    copying regex_3/regex.py -> build/lib.linux-x86_64-3.5/regex
    copying regex_3/_regex_core.py -> build/lib.linux-x86_64-3.5/regex
    copying regex_3/test_regex.py -> build/lib.linux-x86_64-3.5/regex
    running build_ext
    building 'regex._regex' extension
    creating build/temp.linux-x86_64-3.5
    creating build/temp.linux-x86_64-3.5/regex_3
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/opt/sw/packages/gcc-4.8/python/3.5.2/include/python3.5m -c regex_3/_regex.c -o build/temp.linux-x86_64-3.5/regex_3/_regex.o
    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python3 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-h4u34czw/regex/setup.py'"'"'; __file__='"'"'/tmp/pip-install-h4u34czw/regex/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-sw34taeq/install-record.txt --single-version-externally-managed --compile --install-headers /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/include/site/python3.5/regex Check the logs for full command output.
#
# ERROR
# no fallback command available
#
# -------------------------------------------------------------------
Couldn't install regex! Lets see how far we get ...

# -------------------------------------------------------------------
#
# install munch
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip --no-cache-dir install --no-build-isolation munch
#
Collecting munch
  Downloading munch-2.5.0-py2.py3-none-any.whl (10 kB)
Collecting six
  Downloading six-1.14.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: six, munch
Successfully installed munch-2.5.0 six-1.14.0
#
# SUCCESS
#
# -------------------------------------------------------------------
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=ve_create_stop
+ msg=
++ ./gtod
+ epoch=1582740529.117960
++ awk 'BEGIN{print(1582740529.117960 - 1582740496.708426)}'
+ now=32.4095
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 32.4095 ve_create_stop bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
32.4095,ve_create_stop,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
removed ‘/cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1.lock’
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=ve_activate_start
+ msg=
++ ./gtod
+ epoch=1582740529.172846
++ awk 'BEGIN{print(1582740529.172846 - 1582740496.708426)}'
+ now=32.4644
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 32.4644 ve_activate_start bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
32.4644,ve_activate_start,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
do not update virtenv /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=rp_install_start
+ msg=
++ ./gtod
+ epoch=1582740529.211221
++ awk 'BEGIN{print(1582740529.211221 - 1582740496.708426)}'
+ now=32.5028
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 32.5028 rp_install_start bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
32.5028,rp_install_start,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
Using RADICAL-Pilot install sources ' radical.utils-1.1.1/ radical.saga-1.1.2/ radical.pilot-1.1.1/'
VE_MOD_PREFIX: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/lib/python3.5/site-packages
VIRTENV      : /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1
SANDBOX      : /cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000
VE_LOC_PREFIX: 
using local install tree
PYTHONPATH: /cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages::
rp_install: /cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages
radicalmod: /cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages/radical/

# -------------------------------------------------------------------
#
# update radical.utils-1.1.1/ via pip
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip install  --src '/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/src' --build '/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/build' --install-option='--prefix=/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install' --no-deps --no-cache-dir --no-build-isolation radical.utils-1.1.1/
#
/cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/lib/python3.5/site-packages/pip/_internal/commands/install.py:244: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
  cmdoptions.check_install_build_global(options)
DEPRECATION: Location-changing options found in --install-option: ['--prefix'] from command line. This configuration may cause unexpected behavior and is unsupported. pip 20.2 will remove support for this functionality. A possible replacement is using pip-level options like --user, --prefix, --root, and --target. You can find discussion regarding this at https://github.com/pypa/pip/issues/7309.
Processing ./radical.utils-1.1.1
Skipping wheel build for radical.utils, due to binaries being disabled for it.
Installing collected packages: radical.utils
    Running setup.py install for radical.utils: started
    Running setup.py install for radical.utils: finished with status 'done'
Successfully installed radical.utils
#
# SUCCESS
#
# -------------------------------------------------------------------
purge install source at radical.utils-1.1.1/

# -------------------------------------------------------------------
#
# update radical.saga-1.1.2/ via pip
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip install  --src '/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/src' --build '/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/build' --install-option='--prefix=/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install' --no-deps --no-cache-dir --no-build-isolation radical.saga-1.1.2/
#
/cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/lib/python3.5/site-packages/pip/_internal/commands/install.py:244: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
  cmdoptions.check_install_build_global(options)
DEPRECATION: Location-changing options found in --install-option: ['--prefix'] from command line. This configuration may cause unexpected behavior and is unsupported. pip 20.2 will remove support for this functionality. A possible replacement is using pip-level options like --user, --prefix, --root, and --target. You can find discussion regarding this at https://github.com/pypa/pip/issues/7309.
Processing ./radical.saga-1.1.2
Skipping wheel build for radical.saga, due to binaries being disabled for it.
Installing collected packages: radical.saga
    Running setup.py install for radical.saga: started
    Running setup.py install for radical.saga: finished with status 'done'
Successfully installed radical.saga
#
# SUCCESS
#
# -------------------------------------------------------------------
purge install source at radical.saga-1.1.2/

# -------------------------------------------------------------------
#
# update radical.pilot-1.1.1/ via pip
# cmd: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/pip install  --src '/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/src' --build '/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/build' --install-option='--prefix=/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install' --no-deps --no-cache-dir --no-build-isolation radical.pilot-1.1.1/
#
/cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/lib/python3.5/site-packages/pip/_internal/commands/install.py:244: UserWarning: Disabling all use of wheels due to the use of --build-option / --global-option / --install-option.
  cmdoptions.check_install_build_global(options)
DEPRECATION: Location-changing options found in --install-option: ['--prefix'] from command line. This configuration may cause unexpected behavior and is unsupported. pip 20.2 will remove support for this functionality. A possible replacement is using pip-level options like --user, --prefix, --root, and --target. You can find discussion regarding this at https://github.com/pypa/pip/issues/7309.
Processing ./radical.pilot-1.1.1
Skipping wheel build for radical.pilot, due to binaries being disabled for it.
Installing collected packages: radical.pilot
    Running setup.py install for radical.pilot: started
    Running setup.py install for radical.pilot: finished with status 'done'
Successfully installed radical.pilot
#
# SUCCESS
#
# -------------------------------------------------------------------
purge install source at radical.pilot-1.1.1/
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=rp_install_stop
+ msg=
++ ./gtod
+ epoch=1582740538.442146
++ awk 'BEGIN{print(1582740538.442146 - 1582740496.708426)}'
+ now=41.7337
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 41.7337 rp_install_stop bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
41.7337,rp_install_stop,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=ve_setup_stop
+ msg=
++ ./gtod
+ epoch=1582740538.484094
++ awk 'BEGIN{print(1582740538.484094 - 1582740496.708426)}'
+ now=41.7757
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 41.7757 ve_setup_stop bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
41.7757,ve_setup_stop,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
+ test -z TRUE
+ PROFILE=bootstrap_0.prof
+ event=ve_activate_start
+ msg=
++ ./gtod
+ epoch=1582740538.522479
++ awk 'BEGIN{print(1582740538.522479 - 1582740496.708426)}'
+ now=41.8141
+ test -f bootstrap_0.prof
+ printf '%.4f,%s,%s,%s,%s,%s,%s\n' 41.8141 ve_activate_start bootstrap_0 MainThread pilot.0000 PMGR_ACTIVE_PENDING ''
+ tee -a bootstrap_0.prof
41.8141,ve_activate_start,bootstrap_0,MainThread,pilot.0000,PMGR_ACTIVE_PENDING,
+ set +x
verify python viability: /cache/home/aag193/radical.pilot.sandbox/ve.rutgers.amarel.1.1.1/bin/python ... ok
verify module viability: radical.pilot   ...Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages/radical/pilot/__init__.py", line 10, in <module>
    import radical.utils as _ru
  File "/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages/radical/utils/__init__.py", line 14, in <module>
    from .plugin_manager import PluginManager
  File "/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages/radical/utils/plugin_manager.py", line 15, in <module>
    from .logger    import Logger
  File "/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages/radical/utils/logger.py", line 44, in <module>
    from   .misc      import get_env_ns       as ru_get_env_ns
  File "/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages/radical/utils/misc.py", line 16, in <module>
    from .ru_regex import ReString
  File "/cache/home/aag193/radical.pilot.sandbox/rp.session.nbp-142-101.nbp.ruw.rutgers.edu.abdullahghani.018318.0003/pilot.0000/rp_install/lib/python3.5/site-packages/radical/utils/ru_regex.py", line 7, in <module>
    import regex
ImportError: No module named 'regex'
 failed
python installation cannot load module radical.pilot - abort
mtitov commented 4 years ago

AttributeError: module 'signal' has no attribute 'SIGCLD'

Alas, the minor errors are not minor :-P First, it seems Python deprecated a signal name we are still using (passively). I pushed an RU branch hotfix/sigcld_deprecation which gets rid of this error. Your test will still die (but with more style), because it seems like your pilot got into FAILED state. So either you have an error in the launcher again, or you need to check the pilot sandbox you now use on Amarel to see why it failed.

@andre-merzky I also went through other signals just in case, and see that SIGPOLL, SIGPWR, SIGRTMAX, SIGRTMIN are also deprecated (pls confirm that)

>>> dir(signal.Signals)
['SIGABRT', 'SIGALRM', 'SIGBUS', 'SIGCHLD', 'SIGCONT', 'SIGEMT', 'SIGFPE', 'SIGHUP', 'SIGILL', 'SIGINFO', 'SIGINT', 'SIGIO', 'SIGKILL', 'SIGPIPE', 'SIGPROF', 'SIGQUIT', 'SIGSEGV', 'SIGSTOP', 'SIGSYS', 'SIGTERM', 'SIGTRAP', 'SIGTSTP', 'SIGTTIN', 'SIGTTOU', 'SIGURG', 'SIGUSR1', 'SIGUSR2', 'SIGVTALRM', 'SIGWINCH', 'SIGXCPU', 'SIGXFSZ', '__class__', '__doc__', '__members__', '__module__']
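
A note on the list above: it contains SIGEMT and SIGINFO, which are usually seen on BSD/macOS rather than Linux, so the missing names may simply not be defined on this platform rather than deprecated (Python's signal module only exposes the signal names the host system defines). A minimal sketch of a lookup that tolerates missing names; the helper name available_signals and the candidate list are illustrative only, not RU code:

```python
import signal


def available_signals(names):
    """Return {name: signal} for the names this platform actually defines."""
    found = {}
    for name in names:
        # getattr with a default avoids the AttributeError seen in this thread
        sig = getattr(signal, name, None)
        if sig is not None:
            found[name] = sig
    return found


# Names discussed in this thread, plus SIGTERM as a known-good reference.
candidates = ['SIGCHLD', 'SIGCLD', 'SIGPOLL', 'SIGPWR',
              'SIGRTMIN', 'SIGRTMAX', 'SIGTERM']

present = available_signals(candidates)
missing = [name for name in candidates if name not in present]

print('present:', sorted(present))
print('missing:', missing)
```

Code that registers handlers could then loop over `present` only, instead of referring to `signal.SIGCLD` directly.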
Abdullah-Ghani commented 4 years ago

specifically setproctitle and regex

andre-merzky commented 4 years ago

@mtitov : thanks for checking! But, WTH Python? SIGCLD is not standardized, but SIGPOLL for example is POSIX.2001, as are the SIGRTMIN/MAX?! Pfft... Oh well, either way, would you mind removing those then, too?

@Abdullah-Ghani : if you don't mind, shorten the pasted parts to make the errors clearer - I would not have seen the regex and setproctitle errors... But to those, the message includes:

    unable to execute 'gcc': No such file or directory
    error: command 'gcc' failed with exit status 1

which is likely the exact problem: your current configuration does not have a functional compiler. You likely need a module load gnu or something like that to activate a compiler chain acceptable to your Python module / setup.
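
One way to confirm this is to check, from the same environment the bootstrapper ends up with, whether a compiler is on PATH at all. A minimal sketch using only the standard library; the function name find_compiler and the candidate list are assumptions for illustration:

```python
import shutil


def find_compiler(candidates=('gcc', 'cc', 'icc')):
    """Return (name, path) of the first compiler found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return name, path
    return None


result = find_compiler()
if result is None:
    print('no compiler on PATH; a "module load ..." line is probably missing')
else:
    print('found %s at %s' % result)
```

If this reports no compiler after the pre_bootstrap_0 commands have run, packages that build C extensions (such as regex and setproctitle here) will fail to install.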

mtitov commented 4 years ago

@mtitov : thanks for checking! But, WTH Python? SIGCLD is not standardized, but SIGPOLL for example is POSIX.2001, as are the SIGRTMIN/MAX?! Pfft... Oh well, either way, would you mind removing those then, too?

@andre-merzky python 3.7.6 and yeah, will make an update for these signals as well

Abdullah-Ghani commented 4 years ago

Made great progress with @mtitov's help. This is the edited config file.

# filename: rutgers.json
{
"amarel":
{
        "description"                 : "The Amarel HPC cluster at Rutgers.",
        "notes"                       : "Access from registered IP address.",
        "schemas"                     : ["ssh","local"],
        "ssh"                         :
        {
            "job_manager_endpoint"    : "slurm+ssh://aag193@amarel.rutgers.edu/",
            "filesystem_endpoint"     : "sftp://aag193@amarel.rutgers.edu/home/aag193/"
        },
        "local"                       :
        {
            "job_manager_endpoint"    : "slurm://amarel.rutgers.edu:/home/aag193/",
            "filesystem_endpoint"     : "file://amarel.rutgers.edu:/home/aag193/"
        },
        "default_queue"               : "main",
        "resource_manager"            : "SLURM",
        "agent_scheduler"             : "CONTINUOUS",
        "agent_spawner"               : "POPEN",
        "agent_launch_method"         : "SSH",
        "task_launch_method"          : "SSH",
        "mpi_launch_method"           : "SRUN",
        "pre_bootstrap_0"             :["module load gcc/5.4",
                                        "module load python/3.5.2",
                                        "module load intel/17.0.4"
                                        ],

        "default_remote_workdir"      : "/home/aag193",
        "valid_roots"                 : ["/scratch", "$SCRATCH", "/home", "$HOME"],
        "rp_version"                  : "local",
        "virtenv_mode"                : "update",
        "virtenv_dist"                : "default",
        "python_dist"                 : "default",
        "export_to_cu"                : ["LMOD_CMD",
                                         "LMOD_SYSTEM_DEFAULT_MODULES",
                                         "LD_LIBRARY_PATH"],
        "cu_pre_exec"                 : ["module restore"]
    }
}

The script now runs to completion with no problems.

However, I am getting permission errors when the units execute.

[aag193@amarel1 pilot.0000]$ cd unit.000000/
[aag193@amarel1 unit.000000]$ ls
STDERR  STDOUT  unit.000000.sh
[aag193@amarel1 unit.000000]$ cat STDERR 
Warning: Permanently added 'slepner012,192.168.8.12' (ECDSA) to the list of known hosts.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
[aag193@amarel1 unit.000000]$ pwd 
/home/aag193/radical.pilot.sandbox/rp.session.vpn-client-172-16-9-130.rutgers.edu.abdullahghani.018320.0002/pilot.0000/unit.000000
andre-merzky commented 4 years ago

Can you please try to set up ssh so that login to localhost (i.e., the login node) works w/o password? You are setting agent_launch_method and task_launch_method to SSH, and I assume that this stumbles over the default configuration on Amarel. On Slurm, you can also try to use SRUN for both settings.
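
To test the ssh part independently of RP, something like the following may help. It is a sketch, not part of RP: it runs ssh with BatchMode=yes, which makes ssh fail instead of prompting for a password, so a zero exit code suggests key-based login works. The function names are hypothetical.

```python
import subprocess


def ssh_check_command(host, ssh='ssh', timeout=5):
    """Build an ssh command that fails rather than prompting for a password."""
    return [ssh,
            '-o', 'BatchMode=yes',
            '-o', 'ConnectTimeout=%d' % timeout,
            host, 'true']


def passwordless_ssh_works(host, ssh='ssh', timeout=5):
    """Return True if `ssh host true` succeeds without any interaction."""
    cmd = ssh_check_command(host, ssh=ssh, timeout=timeout)
    try:
        proc = subprocess.run(cmd,
                              stdin=subprocess.DEVNULL,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL,
                              timeout=timeout + 10)
    except (OSError, subprocess.TimeoutExpired):
        # ssh binary missing, not executable, or the connection hung
        return False
    return proc.returncode == 0


if __name__ == '__main__':
    print('passwordless login to localhost:',
          passwordless_ssh_works('localhost'))
```

Note that the STDERR above shows ssh connecting to a compute node (slepner012), so the same check against a node in your allocation would be closer to what the SSH launch method actually does.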

Abdullah-Ghani commented 4 years ago

That works. Thank you

mtitov commented 4 years ago

@Abdullah-Ghani @andre-merzky should we just provide a summary for this issue? (since it was resolved and this ticket can be closed after the summary?)

andre-merzky commented 4 years ago

@Abdullah-Ghani : would you mind providing that summary (short)? Thanks!

Abdullah-Ghani commented 4 years ago

Yes, of course :)

*If you are off campus, you need to install the Pulse Secure/Cisco AnyConnect VPN, since Amarel only allows connections from the RU-Wireless network.

  1. Make sure you have passwordless SSH access set up to connect to Amarel.

  2. Place the config file attached below at ~/.radical/pilot/configs/resource_rutgers.json (create any of those directories that do not exist).

  3. Change the resource description in the EnTK/Pilot script as follows:

    # Apply the resource configuration provided by the user
    res_dict = {
        'resource': 'rutgers.amarel',
        'walltime': 10,
        'cpus': 2,
        'schema': 'local'}
    
    # Assign resource request description to the Application Manager
    amgr.resource_desc = res_dict

    *You can change the allocations to suit your needs.

  4. Run the script. That should be it.
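
The label 'rutgers.amarel' in step 3 appears to combine the suffix of the file name (resource_rutgers.json) with the top-level key inside that file (amarel). A small sketch of that mapping, inferred from this thread rather than taken from the RP source; the helper name user_config_location is hypothetical:

```python
from pathlib import Path


def user_config_location(resource_label, home=None):
    """Split 'site.host' into (path of resource_<site>.json, host key)."""
    site, sep, host = resource_label.partition('.')
    if not sep or not site or not host:
        raise ValueError("expected a label like 'site.host', got %r"
                         % resource_label)
    base = Path(home) if home is not None else Path.home()
    path = base / '.radical' / 'pilot' / 'configs' / ('resource_%s.json' % site)
    return path, host


path, key = user_config_location('rutgers.amarel')
print('config file :', path)
print('key in file :', key)
print('file exists :', path.is_file())
```

If the last line prints False, step 2 has not been completed for the user running the script.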

# filename: rutgers.json
{
"amarel": 
{
        "description"                 : "The Amarel HPC cluster at Rutgers.",
        "notes"                       : "Access from registered IP address.",
        "schemas"                     : ["ssh","local"],
        "ssh"                         :
        {
            "job_manager_endpoint"    : "slurm+ssh://amarel.rutgers.edu/",
            "filesystem_endpoint"     : "sftp://amarel.rutgers.edu/"
        },
        "local"                       :
        {
            "job_manager_endpoint"    : "slurm://amarel.rutgers.edu:/home/",
            "filesystem_endpoint"     : "file://amarel.rutgers.edu:/home/"
        },
        "default_queue"               : "main",
        "resource_manager"            : "SLURM",
        "agent_scheduler"             : "CONTINUOUS",
        "agent_spawner"               : "POPEN",
        "agent_launch_method"         : "SRUN",
        "task_launch_method"          : "SRUN",
        "mpi_launch_method"           : "SRUN",
        "pre_bootstrap_0"             : ["export MODULEPATH=$MODULEPATH:/projects/community/modulefiles",
                                         "module load gcc/5.4",
                                         "module load py-data-science-stack/5.1.0-kp807",
                                         "module load intel/17.0.4"
                                        ],

        "default_remote_workdir"      : "$HOME",
        "valid_roots"                 : ["/scratch", "$SCRATCH", "/home", "$HOME"],
        "rp_version"                  : "local",
        "virtenv_mode"                : "create",
        "virtenv_dist"                : "default",
        "python_dist"                 : "default",
        "export_to_cu"                : ["LMOD_CMD",
                                         "LMOD_SYSTEM_DEFAULT_MODULES",
                                         "LD_LIBRARY_PATH"],
        "cu_pre_exec"                 : []
    }
}
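
Before running, it may be worth sanity-checking the file. Python's standard json module rejects the leading # filename line, so this sketch drops lines that start with # and then checks that every entry in "schemas" has a matching section with both endpoints. The function names are illustrative, and the checks reflect only what is visible in the configs in this thread, not RP's own validation:

```python
import json


def load_resource_config(text):
    """Parse a resource config, ignoring whole-line '#' comments."""
    lines = [ln for ln in text.splitlines()
             if not ln.lstrip().startswith('#')]
    return json.loads('\n'.join(lines))


def check_resource_config(cfg):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    for label, res in cfg.items():
        for schema in res.get('schemas', []):
            section = res.get(schema)
            if not isinstance(section, dict):
                problems.append('%s: schema %r has no section' % (label, schema))
                continue
            for key in ('job_manager_endpoint', 'filesystem_endpoint'):
                if key not in section:
                    problems.append('%s.%s: missing %s' % (label, schema, key))
    return problems


# Trimmed-down copy of the config above, for demonstration only.
SAMPLE = '''# filename: rutgers.json
{
    "amarel": {
        "schemas": ["ssh", "local"],
        "ssh": {
            "job_manager_endpoint": "slurm+ssh://amarel.rutgers.edu/",
            "filesystem_endpoint": "sftp://amarel.rutgers.edu/"
        },
        "local": {
            "job_manager_endpoint": "slurm://amarel.rutgers.edu:/home/",
            "filesystem_endpoint": "file://amarel.rutgers.edu:/home/"
        }
    }
}
'''

print(check_resource_config(load_resource_config(SAMPLE)))
```

To check the real file, pass the contents of ~/.radical/pilot/configs/resource_rutgers.json to load_resource_config instead of SAMPLE.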