radical-collaboration / CyberManufacturing

CDSE Multi-scale CI Project

RP does not cancel / complete job on node even after it has completed execution #28

Open csampat opened 6 years ago

csampat commented 6 years ago

The client side indicates that all the CUs have completed and it calculates the total time required, but this is not reflected on Stampede2: the jobs are still running on the node. The jobs completed in about 12-13 hours, but showq -u shows the older jobs still executing, and they have reached 16 hours of execution time.

iparask commented 6 years ago

This is an RP issue, specifically #1468.

Please cancel your jobs for now. I will let you know the next steps.

iparask commented 6 years ago

This is also a duplicate of #26.

iparask commented 6 years ago

Please clone the radical.pilot, radical.utils and saga-python repos, if you haven't cloned them already.

With a Python virtualenv activated, run:

cd <PATH>/radical.utils
git checkout rc/v0.46.3
pip install . --upgrade
cd <PATH>/saga-python
git checkout rc/v0.46.3
pip install . --upgrade
cd <PATH>/radical.pilot
git checkout experiment/cybermanufacturing
pip install . --upgrade
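
The same steps can be scripted. The following is only a sketch: it assumes all three clones sit under one parent directory (the hypothetical `base_dir` argument stands in for `<PATH>`), and that `git` and the virtualenv's `pip` are on the PATH. It defaults to a dry run that only prints the commands.

```python
import os
import subprocess

# Clone directory -> branch, in the order given in the comment above.
BRANCHES = [
    ("radical.utils", "rc/v0.46.3"),
    ("saga-python", "rc/v0.46.3"),
    ("radical.pilot", "experiment/cybermanufacturing"),
]


def plan_install(base_dir):
    """Return (working_directory, argv) pairs for every checkout/install step.

    base_dir is a hypothetical parent directory holding the three clones.
    """
    steps = []
    for repo, branch in BRANCHES:
        cwd = os.path.join(base_dir, repo)
        steps.append((cwd, ["git", "checkout", branch]))
        steps.append((cwd, ["pip", "install", ".", "--upgrade"]))
    return steps


def run_install(base_dir, dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cwd, argv in plan_install(base_dir):
        print("(in %s) %s" % (cwd, " ".join(argv)))
        if not dry_run:
            # check_call raises CalledProcessError on the first failing command.
            subprocess.check_call(argv, cwd=cwd)
```

Calling `run_install(base_dir)` with the parent directory of the clones lists the six commands without touching anything; pass `dry_run=False` to actually run them.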

After installing everything, the radical-stack output should look like:

  python               : 2.7.14
  pythonpath           :
  virtualenv           : RpCyberExp

  radical.pilot        : 0.47-v0.46.2-186-g2648ca4@experiment-cybermanufacturing
  radical.utils        : 0.47-v0.46-63-gc1ae8ac@rc-v0.46.3
  saga                 : 0.47-v0.46-20-g8ea2302@rc-v0.46.3

Then give it a try.
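
To compare a stack against the one above without reading it by eye, the version strings can be split into their parts. This is a rough sketch, not part of RADICAL: the layout `<version>-<tag>-<commits since tag>-g<short hash>@<branch>` is inferred from the output shown here, and `parse_stack` is a hypothetical helper name.

```python
import re

# Assumed layout: <version>-<tag>-<commits since tag>-g<short hash>@<branch>
VERSION_RE = re.compile(
    r"^(?P<version>[^-@]+)-(?P<tag>v[^-@]+)-(?P<commits>\d+)"
    r"-g(?P<commit>[0-9a-f]+)@(?P<branch>.+)$"
)


def parse_stack(text):
    """Map component name -> parsed fields for each 'name : value' line.

    Lines whose value does not match the assumed layout (python,
    pythonpath, virtualenv, shell prompts) are skipped.
    """
    stack = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        match = VERSION_RE.match(value.strip())
        if sep and match:
            fields = match.groupdict()
            fields["commits"] = int(fields["commits"])
            stack[name.strip()] = fields
    return stack


# The stack quoted in the comment above.
SUGGESTED = """
  python               : 2.7.14
  radical.pilot        : 0.47-v0.46.2-186-g2648ca4@experiment-cybermanufacturing
  radical.utils        : 0.47-v0.46-63-gc1ae8ac@rc-v0.46.3
  saga                 : 0.47-v0.46-20-g8ea2302@rc-v0.46.3
"""

for name, fields in sorted(parse_stack(SUGGESTED).items()):
    print("%-15s branch=%s commits=%d" % (name, fields["branch"], fields["commits"]))
```

Comparing the `branch` field of each component is usually the first thing to check; the commit count and hash only matter once the branches agree.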

csampat commented 6 years ago

Okay, I did all that, but my radical-stack output does not match:

Successfully installed saga-python-0.46.1
(rp_2) chai@xcalibur:~/Documents/git/saga-python$ radical-stack

  python               : 2.7.13
  pythonpath           :
  virtualenv           : /home/chai/Documents/git/rp_fix/src/RADICAL_Pilot/rp_2

  radical.pilot        : 0.47-v0.46.2-18-ge0355d21@fix-ibrun_cpn
  radical.utils        : 0.47-v0.46-10-gc515db1@devel
  saga                 : 0.47-v0.46-5-g74fc3811@devel

(rp_2) chai@xcalibur:~/Documents/git/saga-python$ python -c "import saga;print saga.version_detail"
0.46.1-v0.46-1-gabcd7b68@rc-v0.46.3
(rp_2) chai@xcalibur:~/Documents/git/saga-python$ python -c "import radical.utils;print radical.utils.version_detail"
0.46.2-v0.46-4-gbac8d67@rc-v0.46.3
(rp_2) chai@xcalibur:~/Documents/git/saga-python$ python -c "import radical.pilot;print radical.pilot.version_detail"
0.47-v0.46.2-186-g2648ca47@experiment-cybermanufacturing
(rp_2) chai@xcalibur:~/Documents/git/saga-python$
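
When radical-stack and the imported modules disagree like this, it can help to see which copy of each package Python actually imports. A minimal sketch, assuming only that each module exposes the `version_detail` attribute used in the commands above; `describe` is a hypothetical helper, and whether a stale install is really the cause here is a guess.

```python
import importlib


def describe(module_name):
    """Return (version_detail, file) for a module, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # version_detail is the attribute queried in the commands above;
    # __file__ shows which installed copy was picked up.
    return (getattr(module, "version_detail", None),
            getattr(module, "__file__", None))


for name in ("radical.utils", "saga", "radical.pilot"):
    print("%-15s %s" % (name, describe(name)))
```

If the reported file path lies outside the active virtualenv, an older copy elsewhere on the path is being imported.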

csampat commented 6 years ago

Okay, this morning I did a fresh clone and reinstalled, but I guess I am now a few commits ahead of the versions you have. New radical-stack:

(rp_2) chai@xcalibur:~/Documents/git/saga-python$ radical-stack 

  python               : 2.7.13
  pythonpath           : 
  virtualenv           : /home/chai/Documents/git/rp_fix/src/RADICAL_Pilot/rp_2

  radical.pilot        : 0.47-v0.46.2-186-g2648ca47@experiment-cybermanufacturing
  radical.utils        : 0.47-v0.46-73-gd580ab1@rc-v0.46.3
  saga                 : 0.47-v0.46-32-ga2f9dedc@rc-v0.46.3
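
As a quick check of "a few commits ahead", the commit counts in the two stacks can be subtracted. This is a sketch that assumes the number before `-g<hash>` is the commit count since the tag (as in `git describe`); the difference is only meaningful when both strings share the same tag and branch. `commits_since_tag` and `commits_ahead` are hypothetical helper names.

```python
import re


def commits_since_tag(version):
    """Extract N from '<version>-<tag>-N-g<hash>@<branch>' (assumed layout)."""
    match = re.search(r"-(\d+)-g[0-9a-f]+@", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % version)
    return int(match.group(1))


# Version strings copied from the two radical-stack outputs in this thread.
SUGGESTED = {
    "radical.utils": "0.47-v0.46-63-gc1ae8ac@rc-v0.46.3",
    "saga": "0.47-v0.46-20-g8ea2302@rc-v0.46.3",
}
INSTALLED = {
    "radical.utils": "0.47-v0.46-73-gd580ab1@rc-v0.46.3",
    "saga": "0.47-v0.46-32-ga2f9dedc@rc-v0.46.3",
}


def commits_ahead(installed, suggested):
    """Return component -> difference in commit count (installed minus suggested)."""
    return {
        name: commits_since_tag(installed[name]) - commits_since_tag(suggested[name])
        for name in suggested
    }


print(commits_ahead(INSTALLED, SUGGESTED))
```

radical.pilot is left out because both stacks report the same count (186) and the same commit.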

iparask commented 6 years ago

It looks okay!