Closed arupcsedu closed 9 months ago
Hi @arupcsedu, the issue was in the radical.utils (RU) package, but thankfully it was fixed in the latest release.
File "/scratch/djy8hg/radical.pilot.sandbox/rp.session.udc-ba35-36.djy8hg.019454.0003/pilot.0000/rp_install/lib/python3.11/site-packages/radical/utils/signatures.py", line 175, in <module>
from inspect import getargspec, isclass
ImportError: cannot import name 'getargspec' from 'inspect' (/apps/software/standard/mpi/gcc/11.2.0/openmpi/4.1.4/python/3.11.1/lib/python3.11/inspect.py)
failed
Please update the RU package to version 1.22.
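For context on the traceback above: `inspect.getargspec` was removed in Python 3.11, and `inspect.getfullargspec` is the standard-library replacement. The snippet below is only a minimal sketch of that replacement call; `positional_arg_names` and `example` are hypothetical names used for illustration, and this is not necessarily how radical.utils itself fixed the import.

```python
import inspect


def positional_arg_names(func):
    """Return the positional parameter names of a function."""
    # getfullargspec is available on all Python 3 versions, whereas
    # getargspec no longer exists from Python 3.11 onward.
    return inspect.getfullargspec(func).args


def example(a, b, c=1):
    # Hypothetical function, used only to demonstrate the call.
    return a + b + c


print(positional_arg_names(example))  # prints ['a', 'b', 'c']
```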
Note that my tests of #2855 involved a more recent version of radical.utils than your radical-stack is reporting (but by now I can't remember why...). Have you tried the devel branch?
Oops. I was too slow. :-)
With the devel branch, I am getting this error, even though I added the code below to example/config.json:
"uva.rivanna" : { "project" : null, "queue" : "standard", "schema" : "local", "cores" : 8 }
I added resource_uva.json to the config folder as well.
This is the error:
finalize
closing session rp.session.udc-ba35-36.djy8hg.019454.0009 \
close pilot manager \
wait for 0 pilot(s)
0 ok
ok
session lifetime: 13.8s ok
Traceback (most recent call last):
File "./09_mpi_tasks.py", line 75, in
@arupcsedu sorry for the confusion. Eric meant the development branch of radical.utils only (and those devel updates were in fact released in 1.22), so you can reinstall your stack:
pip uninstall radical.pilot radical.saga radical.utils -y
pip install git+https://github.com/eirrgang/radical.pilot.git@2835-rivanna
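After reinstalling, it may help to confirm which radical.utils version the environment actually picked up. The following is a rough sketch using only the standard library; the helper names (`leading_version`, `is_at_least`, `installed_version`) are hypothetical, and the parsing assumes version strings that start with dotted numbers, as in the `radical-stack` output in this thread.

```python
import re
from importlib import metadata


def leading_version(version_string):
    """Parse the leading dotted number, e.g. '1.23.0-v1.22.0@devel' -> (1, 23, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        raise ValueError(f"no leading version number in {version_string!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_at_least(installed, required):
    """Return True if the installed version is >= the required version."""
    a = leading_version(installed)
    b = leading_version(required)
    # Pad the shorter tuple with zeros so '1.22' compares equal to '1.22.0'.
    width = max(len(a), len(b))
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


installed = installed_version("radical.utils")
if installed is None:
    print("radical.utils is not installed in this environment")
else:
    status = "ok" if is_at_least(installed, "1.22") else "needs upgrade"
    print(installed, status)
```

Note that this only inspects the client environment. The pilot may run in its own installation on the cluster (the traceback above points to an `rp_install` directory), so that one may need to be checked separately.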
Yes, I'm sorry for contributing to the confusion. I agree completely with @mtitov.
@mtitov and @eirrgang, Thank you, guys.
My first update was done by running the command you shared:
pip install git+https://github.com/eirrgang/radical.pilot.git@2835-rivanna
But I will cross-check again and get back to you.
Hey guys, after updating to devel, I get the same error. Have a look at the logs.
gather results
wait :
Hi @arupcsedu, thank you for the logs. This issue seems to be related to a race condition in the SAGA component. Can you please try the SAGA branch hotfix/slurm_js_jobs?
pip uninstall radical.saga -y
pip install git+https://github.com/radical-cybertools/radical.saga.git@hotfix/slurm_js_jobs
I am still getting the same result.
Let me share a bit more information. This operation creates a job on Rivanna's Slurm, and the job is still in the Queued state. Have a look at the screenshot and logs.
gather results
wait :
rp.session.udc-ba34-36.djy8hg.019457.0001.zip
I think it would be better to wait until you guys get access to Rivanna. Do you guys have access, already?
@arupcsedu It seems that your batch job was submitted successfully and is waiting in the queue (that's what your screenshot shows), so the RP application will proceed once the batch job starts executing (that is, once its state changes to RUNNING).
I can look into your client sandbox as well (directory with the session ID name in your current working directory).
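To check whether the batch job is still waiting, one option is to ask Slurm directly. This is a hedged sketch, assuming `squeue` is on the PATH of the login node; the job ID has to be taken from your own submission, and `parse_squeue_state` and `job_state` are hypothetical helper names. In Slurm's terms, a queued job is reported as PENDING.

```python
import subprocess


def parse_squeue_state(output):
    """Return the first non-empty line of squeue output, or None if there is none."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[0] if lines else None


def job_state(job_id):
    """Query Slurm for the state of one job (e.g. 'PENDING' or 'RUNNING')."""
    # -h suppresses the header, -j selects the job, -o %T prints the job state.
    result = subprocess.run(
        ["squeue", "-h", "-j", str(job_id), "-o", "%T"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Empty output usually means the job is no longer in the queue.
    return parse_squeue_state(result.stdout)
```

Calling `job_state(<your job id>)` on the cluster should return PENDING while the pilot job is waiting for resources. In that case the RP session is not stalled, it is waiting for the scheduler.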
Do you guys have access, already?
not yet, but some of us will get access soon (@AymenFJA)
closing as outdated
Hey guys, I have been having this issue for quite some time.
Execution is stalled for more than 10 minutes. Please check the system information:
(rc_arup) -bash-4.2$ radical-stack
python : /home/djy8hg/.conda/envs/rc_arup/bin/python3
pythonpath :
version : 3.9.16
virtualenv : rc_arup
radical.gtod : 1.20.0
radical.pilot : 1.22.0-v1.4.0-4456-gff2e45f@2835-rivanna
radical.saga : 1.23.0-v1.22.0-1-g1e21463@devel
radical.utils : 1.21.0
I have checked the status of the PR https://github.com/radical-cybertools/radical.pilot/pull/2855
It seems that interactive and ssh testing are still pending. Have a look at the screenshot below.
I have attached the sandbox logs as well.
rp.session.udc-ba35-36.djy8hg.019454.0002.zip rp.session.udc-ba35-36.djy8hg.019454.0003.zip