Closed JMGilbert closed 6 months ago
Thanks Jonah - we got surprised by that break also. Setuptools changed the naming schema for sdist packages, and that broke our setup. I just pushed out new releases (1.52.0) for the whole RCT stack which should resolve the problem, could you please give it a spin and report back? Thanks!
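For readers wondering what such a rename looks like, here is a minimal sketch. It assumes the change in question is the PEP 625 style normalization of sdist file names, which the thread does not state explicitly. The helper name `sdist_filename` is hypothetical and is not part of setuptools or the RCT stack.

```python
import re


def sdist_filename(name: str, version: str, normalized: bool = True) -> str:
    """Return the expected sdist file name for a project.

    With normalized=True, runs of '-', '_' and '.' in the project name
    collapse to a single '_' and the name is lowercased (PEP 625 style).
    With normalized=False, the project name is used verbatim.
    """
    if normalized:
        name = re.sub(r"[-_.]+", "_", name).lower()
    return f"{name}-{version}.tar.gz"


# Verbatim name versus normalized name for the same release.
print(sdist_filename("radical.pilot", "1.52.0", normalized=False))
print(sdist_filename("radical.pilot", "1.52.0"))
```

Any tooling that looks for the archive under the verbatim name would miss a file written under the normalized one, which is one plausible way a deployment step could break.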
Thanks for the quick response! The installation is now working, but I'm having a separate issue with the newest versions of the radical packages. When I try to run FACTS, it fails out instantly with the error:
Traceback (most recent call last):
File "/home/jonahmgilbert/miniconda3/envs/facts/lib/python3.11/site-packages/radical/entk/appman/appmanager.py", line 459, in run
self._rmgr.submit_resource_request()
File "/home/jonahmgilbert/miniconda3/envs/facts/lib/python3.11/site-packages/radical/entk/execman/rp/resource_manager.py", line 225, in submit_resource_request
self._pilot.wait([rp.PMGR_ACTIVE, rp.DONE, rp.FAILED, rp.CANCELED])
File "/home/jonahmgilbert/miniconda3/envs/facts/lib/python3.11/site-packages/radical/pilot/pilot.py", line 589, in wait
time.sleep(0.1)
KeyboardInterrupt
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/mnt/c/Users/jonahmgilbert/Documents/GitHub/facts/runFACTS.py", line 193, in <module>
run_experiment(args.edir, args.debug, args.alt_id, resourcedir=args.resourcedir, makeshellscript = args.shellscript, globalopts = args.global_options)
File "/mnt/c/Users/jonahmgilbert/Documents/GitHub/facts/runFACTS.py", line 86, in run_experiment
amgr.run()
File "/home/jonahmgilbert/miniconda3/envs/facts/lib/python3.11/site-packages/radical/entk/appman/appmanager.py", line 485, in run
raise EnTKError(ex) from ex
radical.entk.exceptions.EnTKError
When I go to check my re.session, re.session.BFI-33300.jonahmgilbert.019828.0001.zip, it's pretty much empty. I tried running the bootstrap_0.sh and I see:
bash ~/radical.pilot.sandbox/re.session.BFI-33300.jonahmgilbert.019828.0001/pilot.0000/bootstrap_0.sh
DeprecationWarning: 'source deactivate' is deprecated. Use 'conda deactivate'.
/etc/profile.d/conda.sh: No such file or directorytivate: line 6: C:/Users/jonahmgilbert/Miniconda3
: numeric argument requiredMiniconda3/Scripts/deactivate: line 6: return: 1
# -------------------------------------------------------------------
bootstrap_0 running on host: BFI-33300.ad.uchicago.edu.
bootstrap_0 started as : '/home/jonahmgilbert/radical.pilot.sandbox/re.session.BFI-33300.jonahmgilbert.019828.0001/pilot.0000/bootstrap_0.sh '
safe environment of bootstrap_0
bootstrap_0 stderr redirected to stdout
https://files.pythonhosted.org/packages/1c/c2/7516ea983fc37cec2128e7cb0b2b516125a478f8fc633b8f5dfa849f13f7/virtualenv-16.7.12.tar.gz
# -------------------------------------------------------------------
# untar sandbox
# -------------------------------------------------------------------
tar (child): ../: Cannot read: Is a directory
tar (child): At beginning of tape, quitting now
tar (child): Error is not recoverable: exiting now
gzip: stdin: unexpected end of file
tar: Child returned status 2
tar: Error is not recoverable: exiting now
# -------------------------------------------------------------------
create gtod, prof
1713205080.531938,sync_abs,bootstrap_0,MainThread,,PMGR_ACTIVE_PENDING,BFI-33300:172.26.75.24:1713205080.531938:1713205080.531938:1713205080.531938
VIRTENV :
mkdir: cannot create directory ‘’: No such file or directory
VIRTENV normalized: /home/jonahmgilbert
missing RUNTIME
I've tried this from both a conda environment and a default python environment with the same error. Not sure if this is related or not.
Yes, indeed - our last release solves the pip install problem, but some second-order deployment issues (from the same setuptools update) keep popping up. The RP branch hotfix/deployment tries to address these issues. The PR radical-cybertools/radical.pilot/pull/3169 is still a work in progress, but I hope it will converge in the next 24 hours so that we can release again.
I am really sorry for the problems - the setuptools upgrade hit us from nowhere ... :-(
That PR should be in a working state now.
Hey @JMGilbert, I have been working with @andre-merzky and @mturilli on this for the past few days as well. Sorry, I was unaware of the most recent update. While it's good that we discovered it, the flip side of this issue is that whatever setup we have in FACTS is not passing the information the pilot needs to recognize that it should use the ve that is active at launch instead of creating a new one. This is also being worked on.
@andre-merzky I have sent @mturilli the new docker file we use, and I will test the new versions today. This explains why, when I checked the RCT versions earlier, they were all 1.52 hahah
Hey @andre-merzky and @mturilli, I just tried with the new stack versions and am still getting the error:
EnTK session: re.session.fce5772a-fc1b-11ee-81a7-0242ac110002
Creating AppManager
Setting up ZMQ queues ok
AppManager initialized ok
Validating and assigning resource manager ok
** STEP: climate_step **
Setting up ZMQ queues n/a
All components terminated
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/radical/entk/appman/appmanager.py", line 459, in run
self._rmgr.submit_resource_request()
File "/usr/local/lib/python3.8/dist-packages/radical/entk/execman/rp/resource_manager.py", line 225, in submit_resource_request
self._pilot.wait([rp.PMGR_ACTIVE, rp.DONE, rp.FAILED, rp.CANCELED])
File "/usr/local/lib/python3.8/dist-packages/radical/pilot/pilot.py", line 589, in wait
time.sleep(0.1)
KeyboardInterrupt
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "runFACTS.py", line 193, in
It also appears that bootstrap_0.out does not get created either, so I can't check to see if it's the same warning/error internally:
(factsVe) jovyan@08654797330c:~/radical.pilot.sandbox$ find re.session.fce5772a-fc1b-11ee-81a7-0242ac110002/pilot.0000/
re.session.fce5772a-fc1b-11ee-81a7-0242ac110002/pilot.0000/
re.session.fce5772a-fc1b-11ee-81a7-0242ac110002/pilot.0000/agent_0.cfg
re.session.fce5772a-fc1b-11ee-81a7-0242ac110002/pilot.0000/radical-utils-env.sh
re.session.fce5772a-fc1b-11ee-81a7-0242ac110002/pilot.0000/location.lst
re.session.fce5772a-fc1b-11ee-81a7-0242ac110002/pilot.0000/bootstrap_0.sh
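The same check can be scripted when comparing several sessions. This is a rough sketch only: the file names are the ones that appear in this thread, the "expected" set is an assumption rather than an authoritative list of what RP writes, and `missing_pilot_files` is a hypothetical helper, not an RP function.

```python
from pathlib import Path

# File names seen in this thread; treating them as "expected" is an assumption.
EXPECTED = ("bootstrap_0.sh", "bootstrap_0.out", "agent_0.cfg")


def missing_pilot_files(pilot_dir, expected=EXPECTED):
    """Return the names in `expected` that are not regular files in `pilot_dir`."""
    pilot_dir = Path(pilot_dir)
    return [name for name in expected if not (pilot_dir / name).is_file()]
```

Calling `missing_pilot_files` on a session's `pilot.0000` directory returns the names that are absent, so a listing like the one above would report only `bootstrap_0.out`.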
with the new stack versions
@AlexReedy: does that mean the 1.52 release or the RP branch (hotfix/deployment) mentioned in this thread?
@andre-merzky ah sorry, I thought they had been grouped together by now. That was 1.52 (at least in docker it was still failing). I'll try the branch now.
@andre-merzky @mturilli @JMGilbert looks like running with the rp hotfix branch is working!
Note: This is where rp just creates ve.localhost not forcing the launch ve, which still seems to not want to work for me
Thanks for checking, @AlexReedy !
Note: This is where rp just creates ve.localhost not forcing the launch ve, which still seems to not want to work for me
Can you please expand on the above? Am I interpreting correctly that in your case it works if the pilot agent is running in its own VE, which the pilot bootstrapper creates, but fails when the pilot tries to use the client-side VE?
No, it runs through using the pilot-created ve every time. I can't seem to get it to run with the client-side ve, but this is not quite related to this git issue. I will do some more testing. For this issue with setuptools, everything seems to be working fine!
This seems to be successfully completed and released. Pls reopen if you have an issue with setuptools.
I've been running FACTS for a while on the same environment and I recently cleared my radical pilot sandbox including the virtual environment. When I ran FACTS again, it crashed and the radical session bootstrap_0.out contained an error that it had failed to install any of the radical packages. I managed to track this down to setuptools by creating a fresh virtual environment, then executing pip install setuptools==69.0.2 and pip install radical.entk. This worked. Then I created another fresh virtual environment, executed pip install setuptools --upgrade and pip install radical.entk, which returned an error. I think because radical installs the most recent version of setuptools in the virtual environment, it then fails to install the radical packages and the session crashes.
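A quick way to see which side of that comparison a given environment is on is to compare the installed setuptools version with the pin that worked. This is a sketch under stated assumptions: 69.0.2 is only the version reported to work here, the exact release that introduced the break is not given in this report, and the helper names are hypothetical.

```python
from importlib import metadata

KNOWN_GOOD = (69, 0, 2)  # the pin that worked in the report above


def version_tuple(text):
    """Parse the leading numeric release segments, e.g. '69.0.2' -> (69, 0, 2)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def newer_than_known_good(installed, known_good=KNOWN_GOOD):
    """True if `installed` is a later release than the known-good pin."""
    return version_tuple(installed) > known_good


def installed_setuptools():
    """Return the installed setuptools version string, or None if absent."""
    try:
        return metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        return None


current = installed_setuptools()
if current is None:
    print("setuptools is not installed in this environment")
elif newer_than_known_good(current):
    print(f"setuptools {current} is newer than the known-good 69.0.2")
else:
    print(f"setuptools {current} is at or below the known-good 69.0.2")
```

The version parsing is deliberately simple and ignores pre-release and post-release suffixes, so it is only meant for a rough check inside the environment being debugged.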