Closed haoyuanchen closed 10 years ago
Thanks for trying it out, Haoyuan!
I am assuming you are using Python 2.6.x? timedelta.total_seconds() is a new feature in Python 2.7, see: http://docs.python.org/2/library/datetime.html#datetime.timedelta.total_seconds
It is possible to create a workaround for this, but there are a number of other features in RepEx and RP which require Python 2.7 anyway. Since Python 2.7 was released more than four years ago, I think it is reasonable to expect users to have it installed.
To answer your question about the PBS job: even if your application has crashed, the Pilot will continue to run on the resource and wait for CUs to execute for the time period the resources are allocated for, since the error was not "on the pilot end". So it is a good idea to check your running jobs and cancel them manually if this happens.
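For reference, a minimal sketch of what such a workaround could look like. The helper names below are hypothetical and not part of RepEx or RP; the fallback uses the equivalent formula given in the Python documentation for timedelta.total_seconds():

```python
from datetime import timedelta


def manual_total_seconds(td):
    """Compute a timedelta's duration in seconds without total_seconds().

    Combines days, seconds and microseconds by hand, following the
    equivalent formula from the Python docs; works on Python 2.6.
    """
    return (td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6) / 1e6


def total_seconds_compat(td):
    """Hypothetical helper: use total_seconds() if present, else the fallback."""
    if hasattr(td, "total_seconds"):
        return td.total_seconds()  # available from Python 2.7 on
    return manual_total_seconds(td)
```

With a helper like this, the failing line could in principle be written as runtime = total_seconds_compat(check_point - sim_start). This only covers this one call, not the other places where Python 2.7 is needed.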
Thanks, Antons
Thanks Antons! I tried to do a "module load python/2.7.5" and then ran the same thing, but it crashed immediately with this error:
Error: Couldn't create new session: None
Traceback (most recent call last):
File "launch_simulation_scheme_3_amber.py", line 63, in
I tried setting RADICAL_PILOT_VERBOSE=info and saw that my radical pilot seems to be installed under python2.6/site-packages. Do I need to reinstall radical pilot and saga-python with the python2.7 module loaded?
Thanks a lot! Haoyuan
Haoyuan, there is no need to do module load python2.7.x on the cluster; this is already handled by RP. What you need to do is install Python 2.7 on your laptop (the machine you are running the RepEx code from) and then reinstall everything. If you type python -V in your virtual environment right now, it will say Python 2.6.x, right? It should say Python 2.7.x. If there are any problems getting this to work, please let me know. For example, when you have multiple Python versions on your machine, you need to use the -p flag to point to the specific Python version to be used with your virtual environment:
$ virtualenv -p /usr/bin/python2.7 $HOME/exenv
and then you can call:
$ source $HOME/exenv/bin/activate
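To double-check which interpreter an activated environment really uses, a small script along these lines can be run inside it. This is only a sketch; the function name meets_minimum is made up for illustration and is not part of RP or RepEx:

```python
import sys


def meets_minimum(version_info, minimum=(2, 7)):
    """Return True if version_info (e.g. sys.version_info) is at least minimum."""
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    # Show which interpreter is actually running and what version it reports.
    print("interpreter: %s" % sys.executable)
    print("version:     %s" % sys.version.split()[0])
    if not meets_minimum(sys.version_info):
        print("Older than Python 2.7; recreate the virtualenv with the -p flag.")
```

If the interpreter path printed here points outside the virtual environment, the environment is probably not the one that was activated.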
Thanks, Antons
Hi Antons,
I installed Python 2.7 on Trestles (I'm running RepEx from there) and reinstalled everything (RP, saga-python and RepEx) following your instructions. However, it gave me the same error. "python -V" says 2.7 and "radicalpilot-version" says 0.18.RC2.
I'm not sure why this happens, but I'm guessing it's because I installed Python 2.7 locally, using the command "make altinstall prefix=~ exec-prefix=~". I did this since I don't (and can't) have root/sudo permission there.
Thanks! Haoyuan
Hi,
I'm trying to run the code on Trestles (I've followed the installation instructions step by step) using the asynchronous scheme 3. After one cycle it exits with this error:
Traceback (most recent call last):
  File "launch_simulation_scheme_3_amber.py", line 63, in
    pilot_kernel.run_simulation( replicas, pilot_object, session, md_kernel )
  File "/home/chen1990/RepEx/src/radical/repex/pilot_kernels/pilot_kernel_scheme_3.py", line 141, in run_simulation
    runtime = (check_point - sim_start).total_seconds()
AttributeError: 'datetime.timedelta' object has no attribute 'total_seconds'
Exception in thread Thread-1 (most likely raised during interpreter shutdown)
However, when I do qstat I can still see the job with the status "R". This is the output when I do "qstat -u chen1990" several minutes after the crash:
trestles-fe1.local:
                                                                              Req'd   Req'd        Elap
Job ID                  Username  Queue   Jobname           SessID  NDS  TSK  Memory  Time      S  Time
2211587.trestles-fe1.l  chen1990  normal  SAGA-Python-PBSJ  35523   1    32   --      00:20:00  R  00:12:53
Just wondering what causes the error and why the PBS job still keeps running.
Thanks! Haoyuan