saga-project / BigJob

SAGA-based Pilot-Job Implementation for Compute and Data
http://saga-project.github.com/BigJob/

bootstrap problems on india... #129

Open andre-merzky opened 11 years ago

andre-merzky commented 11 years ago

On india, with python-2.7 module loaded, and using develop-prod, I see:

(python)[merzky@i136 agent]$ cat stderr-bj-918eef0a-d1b7-11e2-bab9-00231582da34-agent.txt 
git version 1.7.8.3 loaded
Python version 2.7 loaded
Python version 2.7 loaded
Traceback (most recent call last):
  File "/N/u/merzky/.bigjob/bigjob-bootstrap.py", line 1967, in <module>
    main()
  File "/N/u/merzky/.bigjob/bigjob-bootstrap.py", line 817, in main
    never_download=options.never_download)
  File "/N/u/merzky/.bigjob/bigjob-bootstrap.py", line 908, in create_environment
    site_packages=site_packages, clear=clear))
  File "/N/u/merzky/.bigjob/python/lib/python2.7/posixpath.py", line 339, in abspath
    if not isabs(path):
  File "/N/u/merzky/.bigjob/python/lib/python2.7/posixpath.py", line 52, in isabs
    return s.startswith('/')
AttributeError: 'NoneType' object has no attribute 'startswith'
Traceback (most recent call last):
  File "<string>", line 44, in <module>
ImportError: No module named bigjob.bigjob_agent

The offending line seems to get a None return value from the install_python method -- but that method is somewhat, aehm, complex, so I am not sure how to debug it in the remote deployment setting. Any advice?
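The traceback above ends with `abspath` receiving `None`, which surfaces as an `AttributeError` deep inside `posixpath`. A minimal sketch of a guard that would fail earlier with a readable message is shown below. The names `checked_abspath` and `what` are hypothetical and do not exist in bigjob-bootstrap.py; this only illustrates the idea and is not the actual fix.

```python
import os


def checked_abspath(path, what="path"):
    """Return os.path.abspath(path), failing early with a readable message.

    `checked_abspath` and `what` are hypothetical names for this sketch;
    they are not part of bigjob-bootstrap.py.
    """
    if path is None:
        # Report the missing value here instead of letting os.path
        # fail later with a low-level error about NoneType.
        raise ValueError(
            "%s is None; the step that should have produced it "
            "(for example a Python install) probably failed" % what
        )
    return os.path.abspath(path)
```

Wrapping the value that reaches `create_environment` in such a check would at least point at the step that returned `None`, rather than at `posixpath.py`.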

melrom commented 11 years ago

Hello, I cannot reproduce this bug.

What does your .bashrc look like? Do you have 'module load torque'? It also looks like you are doing a module load python and overwriting whatever is in your virtualenv.

andre-merzky commented 11 years ago

Hi Melissa,

my .bashrc contains:

module load git
module load vim
module load python

source ~/.bigjob/python/bin/activate

and nothing (of importance) thereafter. I have no torque module, but on login I see:

torque/2.5.5 version 2.5.5 loaded
moab version 5.4.0 loaded

and also I can check with:

(python)[merzky@i136 ~]$ module list 
Currently Loaded Modulefiles:
  1) torque/2.5.5   2) moab/5.4.0     3) git/1.7.8.3    4) vim/7.2        5) python/2.7

That looks ok I assume?

Do you happen to have any further ideas on how to debug this? But also: no matter what the user setup is, it should not result in such a low-level exception IMHO...

Many thanks! :-)

Andre.

ashleyz commented 11 years ago

Hi Andre!

I took a look...

I am not using the Python module; I used python2.6 directly, e.g.:

azebro1@i136:~$ python
Python 2.4.3 (#1, Oct 23 2012, 22:02:41) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-54)] on linux2
azebro1@i136:~$ python2.6 --version
Python 2.6.8
azebro1@i136:~$ source ~/saga-python-env/bin/activate
(saga-python-env)azebro1@i136:~$ python --version
Python 2.6.8

So, maybe make a virtualenv with python2.6 and give that a spin?

Here are my modules:

torque/2.5.5 version 2.5.5 loaded
moab version 5.4.0 loaded
git version 1.7.8.3 loaded

What -may- be happening is that the module load python line in your .bashrc is somehow being executed during the bootstrap, which clobbers your virtualenv.

Could you try removing the python line from your .bashrc and using python2.6 to create the virtualenv? That would (hopefully) clear things up...
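To check whether the interpreter the agent ends up with really matches the virtualenv, a small diagnostic along the following lines could be run in the same remote environment. This is only a sketch: `describe_python_env` is a hypothetical helper, not part of BigJob, and which variables are worth reporting is an assumption.

```python
import os
import sys


def describe_python_env(environ=None):
    """Collect facts about the running interpreter and its environment.

    Hypothetical helper for debugging; not part of BigJob.
    """
    environ = os.environ if environ is None else environ
    # Older virtualenv releases set sys.real_prefix; the venv module and
    # newer virtualenv releases set sys.base_prefix instead.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(
        sys, "base_prefix", sys.prefix
    )
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        "prefix": sys.prefix,
        "in_virtualenv": base_prefix != sys.prefix,
        "VIRTUAL_ENV": environ.get("VIRTUAL_ENV"),
        "PYTHONHOME": environ.get("PYTHONHOME"),
        "LD_LIBRARY_PATH": environ.get("LD_LIBRARY_PATH"),
    }


if __name__ == "__main__":
    # Write to stderr so the report lands next to the agent's other errors.
    for key, value in sorted(describe_python_env().items()):
        sys.stderr.write("%s = %s\n" % (key, value))
```

Comparing this output from a login shell with the output from inside a batch job should show whether the module or the virtualenv wins.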

andre-merzky commented 11 years ago

Thanks for the feedback, Ashley! I tried that, but alas it only changes the error to:

(python)[merzky@i136 agent]$ cat stderr-bj-b7ae3afa-d20d-11e2-ab76-00231582da34-agent.txt
git version 1.7.8.3 loaded
python: error while loading shared libraries: libpython2.6.so.1.0: cannot open shared object file: No such file or directory

so it seems that the 2.6 installation does not include the devel part, which trips up the paramiko dependency of BigJob (I guess)... Am I missing any settings for the devel libs?

Thanks, Andre.

melrom commented 11 years ago

@andre-merzky - is this still a ticket for BigJob? It seems like a ticket for yourself and your environment...

andre-merzky commented 11 years ago

Let me check if the problem persists...

oleweidner commented 11 years ago

Andre -- what's the status?

andre-merzky commented 10 years ago

I haven't looked at BJ/india for a while -- but will have to do so sometime soon anyway, so will report back then.

andre-merzky commented 10 years ago

This seems to be the same problem as for sagapilot: when I source a virtualenv in my ~/.bashrc, things go terribly wrong with the agents. IMHO, the agent bootstrap script should call deactivate as the very first action...
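As a rough illustration of that idea, the sketch below removes an inherited virtualenv from an environment dictionary before anything else runs. It only approximates what the shell's `deactivate` function does: it drops `VIRTUAL_ENV` and the matching `bin` entry from `PATH`. The name `without_virtualenv` is hypothetical, and none of this is taken from the BigJob or sagapilot bootstrap code.

```python
import os


def without_virtualenv(environ):
    """Return a copy of `environ` with an active virtualenv removed.

    Hypothetical helper; a rough approximation of `deactivate`.
    """
    env = dict(environ)  # never modify the caller's mapping
    venv = env.pop("VIRTUAL_ENV", None)
    if venv:
        venv_bin = os.path.normpath(os.path.join(venv, "bin"))
        parts = env.get("PATH", "").split(os.pathsep)
        # Keep every PATH entry except the virtualenv's bin directory.
        env["PATH"] = os.pathsep.join(
            p for p in parts if os.path.normpath(p) != venv_bin
        )
    return env
```

The returned dictionary could then be handed to the process that creates the agent's own environment, for example through the `env` argument of `subprocess.Popen`.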