radical-cybertools / radical.pilot

RADICAL-Pilot
http://radical-cybertools.github.io/radical-pilot/index.html

Unable to import mkl on Gordon, v0.45.RC2 #1238

Closed mingtaiha closed 7 years ago

mingtaiha commented 7 years ago

On Gordon

intel/2013.1.117(49):ERROR:105: Unable to locate a modulefile for 'mkl'
Python 2.7.10

ibethune commented 7 years ago

@mingtaiha can you say a bit more about what you were trying to do when you got this error? The ticket is not referenced in the testing spreadsheet...

mingtaiha commented 7 years ago

Sure, sorry for the short description. I was running 09_mpi_units.py. I posted this error in its own ticket, as opposed to adding it to Ticket #1239, because the mkl error only occurred on Gordon and not on the other machines (Comet, Stampede).

My radical-stack:

(ve)[mingtha@workflow rp]$ radical-stack 
python            : 2.7.5
virtualenv        : /home/mingtha/ve
radical.utils     : v0.45.RC2@detached-v0.45.RC2
saga-python       : v0.45.RC2@detached-v0.45.RC2
radical.pilot     : v0.45.RC2@detached-v0.45.RC2

andre-merzky commented 7 years ago

I tried in a number of different ways, but I did not manage to reproduce this problem. Could you please check whether you are loading mkl, or any other modules, in your bashrc? Otherwise I am somewhat at a loss - can you please attach a tarball of the pilot sandbox?

andre-merzky commented 7 years ago

One thing I just realized: I am on a slightly different Python version:

# before any module load:
$ module list
Currently Loaded Modulefiles:
  1) intel/2013.1.117   2) mvapich2_ib/1.9    3) gnubase/1.0

$ python -V
Python 2.6.6

$ module load python
$ python -V
Python 2.7.10

How did you get to your version?
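
One way to answer the version question is to have the interpreter report on itself. This is only a minimal sketch: the helper name `describe_interpreter` is made up for illustration, and it assumes the script is run with the same `python` that the session picks up after any `module load`.

```python
import platform
import sys


def describe_interpreter():
    """Return the path, version, and install prefix of the running interpreter."""
    return {
        "executable": sys.executable,           # which binary is actually running
        "version": platform.python_version(),   # e.g. "major.minor.patch"
        "prefix": sys.prefix,                   # differs inside a virtualenv
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print("{:<10} : {}".format(key, value))
```

Running it once before and once after `module load python` should show whether the module changes the interpreter in use.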

ibethune commented 7 years ago

@mingtaiha - can you check whether you can reproduce this, and whether the mkl module is somehow being loaded in your user-specific setup? If it persists, a copy of the pilot sandbox is needed to debug.

mingtaiha commented 7 years ago

I tried it again and got the same error. The error can be found in bootstrap_1.err. Here's the entire sandbox. I am also unable to import mpi4py.

rp.session.workflow.iu.xsede.org.mingtha.017224.0000-pilot.0000.zip
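
A quick way to confirm which packages the interpreter cannot find is sketched below. It is an illustration rather than anything from RADICAL-Pilot: `missing_modules` is a hypothetical helper, it assumes Python 3.4 or later (for `importlib.util.find_spec`), and it only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # mpi4py is the package reported as missing in this thread.
    print(missing_modules(["mpi4py"]))
```

Note that `find_spec` only locates a module without importing it, so a package that is found may still fail at import time, for example if a shared library it depends on is unavailable.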

andre-merzky commented 7 years ago

Can you please send me the following files from gordon?

~/.login
~/.profile
~/.bashrc

Thanks!
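
The three startup files listed above can also be scanned locally for module commands. The following is a rough sketch under a few assumptions: the helper name `find_module_loads` is invented for illustration, and the pattern only catches plain `module load ...` or `module add ...` lines, not modules loaded indirectly through sourced scripts.

```python
import re
from pathlib import Path

# Illustrative pattern: a line that starts with "module load" or "module add".
MODULE_CMD = re.compile(r"^\s*module\s+(load|add)\s+(.+)$")


def find_module_loads(paths):
    """Return (file, line number, module names) for each module load found."""
    hits = []
    for path in paths:
        path = Path(path).expanduser()
        if not path.is_file():
            continue  # a missing startup file is simply skipped
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), 1):
            line = line.split("#", 1)[0]  # drop shell comments
            match = MODULE_CMD.match(line)
            if match:
                hits.append((str(path), lineno, match.group(2).split()))
    return hits


if __name__ == "__main__":
    startup_files = ["~/.login", "~/.profile", "~/.bashrc"]
    for fname, lineno, modules in find_module_loads(startup_files):
        print("{}:{}: module load {}".format(fname, lineno, " ".join(modules)))
```

Any hit that mentions mkl would point to the user-specific setup as the source of the load.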

andre-merzky commented 7 years ago

Hmm, also, from the sandbox it seems that the error of not finding the mkl module is non-fatal -- installation otherwise succeeds, and units are being executed, right? Those fail due to missing mpi4py, but that is a different issue altogether, right? Do the other (non-MPI) examples succeed?

[amerzky@gcn-19-56 ~]$ module list
Currently Loaded Modulefiles:
  1) intel/2013.1.117   2) mvapich2_ib/1.9    3) gnubase/1.0

[amerzky@gcn-19-56 ~]$ module load python
intel/2013.1.117(49):ERROR:105: Unable to locate a modulefile for 'mkl'

[amerzky@gcn-19-56 ~]$ module list
Currently Loaded Modulefiles:
  1) intel/2013.1.117   2) mvapich2_ib/1.9    3) gnubase/1.0        4) python/2.7.10
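
To check programmatically that the python module ended up loaded despite the error message, the `module list` output can be parsed. This is a minimal sketch: the function names are hypothetical, and it assumes the `N) name/version` layout shown in the session above; capturing the output from the `module` command itself is left out.

```python
import re


def parse_module_list(text):
    """Extract module names from `module list` output in the 'N) name' layout."""
    return re.findall(r"\d+\)\s+(\S+)", text)


def is_loaded(text, name):
    """True if a module called `name` (any version) appears in the listing."""
    return any(
        mod == name or mod.startswith(name + "/")
        for mod in parse_module_list(text)
    )


# Listing copied from the session above, after `module load python`.
sample = """Currently Loaded Modulefiles:
  1) intel/2013.1.117   2) mvapich2_ib/1.9    3) gnubase/1.0        4) python/2.7.10
"""

if __name__ == "__main__":
    print(parse_module_list(sample))
    print(is_loaded(sample, "python"), is_loaded(sample, "mkl"))
```

With that listing, python is reported as loaded and mkl is not, which matches the reading that the mkl error is non-fatal.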

andre-merzky commented 7 years ago

I opened an XSEDE ticket about the error message. At this point I don't see this blocking v0.45; tests on Gordon otherwise work OK.

andre-merzky commented 7 years ago

XSEDE confirmed this to be a module dependency issue which will be fixed on their end. Since it's non-fatal anyway, this ticket is being closed.

https://portal.xsede.org/group/xup/tickets/-/tickets/63551