kylebystrom / pawpyseed

Parallel C/Python package for numerical analysis of PAW DFT wavefunctions
BSD 3-Clause "New" or "Revised" License

pip install fails because it can't find mkl.h #17

Open bernstei opened 3 years ago

bernstei commented 3 years ago

I just tried updating to the latest pip version (0.6.4), and it fails with

    gcc -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -DMKL_Complex16=double complex -DMKL_Complex8=float complex -Ipawpyseed/core -I/opt/rh/system-python36/usr/lib64/python3.6/site-packages/numpy/core/include -I~/Library/Python/3.6/include -I~/.local/lib/include -I/usr/lib/include -Ipawpyseed/core/tests -I/usr/include/python3.6m -c pawpyseed/core/pawpyc.c -o build/temp.linux-x86_64-3.6/pawpyseed/core/pawpyc.o -std=c11 -fPIC -Wall -fopenmp -g
    In file included from pawpyseed/core/projector.h:9:0,
                     from pawpyseed/core/pawpyc.c:694:
    pawpyseed/core/linalg.h:7:10: fatal error: mkl.h: No such file or directory
     #include <mkl.h>
              ^~~~~~~
    compilation terminated.
    error: command 'gcc' failed with exit status 1

I made MKL available to Python with pip install --user mkl-devel, as suggested in the README, but that puts mkl.h in ~/.local/include, while the search path listed above contains -I~/.local/lib/include instead.

This is CentOS Linux with system python3 and non-system gcc/gfortran. Any guesses as to where the path comes from, and what might have gone wrong?

bernstei commented 3 years ago

Sorry, it looks like this may never have worked, and I'm not entirely sure how I installed it before. Don't worry about it until I figure it out, and if it's my fault, I'll just close the issue.

kylebystrom commented 3 years ago

Just a guess, is it possible that your MKLROOT is ~/.local/lib instead of ~/.local? It should be the latter, because the compiler looks in $MKLROOT/lib for libraries and $MKLROOT/include for header files.
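
As a rough illustration of that layout check, here is a minimal Python sketch. The helper name `check_mklroot` is hypothetical (it is not part of pawpyseed), and it only assumes the `$MKLROOT/include` and `$MKLROOT/lib` convention described above.

```python
import os


def check_mklroot(mklroot):
    """Return (has_header, mkl_library_names) for a candidate MKLROOT.

    Hypothetical helper: it only checks for include/mkl.h and lists
    lib/libmkl* files under the given root.
    """
    root = os.path.expanduser(mklroot)
    header = os.path.join(root, "include", "mkl.h")
    lib_dir = os.path.join(root, "lib")
    libs = []
    if os.path.isdir(lib_dir):
        libs = sorted(n for n in os.listdir(lib_dir) if n.startswith("libmkl"))
    return os.path.isfile(header), libs


# Compare the two candidate roots discussed in this thread.
for candidate in ("~/.local", "~/.local/lib"):
    print(candidate, check_mklroot(candidate))
```

If the header is reported only for ~/.local, then a root of ~/.local/lib would explain the -I~/.local/lib/include path in the failing gcc command.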

bernstei commented 3 years ago

OK - my original issue was caused by having a bad .pawpyseed.site.rc file. However, it still fails (both pip and manual setup.py) because it can't find -lmkl_def. It appears to be unavailable in the latest pip version of mkl-devel, and unnecessary - when I remove it from setup.py, it seems to compile and install fine.

I edited the title correspondingly.

bernstei commented 3 years ago

No, I spoke too soon - I am getting missing MKL symbols at runtime.

    INTEL MKL ERROR: /home/cluster2_new/bernstei/.local/lib/libmkl_avx512.so.1: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
    Intel MKL FATAL ERROR: Cannot load libmkl_avx512.so.1 or libmkl_def.so.1.

kylebystrom commented 3 years ago

That's odd. I ran the unit tests before pushing to PYPI but did not upgrade mkl-devel, so I think you might be right about it being an issue with the mkl version.

bernstei commented 3 years ago

FWIW, libmkl_def.so.1 exists in ~/.local/lib, but not just plain .so.

kylebystrom commented 3 years ago

Just checked, and libmkl_def.so is in my MKL installation, which explains the discrepancy. If libmkl_def.so.1 is in ~/.local/lib, I wonder why it says it cannot load libmkl_def.so.1, as opposed to looking for and failing to find libmkl_def.so.
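
For context, a linker flag such as -lmkl_def makes ld search for the unversioned name libmkl_def.so (or a static libmkl_def.a), so a directory holding only libmkl_def.so.1 will not satisfy it at build time. A minimal Python sketch for spotting such cases follows; the function name `missing_unversioned` is made up for illustration, and the file list is an example rather than an actual directory listing from this thread.

```python
import re


def missing_unversioned(filenames):
    """Return unversioned .so names that are absent although a versioned file exists."""
    present = set(filenames)
    missing = set()
    for name in filenames:
        # Match e.g. "libmkl_def.so.1" and capture "libmkl_def.so".
        m = re.fullmatch(r"(lib.+\.so)\.\d+(?:\.\d+)*", name)
        if m and m.group(1) not in present:
            missing.add(m.group(1))
    return sorted(missing)


# Example listing (illustrative only).
print(missing_unversioned(["libmkl_def.so.1", "libmkl_core.so.1", "libmkl_core.so"]))
# prints ['libmkl_def.so']
```

To apply it to a real installation, pass in `os.listdir()` of the library directory.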

kylebystrom commented 3 years ago

It might be worth trying setting sdl=True in site.cfg, which I believe is more portable.

bernstei commented 3 years ago

Hmm - that does compile, but I get the error

    INTEL MKL ERROR: /home/cluster2_new/bernstei/.local/lib/libmkl_avx512.so.1: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
    Intel MKL FATAL ERROR: Cannot load libmkl_avx512.so.1 or libmkl_def.so.1.

bernstei commented 3 years ago

That routine (mkl_sparse_optimize_bsr_trsm_i8) seems to be defined in mkl_sequential and the various mkl*thread libraries, but I guess somehow those aren't getting linked in?
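
One way to check where a symbol is defined is to scan `nm` output for it. The Python sketch below only parses text that has already been captured; the helper name `defined_in` is hypothetical, the first sample line has the same form as the `nm` output for libmkl_sequential.so.1 quoted in this thread, and the second is an invented line showing what an undefined reference looks like.

```python
def defined_in(nm_output, symbol):
    """Return True if the nm output lists `symbol` as defined (not 'U')."""
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols have three fields: address, type letter, name.
        # Undefined references have only two: 'U' and the name.
        if len(parts) == 3 and parts[2] == symbol and parts[1].upper() != "U":
            return True
    return False


symbol = "mkl_sparse_optimize_bsr_trsm_i8"
defined_line = "00000000006ab110 T " + symbol    # form seen for libmkl_sequential.so.1
undefined_line = "                 U " + symbol  # invented example of an undefined reference

print(defined_in(defined_line, symbol))    # prints True
print(defined_in(undefined_line, symbol))  # prints False
```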

kylebystrom commented 3 years ago

Yeah, that seems to be the case. I'm working on updating conda to see if I can reproduce the issue. I also saw this Stack Overflow post about a similar issue, with a simple solution (seemingly specific to Anaconda though): https://stackoverflow.com/questions/36659453/intel-mkl-fatal-error-cannot-load-libmkl-avx2-so-or-libmkl-def-so

By the way, are you using standard Python or Anaconda? Just for the sake of trying to reproduce your issue.

bernstei commented 3 years ago

Standard python (3.6). Thanks for checking it out.

bernstei commented 3 years ago

Oddly, without sdl = True (and removing -lmkl_def), it gives the same error, even though that symbol seems to be defined in one of the linked libraries:

    > ldd ~/.local/lib/python3.6/site-packages/pawpyseed-0.6.4-py3.6-linux-x86_64.egg/pawpyseed/core/pawpyc.cpython-36m-x86_64-linux-gnu.so
            linux-vdso.so.1 =>  (0x00007fff87bee000)
            libpython3.6m.so.1.0 => /lib64/libpython3.6m.so.1.0 (0x00002ac14111f000)
            libmkl_intel_lp64.so.1 => /home/cluster2_new/bernstei/.local/lib/libmkl_intel_lp64.so.1 (0x00002ac141647000)
            libmkl_sequential.so.1 => /home/cluster2_new/bernstei/.local/lib/libmkl_sequential.so.1 (0x00002ac142382000)
            libmkl_core.so.1 => /home/cluster2_new/bernstei/.local/lib/libmkl_core.so.1 (0x00002ac143f8d000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ac14bf53000)
            libm.so.6 => /lib64/libm.so.6 (0x00002ac14c16f000)
            libdl.so.2 => /lib64/libdl.so.2 (0x00002ac14c471000)
            libgomp.so.1 => /lib64/libgomp.so.1 (0x00002ac14c675000)
            libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002ac14c89b000)
            libc.so.6 => /lib64/libc.so.6 (0x00002ac14cab1000)
            libutil.so.1 => /lib64/libutil.so.1 (0x00002ac14ce7f000)
            /lib64/ld-linux-x86-64.so.2 (0x00002ac140c24000)

    > nm /home/cluster2_new/bernstei/.local/lib/libmkl_sequential.so.1 | grep mkl_sparse_optimize_bsr_trsm_i8
    00000000006ab110 T mkl_sparse_optimize_bsr_trsm_i8

bernstei commented 3 years ago

By the way, I made a mistake: sdl=True does run. I forgot to install after building. Only sdl=False (and commenting out -lmkl_def) fails for me now.

Anyway, I have a viable workaround. Keep working on this only if you want to support the other way.

kylebystrom commented 3 years ago

Oh nice, glad that works at least. I guess that's why numpy always just uses sdl linking. Thanks for raising the issue and running these tests.

I will keep working on it, though I might end up just deciding to change the default to sdl in the next release.

sheriftawfikabbas commented 2 years ago

Here is how I fixed the error, in case the above fix doesn't work for someone else:

ld does not pick up library files whose names end in "*.so.2"; it looks for the plain ".so" name, so:

    cp $HOME/.local/lib/libmkl_rt.so.2 $HOME/.local/lib/libmkl_rt.so

And then run setup.py.
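
A symbolic link would probably serve the same purpose as the copy, though that is an assumption and was not tested in this thread. A minimal Python sketch, with a hypothetical helper name `link_unversioned`:

```python
import os


def link_unversioned(lib_dir, versioned_name):
    """Create libX.so -> libX.so.N inside lib_dir and return the link path.

    Hypothetical helper; an existing libX.so is left untouched.
    """
    base, sep, _version = versioned_name.rpartition(".so.")
    if not sep:
        raise ValueError("not a versioned shared library name: " + versioned_name)
    link_path = os.path.join(lib_dir, base + ".so")
    if not os.path.lexists(link_path):
        # Relative target, so the link stays valid if the directory is moved.
        os.symlink(versioned_name, link_path)
    return link_path
```

For the case above, the call would be `link_unversioned(os.path.expanduser("~/.local/lib"), "libmkl_rt.so.2")`, followed by running setup.py as before.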