mala-project / mala

Materials Learning Algorithms. A framework for machine learning materials properties from first-principles data.
https://mala-project.github.io/mala/
BSD 3-Clause "New" or "Revised" License

Problem building QE (total energy module) #559

Open elect86 opened 1 month ago

elect86 commented 1 month ago

So, I've been trying to tackle this for a while now and I got so close, but I was stopped by one damn last issue.

I'm on Ubuntu

I've been trying to follow the docs here:

I had these issues, which I managed to fix by getting OpenMPI via these packages:

```
sudo apt-get install openmpi-bin openmpi-doc libopenmpi-dev
```

In addition, I needed to install gfortran, otherwise ./configure doesn't find any compiler.
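Because ./configure only reports "no compiler found" after the fact, a quick pre-flight check of what is on PATH can save a round trip. A minimal Python sketch, assuming the executable names `gfortran` and `mpif90` (the OpenMPI Fortran wrapper shipped with libopenmpi-dev); these names are assumptions about a typical Ubuntu setup, not something QE's configure prescribes.

```python
import shutil

# Executable names are assumptions for a typical Ubuntu + OpenMPI setup.
REQUIRED_TOOLS = ("gfortran", "mpif90")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that shutil.which cannot find on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All compilers found.")
```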

One difference: installing gfortran pulls in the latest version, gfortran-13, while @RandomDefaultUser has a working setup with 11. I tried to install that, but ./configure doesn't detect it either.

Also, the gfortran package actually installs gfortran-13, gfortran-13-x86-64-linux-gnu and libgfortran-13-dev.

So, to match Lenz's setup, I tried the same for 11, but:

```
127 elect@5800x:~$ sudo apt install gfortran-11 gfortran-11-x86-64-linux-gnu libgfortran-11-dev
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have requested an impossible situation or if you are using the unstable distribution that some required packages have not yet been created or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 gfortran-11-x86-64-linux-gnu:i386 : Depends: gcc-11-x86-64-linux-gnu:i386 (= 11.4.0-9ubuntu1cross1) but it is not going to be installed
                                     Depends: libgfortran-11-dev-amd64-cross:i386 (>= 11) but it is not installable
E: Unable to correct problems, you have held broken packages.
```

Interestingly, I get the very same error with 12.

So I gave up on that front and went ahead with 13, which works fine in that respect.

Then, when compiling, I ran into these problems.

I managed to fix them by adding these flags to make.inc:

```
#  Generic flag name:
FFLAGS = -fallow-argument-mismatch
# Fortran flag options name in wannier90
FCOPTS = -fallow-argument-mismatch
```

which in my make.inc looks like this:

```
# compiler flags with and without optimization for fortran-77
# the latter is NEEDED to properly compile dlamch.f, used by lapack
FFLAGS         = -fPIC -fallow-argument-mismatch
FFLAGS_NOOPT   = -O0 -g
FCOPTS = -fallow-argument-mismatch
```
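For context: starting with GCC 10, gfortran turns argument mismatches between calls to the same routine into hard errors, and -fallow-argument-mismatch demotes them back to warnings. A rough Python sketch that reads the major version from the first line of `gfortran --version` and decides whether the flag is needed; the helper names are hypothetical, and the version-line format is assumed to be the usual GNU one.

```python
import re
import subprocess


def gfortran_major_version(version_output: str) -> int:
    """Extract the major version from the first line of `gfortran --version`.

    Assumes the usual GNU format, e.g. 'GNU Fortran (Ubuntu 13.2.0-4ubuntu3) 13.2.0'.
    """
    first_line = version_output.splitlines()[0]
    match = re.search(r"(\d+)\.\d+\.\d+\s*$", first_line)
    if match is None:
        raise ValueError(f"Unrecognised version line: {first_line!r}")
    return int(match.group(1))


def needs_allow_argument_mismatch(major: int) -> bool:
    """GCC >= 10 rejects argument mismatches unless -fallow-argument-mismatch is set."""
    return major >= 10


def suggested_fflags(compiler: str = "gfortran") -> str:
    """Query the installed compiler and return the extra flag (hypothetical helper)."""
    out = subprocess.run(
        [compiler, "--version"], capture_output=True, text=True, check=True
    ).stdout
    major = gfortran_major_version(out)
    return "-fallow-argument-mismatch" if needs_allow_argument_mismatch(major) else ""
```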

Then I hit this error, which I fixed by installing meson.

Then this one: I noticed that running the very same command in the terminal, after copy/pasting force_mod.mod into the parent directory, just reported the next missing .mod file. I solved it by adding -I$root_dir/PW/src to project_inc_folders in build_total_energy_module.sh.
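The workaround above boils down to "put every directory that holds a needed .mod file on the include path". Instead of finding them one error at a time, the QE build tree can be searched for compiled module files. A sketch, assuming `root_dir` is the QE source root as used in build_total_energy_module.sh; both function names are made up for illustration.

```python
from pathlib import Path


def mod_include_flags(root_dir: str) -> list[str]:
    """Return sorted, de-duplicated -I flags for every directory under
    root_dir that contains at least one compiled Fortran .mod file."""
    dirs = {mod.parent.resolve() for mod in Path(root_dir).rglob("*.mod")}
    return [f"-I{d}" for d in sorted(dirs)]


def find_module(root_dir: str, name: str) -> list[Path]:
    """Locate a specific module file, e.g. find_module(root, "force_mod")."""
    return sorted(Path(root_dir).rglob(f"{name}.mod"))
```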

Then I had to add the directory to my Python path, which I did as described here.
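For reference, the directory can be made importable either by exporting PYTHONPATH in the shell or from inside Python before the import. A minimal sketch of the latter; `TEM_DIR` is an assumed location modeled on the path in the traceback below, and `import_from` is a hypothetical helper.

```python
import importlib
import sys
from pathlib import Path

# Assumed location, modeled on the path in the traceback; adjust to your checkout.
TEM_DIR = Path.home() / "PycharmProjects" / "mala" / "external_modules" / "total_energy_module"


def import_from(directory, module_name: str):
    """Prepend `directory` to sys.path (once) and import `module_name` from it."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        sys.path.insert(0, directory)
    # Make sure finders notice files created after the interpreter started.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Usage (raises ImportError if the .so is missing or has undefined symbols):
# total_energy = import_from(TEM_DIR, "total_energy")
```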

I hoped this would be the end of it, but when I ran ex05_run_predictions.py I got:

```
ImportError: /home/elect/PycharmProjects/mala/external_modules/total_energy_module/total_energy.cpython-312-x86_64-linux-gnu.so: undefined symbol: __ener_MOD_vtxc
```

And this is where I'm stuck for now.
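For whoever digs into this: gfortran names module variables and procedures in object files as `__<module>_MOD_<name>`, so `__ener_MOD_vtxc` is the variable `vtxc` from the Fortran module `ener`. An undefined symbol of that form at import time typically means the object or library defining that module was not linked into the extension, even if its .mod file was found at compile time. A small sketch for decoding such symbols from the error message; both helpers are hypothetical.

```python
import re

# gfortran's naming scheme for module entities: __<module>_MOD_<entity>
_GFORTRAN_MOD_SYMBOL = re.compile(r"__(\w+?)_MOD_(\w+)")


def decode_gfortran_symbol(symbol: str):
    """Return (module, entity) for a gfortran module symbol, or None."""
    match = _GFORTRAN_MOD_SYMBOL.fullmatch(symbol)
    return (match.group(1), match.group(2)) if match else None


def undefined_symbol_from_error(message: str):
    """Pull the symbol name out of an 'undefined symbol: ...' ImportError message."""
    match = re.search(r"undefined symbol: (\S+)", message)
    return match.group(1) if match else None
```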

RandomDefaultUser commented 1 month ago

TL;DR

I think f2py has changed from the version other users were using. Compiling with your compiler is fine, and so is compiling with those flags; the problems only seem to be with f2py, which suddenly requires paths and packages it did not use to. I think we need to dig deeper into this.

Full answer

Sorry to hear of that experience @elect86. The total energy module can be a bit tricky to build, most notably because we have to manually update it when something in QE changes.

Here's my guess on what is/was happening.

  1. Your system wants you to use gfortran 13 because that's the most recent version included in the gcc package. Downgrading may not easily be possible, since other libraries and stuff you have installed may link against it.
  2. This means you have to build with gfortran13. That is fine in and of itself, EXCEPT for the fact that part of the QE 7.2 code may not be natively compatible. For a regular QE user this is not a problem - they would simply use the latest version of QE to build, which fixes this. Since we haven't updated the TEM in a while, you run into this issue.
  3. -fallow-argument-mismatch usually fixes these problems, and it does in your case. I had a similar problem about a year and a half ago, which is when I finally updated our TEM, because we were out of touch and getting these compilation errors all too regularly.
  4. So the flag fixes it in your case and you can compile QE. Great!
  5. Now you wanted to build the TEM itself, which fails because it needs meson.
  6. This is where stuff gets weird - meson was NOT a requirement for f2py for me. In fact, it is not even on my machine. My installation (OS, python, compilers) is about one and a half years old, and not thoroughly updated (yes, shame on me, something something dissertation something something running system).

My best guess is that your f2py version is significantly more recent than mine and, as a result, something in the installation script isn't working anymore. Case in point: you said you had to add a path manually, which should NOT be necessary. We provide all the paths. But if something in the f2py workflow changed, this may be the reason.

To confirm - which version does f2py -v give you? It is 1.23.5 for me. Yours is most likely way more recent, as is your python version.
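Since f2py ships as part of NumPy, `f2py -v` reports the NumPy version. The same comparison can be made from Python; a sketch where 1.23.5 is the reference version quoted above and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

REFERENCE_F2PY = "1.23.5"  # known-good version quoted in this thread


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.23.5' into (1, 23, 5); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def installed_numpy_version():
    """Return the installed NumPy (and thus f2py) version string, or None."""
    try:
        return version("numpy")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_numpy_version()
    print("numpy/f2py:", current, "reference:", REFERENCE_F2PY)
    if current and parse_version(current) > parse_version(REFERENCE_F2PY):
        print("Newer than the known-good toolchain.")
```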

Now for a solution: first, I would advise checking whether the f2py version is indeed different. If so, I would bet this is the reason. The only meaningful solution, in my opinion, would then be to investigate what has changed and update the TEM and install script. It will take me at least until next week to find time for this. If you need it urgently, let's coordinate via message so I can point you to some things you can do in this regard.

elcorto commented 1 month ago

Thanks for the elaborate investigation. Regarding f2py and meson, I ran into the same issue with one of my packages. Starting with Python 3.12, numpy enforces meson over distutils.
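The underlying reason is that distutils was removed from the standard library in Python 3.12 (PEP 632), and numpy.distutils is not available there either, so f2py has to build extensions through meson. A build script that wants to support both sides could branch on the interpreter version; a sketch of that check, where the function name and returned labels are illustrative only.

```python
import sys


def f2py_build_backend(version_info=sys.version_info) -> str:
    """Pick the build backend for f2py based on the Python version.

    distutils (and numpy.distutils) are unavailable on Python >= 3.12,
    so meson is the only option there.
    """
    return "meson" if tuple(version_info[:2]) >= (3, 12) else "distutils"


if __name__ == "__main__":
    print("Expected f2py build backend:", f2py_build_backend())
```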

RandomDefaultUser commented 1 month ago

Good to know that it is not a MALA-specific thing but indeed just an update in the entire toolchain. I guess we will fix this issue by upgrading MALA and the scripts it comes with to a more recent Python version, and fixing/documenting that version, which ties in with #526. Will try to take care of this soon!

elcorto commented 1 month ago

Agreed. As it stands, we support up to Python 3.11.

elcorto commented 1 month ago


I fixed this in the package mentioned above, which may serve as a reference; see these commits: build and CI. This works for Python >= 3.9.