marvel-nccr / ansible-role-bigdft


BigDFT fails to build on Quantum Mobile #8

Closed borellim closed 3 years ago

borellim commented 4 years ago

I'm using v1.2.0 of this role, but the same happens on v1.0.0. Unfortunately I'm not very familiar with the code, so it's hard for me to debug. I see that CI is passing both here and on the Quantum Mobile Cloud Edition, so it must be something different in the environment.

Here is the ansible output for the task: bigdft-build-fail-v1.2.0.log. Here are what I believe to be the most relevant lines:

    W: eigen3 has a dependency on unknown "None" module
    W: openbabel has a dependency on unknown "None" module

and:

    Traceback (most recent call last):
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild.py", line 39, in <module>
        jhbuild.main.main(sys.argv[1:])
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/main.py", line 120, in main
        rc = jhbuild.commands.run(command, config, args, help=lambda: print_help(parser))
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/commands/__init__.py", line 188, in run
        return cmd.execute(config, args, help)
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/commands/__init__.py", line 56, in execute
        return self.run(config, options, args, help)
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/commands/base.py", line 262, in run
        return build.build()
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/frontends/buildscript.py", line 172, in build
        error, altphases = module.run_phase(self, phase)
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/modtypes/__init__.py", line 432, in run_phase
        if domethod: method(buildscript)
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/modtypes/__init__.py", line 624, in do_checkout
        self.checkout(buildscript)
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/modtypes/__init__.py", line 635, in checkout
        if self.check_build_policy(buildscript) == self.PHASE_DONE:
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/modtypes/__init__.py", line 465, in check_build_policy
        install_date_dep = buildscript.moduleset.packagedb.installdate(dep)
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/utils/packagedb.py", line 190, in installdate
        entry = self.get(package)
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/utils/packagedb.py", line 162, in get
        return PackageEntry.open(self.dirname, package)
      File "/tmp/bigdft-suite-1.9.0/bundler/jhbuild/utils/packagedb.py", line 120, in open
        with open(os.path.join (dirname, 'info', package), "rb") as info:
      File "/usr/lib/python2.7/posixpath.py", line 68, in join
        if b.startswith('/'):

borellim commented 4 years ago

I now encounter the same error on the Cloud edition. Perhaps a dependency has been recently updated?
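
The traceback above ends inside `posixpath.join` at `b.startswith('/')`, and the warnings mention a dependency on an unknown "None" module. One possible reading, which the truncated log does not confirm, is that a dependency name of `None` reached `os.path.join`. The sketch below only illustrates that failure mode; `info_path` is a hypothetical helper, not part of jhbuild.

```python
import os


def info_path(dirname, package):
    """Build the package info path, rejecting a missing name early.

    Hypothetical helper: it mirrors the os.path.join call seen in the
    traceback but is not jhbuild code.
    """
    if package is None:
        raise ValueError("dependency name is None; check the moduleset definition")
    return os.path.join(dirname, "info", package)


# Passing None straight to os.path.join fails inside the join itself,
# which is where the traceback stops.
try:
    os.path.join("/tmp/packagedb", "info", None)
except (TypeError, AttributeError) as exc:
    print(type(exc).__name__, exc)

# With the guard, the error names the actual problem instead.
try:
    info_path("/tmp/packagedb", None)
except ValueError as exc:
    print(exc)
```

An early check like this would not fix the underlying dependency definition, but it would make the log point at it directly.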

adegomme commented 4 years ago

I was able to reproduce yesterday. I think this is already fixed on bigdft's trunk. So I will try with the gitlab version and switch to this one if it's better, until a new release is done.

ltalirz commented 4 years ago

Ok, I'm restarting travis as well... would be good if this surfaces here as well https://travis-ci.org/github/marvel-nccr/ansible-role-bigdft

Edit: Hm, on travis, the tests continue to pass... this would seem to imply they are overlooking something (perhaps some dependencies need to be specified more strictly?)

adegomme commented 4 years ago

I switched to git, and tried to force python3 for v1.3.0 of this role. I had to change the test, as the main one used a python2 script to compare outputs. I hope this builds correctly now.

borellim commented 4 years ago

Thanks a lot for your effort! I'm sorry to report that it now hangs at the build stage (not an error, it just does not do anything). The only related process I found was this one, in sleep state (S+):

    /usr/bin/python3 /tmp/bigdft-suite/bundler/jhbuild.py -f buildrc --conditions=+babel build spred

It runs the C++ and Fortran compilers for a while, then stops. Sorry, I don't have the logs. Note that this was NOT done on a clean machine; unfortunately I don't have the time to try it on a clean machine at the moment.

adegomme commented 4 years ago

Bummer. The problem is indeed likely due to the unclean machine: the old build is still present and it finds installed files. One idea is to log on to it, go to /tmp/bigdft-suite-1.9.0/build, run "../Installer.py clean -f ../rcfiles/ubuntu_MPI.rc -y", and then remove all of /tmp/bigdft-suite-1.9.0/.

The blocking is a mistake: Installer.py must detect some issue and go into a prompt to let the user determine the next step. This should have been disabled by setting DEBIAN_FRONTEND=noninteractive in the env, which I forgot. I just pushed a fix. I will tag v1.3.1 with this.
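
The stall described here (a build step dropping into a prompt with nobody to answer it) can be guarded against from the calling side as well. The following is a minimal sketch, not the role's actual task: `run_noninteractive` is a hypothetical wrapper, and whether Installer.py honours DEBIAN_FRONTEND is taken from the comment above rather than verified. It closes stdin so a prompt receives EOF, and adds a timeout as a backstop.

```python
import os
import subprocess
import sys


def run_noninteractive(cmd, timeout=3600):
    """Run cmd so that an unexpected prompt fails instead of hanging.

    Hypothetical wrapper: stdin is connected to the null device, so any
    read from it returns EOF immediately. subprocess.run raises
    subprocess.TimeoutExpired if the command exceeds the timeout.
    """
    env = dict(os.environ, DEBIAN_FRONTEND="noninteractive")
    return subprocess.run(
        cmd,
        env=env,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=timeout,
        check=False,
    )


# Stand-in for a build step that unexpectedly asks a question.
result = run_noninteractive([sys.executable, "-c", "input('Next step? ')"], timeout=60)
print("exit code:", result.returncode)
print(result.stdout.decode(errors="replace"))
```

A non-zero exit code then surfaces as a failed task rather than a build that sits in sleep state.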

edit: 1.3.1 not 1.4.1

chrisjsewell commented 4 years ago

Are we happy this is now fixed then?

adegomme commented 4 years ago

Any news?

ltalirz commented 4 years ago

@borellim is currently on vacation, I believe. I'm sure he'll get back to you once he's back.

borellim commented 4 years ago

Thanks, and sorry for the wait! It seems to work indeed in the Quantum Mobile Desktop edition. I'll see if we can test it soon in the Cloud edition as well.

chrisjsewell commented 3 years ago

Right, now that I've completely rewritten the quantum-mobile repo, we can get back to this lol: the full desktop build is currently running at: https://github.com/marvel-nccr/quantum-mobile/runs/1260023102?check_suite_focus=true

so we shall see what the outcome is...

chrisjsewell commented 3 years ago

Copying from https://github.com/marvel-nccr/quantum-mobile/pull/141#issuecomment-709811928:

Yep there is something broken here: https://github.com/marvel-nccr/quantum-mobile/runs/1260562158

This stalls at TASK [marvel-nccr.bigdft : Make bigdft executables], and I also tried the build locally and got the same result 🤷

adegomme commented 3 years ago

Difficult to tell without any output or logs, but it must fail at some point, which starts a prompt asking the user for the next step, hence the stalling. I'll have a look today

chrisjsewell commented 3 years ago

Thanks

adegomme commented 3 years ago

Tough luck on the timing: it looks like eigen3 moved from bitbucket to gitlab (this summer) but only removed its old tarballs overnight. So the build was failing to fetch them, since eigen3 is built as a dependency. I just checked that with the correct path the build finishes, and I'll add a temporary workaround if the fix ( https://gitlab.com/l_sim/bigdft-suite/-/merge_requests/97 ) is not merged into bigdft today. I'm a bit surprised that it stalls when there is an error, as we explicitly ask for non-interactive mode; I'll see if I can fix that as well.

adegomme commented 3 years ago

OK, the merge was done, so it should build now.

chrisjsewell commented 3 years ago

Great cheers, I've re-started the build 🤞 https://github.com/marvel-nccr/quantum-mobile/runs/1263879820

ltalirz commented 3 years ago

@adegomme Perhaps I'm missing some previous conversations here, but since I don't see a recent commit to the role I'd just like to understand: are merges to the bigdft master branch automatically picked up by the bigdft ansible role?

The role defaults would seem to suggest that a specific version of bigdft is being installed, which is an important good practice (otherwise builds of the role become irreproducible): https://github.com/marvel-nccr/ansible-role-bigdft/blob/master/defaults/main.yml

If you want to default to a particular git commit that isn't released yet that's ok as well, but please make sure to fix it in the role.

adegomme commented 3 years ago

Yes, the idea is to stick to a release, but we had to use the devel branch for now (actually to cope with this issue, with commit 8bf6040ae). We are still waiting for some huge merges to land before we can release; Luigi is on it. In the meantime, I will indeed put the actual commit in the role, to avoid reproducibility issues.
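
To illustrate the distinction between tracking a branch and fixing a commit, here is a small sketch of a syntactic check. `is_commit_hash` is a hypothetical helper, not part of the role, and the hash quoted above is used only as an example string. The check is purely about the form of the ref: a release tag is also a fixed ref but would not be recognised by it.

```python
import re

# A full or abbreviated git commit hash: 7 to 40 lowercase hex digits.
_COMMIT_RE = re.compile(r"[0-9a-f]{7,40}")


def is_commit_hash(ref):
    """Return True if ref looks like a (possibly abbreviated) commit hash."""
    return _COMMIT_RE.fullmatch(ref) is not None


for ref in ("devel", "8bf6040ae"):
    kind = "fixed commit" if is_commit_hash(ref) else "not a commit hash"
    print(f"{ref}: {kind}")
```

A check of this kind could run in CI to flag a default that points at a moving branch.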

adegomme commented 3 years ago

> Great cheers, I've re-started the build 🤞 https://github.com/marvel-nccr/quantum-mobile/runs/1263879820

the build still got stuck, but worked locally for me ... I'm trying again to see what could be wrong here ...

the build step takes ~15 minutes on my laptop..

chrisjsewell commented 3 years ago

Yeh I've started the build again just in case

What operating system is your laptop?

adegomme commented 3 years ago

Ubuntu focal. To reproduce, I run "ansible-galaxy install -r requirements --force; vagrant box update; vagrant up" with the previous image removed. I just published a new v1.3.2 version of this repo, to see if the stalling issue gets better (it should crash immediately when failing, instead of going to a prompt), and to avoid tracking the devel branch.

chrisjsewell commented 3 years ago

ok cheers, I'll try adding that (maybe tomorrow now). Note the build is run on OSX on GH actions (the software is not available on Linux), but in principle this should not matter.

chrisjsewell commented 3 years ago

Yep all fixed now thanks: https://github.com/marvel-nccr/quantum-mobile/runs/1285257391