pyiron / pyiron_atomistics

pyiron_atomistics - an integrated development environment (IDE) for atomistic simulation in computational materials science.
https://pyiron-atomistics.readthedocs.io
BSD 3-Clause "New" or "Revised" License

libmpi not found #681

Closed samwaseda closed 2 years ago

samwaseda commented 2 years ago

Sorry, it's an MPIE problem, but LAMMPS doesn't seem to be able to find the shared library for the interactive job:

libmpi.so.40: cannot open shared object file: No such file or directory
niklassiemer commented 2 years ago

That one should be included in conda, right? I made an update today (using the usual update script of @tnecnivkcots). Hopefully, there is no issue with permissions...

tnecnivkcots commented 2 years ago

I ran the conda update again just now. Does this issue persist?

samwaseda commented 2 years ago

I ran the conda update again just now. Does this issue persist?

Yes, but in the collaborative binder setup there's actually no problem. Does that mean there's a problem with my cluster setup, or is it because it isn't built anew on binder every time? Sorry, I'm probably asking a novice question...

tnecnivkcots commented 2 years ago

Which binder setup are you talking about? Until now I assumed you were talking about the JupyterHub on cmti. Comparing that with the collaborative binder, which is started from a repository, is not so easy, because the binder runs in a separate environment on the Kubernetes cluster at MPCDF.

Also, there is no need to be sorry. We just have different backgrounds, and that is fine.

But on which machine does this issue occur? Just so we have a shared context.

samwaseda commented 2 years ago

Oh, sorry, only now did I realize that I didn't even say where the problem occurred XD. Here is the entire error:

OSError                                   Traceback (most recent call last)
/tmp/ipykernel_4559/3425482066.py in <cell line: 24>()
     22 wf = ElasticConstants(pr, 'elastic_tensor')
     23 
---> 24 plt.imshow(wf.elastic_constants)

/tmp/ipykernel_4559/3150592139.py in __getattr__(self, attr)
     16             return getattr(self, '_' + attr)
     17         except AttributeError:
---> 18             args = [getattr(self, a) for a in inspect.getfullargspec(self.__getattribute__('get_' + attr)).args[1:]]
     19             setattr(self, '_' + attr, self.__getattribute__('get_' + attr)(*args))
     20             return getattr(self, '_' + attr)

/tmp/ipykernel_4559/3150592139.py in <listcomp>(.0)
     16             return getattr(self, '_' + attr)
     17         except AttributeError:
---> 18             args = [getattr(self, a) for a in inspect.getfullargspec(self.__getattribute__('get_' + attr)).args[1:]]
     19             setattr(self, '_' + attr, self.__getattribute__('get_' + attr)(*args))
     20             return getattr(self, '_' + attr)

/tmp/ipykernel_4559/3150592139.py in __getattr__(self, attr)
     17         except AttributeError:
     18             args = [getattr(self, a) for a in inspect.getfullargspec(self.__getattribute__('get_' + attr)).args[1:]]
---> 19             setattr(self, '_' + attr, self.__getattribute__('get_' + attr)(*args))
     20             return getattr(self, '_' + attr)

/tmp/ipykernel_4559/3425482066.py in get_lattice_constant(self, element)
      7         job.interactive_open()
      8         murn = job.create_job('Murnaghan', job.job_name.replace('lmp_', ''))
----> 9         murn.run()
     10         return murn['output/equilibrium_volume']**(1 / 3)
     11 

~/dev_sam/pyiron_base/pyiron_base/generic/util.py in decorated(*args, **kwargs)
    211                         stacklevel=2,
    212                     )
--> 213             return function(*args, **kwargs)
    214 
    215         return decorated

~/dev_sam/pyiron_base/pyiron_base/job/generic.py in run(self, delete_existing_job, repair, debug, run_mode, run_again)
    726                 self._run_if_repair()
    727             elif status == "initialized":
--> 728                 self._run_if_new(debug=debug)
    729             elif status == "created":
    730                 self._run_if_created()

~/dev_sam/pyiron_base/pyiron_base/job/generic.py in _run_if_new(self, debug)
   1253             debug (bool): Debug Mode
   1254         """
-> 1255         run_job_with_status_initialized(job=self, debug=debug)
   1256 
   1257     def _run_if_created(self):

~/dev_sam/pyiron_base/pyiron_base/job/runfunction.py in run_job_with_status_initialized(job, debug)
     74     else:
     75         job.save()
---> 76         job.run()
     77 
     78 

~/dev_sam/pyiron_base/pyiron_base/generic/util.py in decorated(*args, **kwargs)
    211                         stacklevel=2,
    212                     )
--> 213             return function(*args, **kwargs)
    214 
    215         return decorated

~/dev_sam/pyiron_base/pyiron_base/job/generic.py in run(self, delete_existing_job, repair, debug, run_mode, run_again)
    728                 self._run_if_new(debug=debug)
    729             elif status == "created":
--> 730                 self._run_if_created()
    731             elif status == "submitted":
    732                 run_job_with_status_submitted(job=self)

~/dev_sam/pyiron_base/pyiron_base/job/generic.py in _run_if_created(self)
   1264             int: Queue ID - if the job was send to the queue
   1265         """
-> 1266         return run_job_with_status_created(job=self)
   1267 
   1268     def _run_if_repair(self):

~/dev_sam/pyiron_base/pyiron_base/job/runfunction.py in run_job_with_status_created(job)
    107         job.run_if_scheduler()
    108     elif job.server.run_mode.interactive:
--> 109         job.run_if_interactive()
    110     elif job.server.run_mode.interactive_non_modal:
    111         job.run_if_interactive_non_modal()

~/dev_sam/pyiron_base/pyiron_base/master/parallel.py in run_if_interactive(self)
    701             for parameter in self._job_generator.parameter_list:
    702                 self._job_generator.modify_job(job=self.ref_job, parameter=parameter)
--> 703                 self.ref_job.run()
    704             self.ref_job.interactive_close()
    705         else:

~/dev_sam/pyiron_base/pyiron_base/generic/util.py in decorated(*args, **kwargs)
    211                         stacklevel=2,
    212                     )
--> 213             return function(*args, **kwargs)
    214 
    215         return decorated

~/dev_sam/pyiron_base/pyiron_base/job/generic.py in run(self, delete_existing_job, repair, debug, run_mode, run_again)
    726                 self._run_if_repair()
    727             elif status == "initialized":
--> 728                 self._run_if_new(debug=debug)
    729             elif status == "created":
    730                 self._run_if_created()

~/dev_sam/pyiron_base/pyiron_base/job/generic.py in _run_if_new(self, debug)
   1253             debug (bool): Debug Mode
   1254         """
-> 1255         run_job_with_status_initialized(job=self, debug=debug)
   1256 
   1257     def _run_if_created(self):

~/dev_sam/pyiron_base/pyiron_base/job/runfunction.py in run_job_with_status_initialized(job, debug)
     74     else:
     75         job.save()
---> 76         job.run()
     77 
     78 

~/dev_sam/pyiron_base/pyiron_base/generic/util.py in decorated(*args, **kwargs)
    211                         stacklevel=2,
    212                     )
--> 213             return function(*args, **kwargs)
    214 
    215         return decorated

~/dev_sam/pyiron_base/pyiron_base/job/generic.py in run(self, delete_existing_job, repair, debug, run_mode, run_again)
    728                 self._run_if_new(debug=debug)
    729             elif status == "created":
--> 730                 self._run_if_created()
    731             elif status == "submitted":
    732                 run_job_with_status_submitted(job=self)

~/dev_sam/pyiron_base/pyiron_base/job/generic.py in _run_if_created(self)
   1264             int: Queue ID - if the job was send to the queue
   1265         """
-> 1266         return run_job_with_status_created(job=self)
   1267 
   1268     def _run_if_repair(self):

~/dev_sam/pyiron_base/pyiron_base/job/runfunction.py in run_job_with_status_created(job)
    107         job.run_if_scheduler()
    108     elif job.server.run_mode.interactive:
--> 109         job.run_if_interactive()
    110     elif job.server.run_mode.interactive_non_modal:
    111         job.run_if_interactive_non_modal()

~/dev_sam/pyiron_atomistics/pyiron_atomistics/lammps/interactive.py in run_if_interactive(self)
    434 
    435         else:
--> 436             super(LammpsInteractive, self).run_if_interactive()
    437             self.interactive_execute()
    438             self.interactive_collect()

~/dev_sam/pyiron_atomistics/pyiron_atomistics/atomistics/job/interactive.py in run_if_interactive(self)
    123             raise ValueError("Input structure not set. Use method set_structure()")
    124         if not self.interactive_is_activated():
--> 125             self.interactive_initialize_interface()
    126         if self._structure_previous is None:
    127             self._structure_previous = self.structure.copy()

~/dev_sam/pyiron_atomistics/pyiron_atomistics/lammps/interactive.py in interactive_initialize_interface(self)
    233             if self._log_file is None:
    234                 self._log_file = os.path.join(self.working_directory, "log.lammps")
--> 235             self._interactive_library = lammps(
    236                 cmdargs=["-screen", "none", "-log", self._log_file]
    237             )

/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/lammps/core.py in __init__(self, name, cmdargs, ptr, comm)
    145         else:
    146           libpath = "liblammps" + lib_ext
--> 147       self.lib = CDLL(libpath,RTLD_GLOBAL)
    148 
    149     # declare all argument and return types for all library methods here.

/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    371 
    372         if handle is None:
--> 373             self._handle = _dlopen(self._name, mode)
    374         else:
    375             self._handle = handle

OSError: libmpi.so.40: cannot open shared object file: No such file or directory
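The last two frames show where it fails: `lammps/core.py` calls `CDLL(libpath, RTLD_GLOBAL)` on the LAMMPS shared library, and the dynamic loader cannot resolve `libmpi.so.40` while doing so. A minimal sketch for reproducing just that step outside pyiron, assuming only the standard `ctypes` module; the helper name `can_load` is hypothetical and not part of pyiron or the LAMMPS Python module.

```python
import ctypes
import ctypes.util


def can_load(libname: str) -> bool:
    """Try to dlopen a shared library the same way lammps/core.py does (RTLD_GLOBAL)."""
    try:
        ctypes.CDLL(libname, ctypes.RTLD_GLOBAL)
    except OSError as err:
        # Same exception type as in the traceback above, e.g.
        # "libmpi.so.40: cannot open shared object file: No such file or directory"
        print(f"cannot load {libname}: {err}")
        return False
    return True


if __name__ == "__main__":
    # find_library returns a library name/path if the system search finds one, else None.
    print("find_library('mpi') ->", ctypes.util.find_library("mpi"))
    for name in ("libmpi.so.40", "liblammps.so"):
        print(name, "loadable:", can_load(name))
```

Running this in the same kernel as the notebook tells you whether the failure is in the environment itself rather than in the pyiron job setup.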
samwaseda commented 2 years ago

And the environment where the problem does not occur is this one

tnecnivkcots commented 2 years ago

I tried to find libmpi.so.40 with `find -depth -name libmpi.so.40`, but it is not there, and I have no idea why it is missing, as it should be there.
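The same search can also be scripted in Python and pointed directly at the conda prefix. A minimal sketch using `pathlib.Path.rglob`; `find_files` is a hypothetical helper, and the root directory is whatever you pass on the command line.

```python
from __future__ import annotations

import sys
from pathlib import Path


def find_files(root: Path, name: str) -> list[Path]:
    """Return every path below `root` whose file name is exactly `name`
    (roughly `find <root> -name <name>`), sorted for stable output."""
    return sorted(p for p in root.rglob(name) if p.name == name)


if __name__ == "__main__":
    # Usage: python find_files.py <root> [name], e.g. the anaconda3 prefix
    if len(sys.argv) > 1:
        target = sys.argv[2] if len(sys.argv) > 2 else "libmpi.so.40"
        hits = find_files(Path(sys.argv[1]), target)
        if hits:
            print("\n".join(str(p) for p in hits))
        else:
            print(f"{target} not found below {sys.argv[1]}")
```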

tnecnivkcots commented 2 years ago

Probably it is an issue with the dependencies.

tnecnivkcots commented 2 years ago

@pmrv Do you have any idea?

pmrv commented 2 years ago

I actually cannot find any MPI-related files, so I'm guessing conda/mamba messed something up, because mpi is still listed in `mamba list`. I've tried to install it again with `mamba install -c conda-forge mpi=1.0=openmpi`, but it fails because a file is owned by @tnecnivkcots. getfacl reveals that the pyiron group has rw access, but also that the mpd group only has r. I'm guessing this somehow takes precedence.
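One way to tell "files deleted after installation" apart from "files never shipped by the installed build" is to compare conda's own install record with the disk. conda keeps one JSON record per installed package under `<prefix>/conda-meta/`; the sketch below assumes each record has a `name` key and a `files` list of paths relative to the prefix. `check_package_files` is a hypothetical helper.

```python
from __future__ import annotations

import json
from pathlib import Path


def check_package_files(prefix: Path, package: str) -> dict[str, tuple[list[str], list[str]]]:
    """Compare conda's install record(s) for `package` with the files on disk.

    Returns {record file name: (recorded files, recorded files missing on disk)}.
    Assumes records live in <prefix>/conda-meta/<name>-<version>-<build>.json and
    carry a 'name' key plus a 'files' list of paths relative to the prefix.
    """
    result: dict[str, tuple[list[str], list[str]]] = {}
    for record in sorted((prefix / "conda-meta").glob(f"{package}-*.json")):
        meta = json.loads(record.read_text())
        if meta.get("name") != package:
            continue  # skip packages whose name merely starts with `package-`
        recorded = list(meta.get("files", []))
        missing = [f for f in recorded if not (prefix / f).exists()]
        result[record.name] = (recorded, missing)
    return result
```

If `lib/libmpi.so.40` does not appear in the recorded list at all, the installed build never provided it; if it is recorded but absent on disk, something removed it afterwards.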

tnecnivkcots commented 2 years ago

Which file exactly has these ACL settings?

If I try to execute mamba install -c conda-forge mpi=1.0=openmpi I get the following error:

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/conda/exceptions.py", line 1114, in __call__
        return func(*args, **kwargs)
      File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/mamba/mamba.py", line 935, in exception_converter
        raise e
      File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/mamba/mamba.py", line 929, in exception_converter
        exit_code = _wrapped_main(*args, **kwargs)
      File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/mamba/mamba.py", line 887, in _wrapped_main
        result = do_call(args, p)
      File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/mamba/mamba.py", line 750, in do_call
        exit_code = install(args, parser, "install")
      File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/mamba/mamba.py", line 497, in install
        index = load_channels(pool, channels, repos)
      File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/mamba/utils.py", line 129, in load_channels
        index = get_index(
      File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/mamba/utils.py", line 110, in get_index
        is_downloaded = dlist.download(api.MAMBA_DOWNLOAD_FAILFAST)
    RuntimeError: Operation not permitted: '/u/system/SLES12/soft/pyiron/dev/anaconda3/pkgs/cache/2ce54b42.json'

`$ /u/system/SLES12/soft/pyiron/dev/anaconda3//bin/mamba install -c conda-forge mpi=1.0=openmpi`

  environment variables:
                 CIO_TEST=<not set>
             CONDA_PREFIX=/u/system/SLES12/soft/pyiron/dev/anaconda3/
               CONDA_ROOT=/u/system/SLES12/soft/pyiron/dev/anaconda3
           CURL_CA_BUNDLE=<not set>
          GPAW_SETUP_PATH=/u/system/SLES12/soft/pyiron/dev/pyiron-resources-cmmc/gpaw/potentials/gpaw-setups-0.9.20000
                  MANPATH=/mpcdf/soft/SLE_15/packages/x86_64/Modules/5.0.1/share/man:/usr/local/man:/usr/share/man
               MODULEPATH=/mpcdf/soft/SLE_15/modules/third-party-compilers:/mpcdf/soft/SLE_15/modules/java:/mpcdf/soft/SLE_15/modules/visualization:/mpcdf/soft/SLE_15/modules/gpu:/mpcdf/soft/SLE_15/modules/ml:/mpcdf/soft/SLE_15/modules/applications:/mpcdf/soft/SLE_15/modules/compilers:/mpcdf/soft/SLE_15/modules/python:/mpcdf/soft/SLE_15/modules/libs:/mpcdf/soft/SLE_15/modules/tools:/cmmc/system_sle15_sp1/modules.addon/CMMC
                     PATH=/u/system/SLES12/soft/pyiron/dev/anaconda3//bin:/mpcdf/soft/SLE_15/packages/x86_64/Modules/5.0.1/bin:/u/vistock/bin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/afs/ipp/amd64_sles15/bin:/mpcdf/soft/SLE_15/packages/x86_64/find-module/1.0/bin
               PYTHONPATH=/u/system/SLES12/soft/pyiron/dev/pyiron_mpie/pyiron:/u/system/SLES12/soft/pyiron/dev/pyiron_mpie:/u/system/SLES12/soft/pyiron/dev/pyiron_backwards
       REQUESTS_CA_BUNDLE=<not set>
            SSL_CERT_FILE=<not set>
                 XNLSPATH=/usr/X11R6/lib/X11/nls
__MODULES_SHARE_MODULEPATH=/mpcdf/soft/SLE_15/modules/ml:2:/mpcdf/soft/SLE_15/modules/tools:2:/mpcdf/soft/SLE_15/modules/java:2:/mpcdf/soft/SLE_15/modules/visualization:2:/mpcdf/soft/SLE_15/modules/compilers:2:/mpcdf/soft/SLE_15/modules/python:2:/mpcdf/soft/SLE_15/modules/libs:2:/mpcdf/soft/SLE_15/modules/third-party-compilers:2:/mpcdf/soft/SLE_15/modules/applications:2:/mpcdf/soft/SLE_15/modules/gpu:2

     active environment : base
    active env location : /u/system/SLES12/soft/pyiron/dev/anaconda3/
       user config file : /u/vistock/.condarc
 populated config files : /u/system/SLES12/soft/pyiron/dev/anaconda3/.condarc
                          /u/vistock/.condarc
          conda version : 4.13.0
    conda-build version : 3.21.9
         python version : 3.8.13.final.0
       virtual packages : __linux=5.3.18=0
                          __glibc=2.31=0
                          __unix=0=0
                          __archspec=1=x86_64
       base environment : /u/system/SLES12/soft/pyiron/dev/anaconda3  (writable)
      conda av data dir : /u/system/SLES12/soft/pyiron/dev/anaconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/linux-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/linux-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://conda.anaconda.org/intel/linux-64
                          https://conda.anaconda.org/intel/noarch
          package cache : /u/system/SLES12/soft/pyiron/dev/anaconda3/pkgs
                          /u/vistock/.conda/pkgs
       envs directories : /u/system/SLES12/soft/pyiron/dev/anaconda3/envs
                          /u/vistock/.conda/envs
               platform : linux-64
             user-agent : conda/4.13.0 requests/2.28.1 CPython/3.8.13 Linux/5.3.18-150300.59.60-default sles/15.3 glibc/2.31
                UID:GID : 35139:12500
             netrc file : None
           offline mode : False

An unexpected error has occurred. Conda has prepared the above report.

And if I look at the ACL of the .json file mentioned in the error, I get:

getfacl: Removing leading '/' from absolute path names
# file: u/system/SLES12/soft/pyiron/dev/anaconda3/pkgs/cache/47929eba.json
# owner: zora
# group: mpd
user::rw-
group::r-x                      #effective:r--
group:pyiron:rwx                #effective:rw-
mask::rw-
other::r--

The problem is that, effectively, no group has execute permission.
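The `#effective` values in the getfacl output follow from the `mask::` entry: per the acl(5) man page, the mask limits named users, the owning group and named groups, so each of those entries is intersected with it. A small sketch that reproduces the two `#effective` values above; `effective` is a hypothetical helper, not a libacl function.

```python
def effective(entry: str, mask: str) -> str:
    """Intersect an 'rwx'-style ACL entry with the ACL mask, position by position.

    Per acl(5), the mask applies to named users, the owning group and named groups;
    the owner ('user::') and 'other::' entries are not masked.
    """
    if len(entry) != 3 or len(mask) != 3:
        raise ValueError("expected three-character permission strings like 'r-x'")
    return "".join(e if e != "-" and m != "-" else "-" for e, m in zip(entry, mask))


if __name__ == "__main__":
    mask = "rw-"
    for tag, perms in [("group::", "r-x"), ("group:pyiron:", "rwx")]:
        print(f"{tag}{perms}  #effective:{effective(perms, mask)}")
```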

pmrv commented 2 years ago

It was a file owned by you in the same folder. Execute permissions should not matter for reading files, right? Anyway, I've deleted the files owned by me; maybe it will work for you now.

tnecnivkcots commented 2 years ago

Yeah, thank you for cleaning up. But now I am told that everything is already installed:

Looking for: ['mpi==1.0=openmpi']

pkgs/main/noarch                                   811.3kB @   2.4MB/s  0.3s
pkgs/r/linux-64                                      1.4MB @   3.0MB/s  0.5s
pkgs/r/noarch                                        1.3MB @   2.1MB/s  0.3s
intel/noarch                                                  No change
pkgs/main/linux-64                                   4.6MB @   4.6MB/s  1.0s
intel/linux-64                                                No change
conda-forge/noarch                                   8.7MB @   4.3MB/s  2.1s
conda-forge/linux-64                                24.2MB @   4.6MB/s  5.5s

Pinned packages:
  - python 3.8.*
  - libblas * *mkl
  - blas * *mkl
  - jupyterhub 2.0.0.*

Transaction

  Prefix: /u/system/SLES12/soft/pyiron/dev/anaconda3/

  All requested packages already installed
pmrv commented 2 years ago

Have you tried with --force-reinstall?

tnecnivkcots commented 2 years ago

No, but I uninstalled it and installed it again, together with the other packages that were uninstalled along with it. Now I am running the compile script.

tnecnivkcots commented 2 years ago

OK, it should be installed, compiled and so on now. @samwaseda, can you try it again?

pmrv commented 2 years ago

I've checked just now, but the libmpi.so.40 file is still not present.

tnecnivkcots commented 2 years ago

Now I have executed the mamba install mpi ... --force-reinstall and compiled it again. The file is still not there. (I expected it in /u/system/soft/pyiron/dev/anaconda3/lib.)

tnecnivkcots commented 2 years ago

We can try to download it from pkgs.org and put it into the expected directory. Does it have to be linked somewhere?

niklassiemer commented 2 years ago

I am somehow glad that I tried the update now, while nobody is on holiday...

tnecnivkcots commented 2 years ago

Every time, I am still afraid of breaking this conda environment, which seems to be fragile.

niklassiemer commented 2 years ago

I just had a look at the diff between the environments, and yesterday (with my update) openmpi changed:

<   - openmpi=4.1.4=external_0
---
>   - openmpi=4.1.4=ha1ae619_100

I am of course not sure if that is related...

tnecnivkcots commented 2 years ago

I don't think it is related to the fact that you made the update. I am sure it is related to doing updates in general.

The external_0 is also there:

conda list openmpi
# packages in environment at /u/system/SLES12/soft/pyiron/dev/anaconda3/:
#
# Name                    Version                   Build  Channel
openmpi                   4.1.4                external_0    conda-forge
tnecnivkcots commented 2 years ago

The build of openmpi from last week also was ha1ae619_100.

tnecnivkcots commented 2 years ago

All the more surprising that the file disappeared. I assume LAMMPS worked before yesterday, am I right?

niklassiemer commented 2 years ago

The build of openmpi from last week also was ha1ae619_100.

That's my point!

22-06-29/env_before.yml:  - openmpi=4.1.4=ha1ae619_100
22-06-29/env_after.yml:  - openmpi=4.1.4=ha1ae619_100
22-07-06/env_before.yml:  - openmpi=4.1.4=ha1ae619_100
22-07-06/env_after.yml:  - openmpi=4.1.4=ha1ae619_100
22-07-13/env_before.yml:  - openmpi=4.1.4=ha1ae619_100
22-07-13/env_after.yml:  - openmpi=4.1.4=external_0
22-07-14/env_after.yml:  - openmpi=4.1.4=external_0
22-07-14/env_before.yml:  - openmpi=4.1.4=external_0
right now:   - openmpi=4.1.4=external_0

I.e. with the build number ha1ae619_100 it all seems to be ok?!
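The grep output above can also be scanned programmatically to pinpoint the snapshot in which a build string changed. A minimal sketch, assuming the `<yy-mm-dd>/env_<before|after>.yml:  - name=version=build` line format shown above and a single package per input; `build_changes` is a hypothetical helper.

```python
from __future__ import annotations

import re

# One grep hit per line, e.g. "22-07-13/env_after.yml:  - openmpi=4.1.4=external_0"
LINE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{2})/env_(?P<stage>before|after)\.yml:\s*-\s*"
    r"(?P<name>[^=\s]+)=(?P<version>[^=\s]+)=(?P<build>\S+)$"
)


def build_changes(lines: list[str]) -> list[tuple[str, str, str, str]]:
    """Return (date, stage, old build, new build) for each change between snapshots.

    Lines that do not match the pattern (e.g. "right now: ...") are ignored.
    Assumes all lines refer to the same package.
    """
    snapshots = [m for m in (LINE.match(line.strip()) for line in lines) if m]
    # yy-mm-dd sorts correctly as text; 'before' goes ahead of 'after' on the same day.
    snapshots.sort(key=lambda m: (m["date"], m["stage"] == "after"))
    return [
        (cur["date"], cur["stage"], prev["build"], cur["build"])
        for prev, cur in zip(snapshots, snapshots[1:])
        if prev["build"] != cur["build"]
    ]
```

Applied to the grep output above, it reports a single change, from `ha1ae619_100` to `external_0`, in the 22-07-13 `env_after.yml` snapshot.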

niklassiemer commented 2 years ago

I will now change that.

tnecnivkcots commented 2 years ago

Ahhh, ok now I understand.

niklassiemer commented 2 years ago

Seems to be fixed! I will now write some cluster unit tests to verify the environment after updates...
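A post-update check along those lines could, for example, flag the openmpi build that coincided with the missing library. A minimal sketch, assuming `conda list --json` returns a list of objects with `name` and `build_string` keys; the helper names are hypothetical, and this is an illustration only, not the tests referred to here.

```python
from __future__ import annotations

import json
import subprocess


def conda_list_json(prefix: str) -> str:
    """Return the stdout of `conda list --prefix <prefix> --json` (requires conda on PATH)."""
    result = subprocess.run(
        ["conda", "list", "--prefix", prefix, "--json"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def package_builds(conda_list_output: str) -> dict[str, str]:
    """Map package name -> build string, assuming 'name'/'build_string' keys per entry."""
    return {pkg["name"]: pkg["build_string"] for pkg in json.loads(conda_list_output)}


def check_openmpi(builds: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    build = builds.get("openmpi")
    if build is None:
        return ["openmpi is not installed"]
    if build.startswith("external"):
        # In this issue, the 'external_0' build coincided with libmpi.so.40 going missing.
        return [f"openmpi build is {build!r}; libmpi.so.40 may be missing"]
    return []
```

On the cluster this could be called as `check_openmpi(package_builds(conda_list_json("/u/system/SLES12/soft/pyiron/dev/anaconda3")))` from a unit test that asserts the returned list is empty.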

tnecnivkcots commented 2 years ago

OK, in that case it would be very useful to run some tests after every update. Thank you, @niklassiemer.

samwaseda commented 2 years ago

Hey it worked! Thanks!