Closed: samwaseda closed this issue 2 years ago.
That one should be included in conda, right? I made an update today (using the usual update script of @tnecnivkcots). Hopefully there is no issue with permissions...
I have run the conda update another time just now. Does this issue persist?
Yes, but actually in the collaborative binder setup there is no problem. Does that mean there is a problem with my cluster setup, or is it because it is not rebuilt on binder every time? Sorry, I'm probably asking a novice question...
Which binder setup are you talking about? So far I assumed you were talking about the JupyterHub on cmti. Comparing that with the collaborative binder, which is started from a repository, is not so easy, because of the separate environment on the Kubernetes cluster at MPCDF where the binder lives.
Also, there is no need to be sorry. We just have different backgrounds, and that is fine.
But on which machine does this issue occur? Just so we have a shared context.
Oh sorry, only now I realize that I didn't even say where the problem occurred XD I'll post the entire error:
OSError Traceback (most recent call last)
/tmp/ipykernel_4559/3425482066.py in <cell line: 24>()
22 wf = ElasticConstants(pr, 'elastic_tensor')
23
---> 24 plt.imshow(wf.elastic_constants)
/tmp/ipykernel_4559/3150592139.py in __getattr__(self, attr)
16 return getattr(self, '_' + attr)
17 except AttributeError:
---> 18 args = [getattr(self, a) for a in inspect.getfullargspec(self.__getattribute__('get_' + attr)).args[1:]]
19 setattr(self, '_' + attr, self.__getattribute__('get_' + attr)(*args))
20 return getattr(self, '_' + attr)
/tmp/ipykernel_4559/3150592139.py in <listcomp>(.0)
16 return getattr(self, '_' + attr)
17 except AttributeError:
---> 18 args = [getattr(self, a) for a in inspect.getfullargspec(self.__getattribute__('get_' + attr)).args[1:]]
19 setattr(self, '_' + attr, self.__getattribute__('get_' + attr)(*args))
20 return getattr(self, '_' + attr)
/tmp/ipykernel_4559/3150592139.py in __getattr__(self, attr)
17 except AttributeError:
18 args = [getattr(self, a) for a in inspect.getfullargspec(self.__getattribute__('get_' + attr)).args[1:]]
---> 19 setattr(self, '_' + attr, self.__getattribute__('get_' + attr)(*args))
20 return getattr(self, '_' + attr)
/tmp/ipykernel_4559/3425482066.py in get_lattice_constant(self, element)
7 job.interactive_open()
8 murn = job.create_job('Murnaghan', job.job_name.replace('lmp_', ''))
----> 9 murn.run()
10 return murn['output/equilibrium_volume']**(1 / 3)
11
~/dev_sam/pyiron_base/pyiron_base/generic/util.py in decorated(*args, **kwargs)
211 stacklevel=2,
212 )
--> 213 return function(*args, **kwargs)
214
215 return decorated
~/dev_sam/pyiron_base/pyiron_base/job/generic.py in run(self, delete_existing_job, repair, debug, run_mode, run_again)
726 self._run_if_repair()
727 elif status == "initialized":
--> 728 self._run_if_new(debug=debug)
729 elif status == "created":
730 self._run_if_created()
~/dev_sam/pyiron_base/pyiron_base/job/generic.py in _run_if_new(self, debug)
1253 debug (bool): Debug Mode
1254 """
-> 1255 run_job_with_status_initialized(job=self, debug=debug)
1256
1257 def _run_if_created(self):
~/dev_sam/pyiron_base/pyiron_base/job/runfunction.py in run_job_with_status_initialized(job, debug)
74 else:
75 job.save()
---> 76 job.run()
77
78
~/dev_sam/pyiron_base/pyiron_base/generic/util.py in decorated(*args, **kwargs)
211 stacklevel=2,
212 )
--> 213 return function(*args, **kwargs)
214
215 return decorated
~/dev_sam/pyiron_base/pyiron_base/job/generic.py in run(self, delete_existing_job, repair, debug, run_mode, run_again)
728 self._run_if_new(debug=debug)
729 elif status == "created":
--> 730 self._run_if_created()
731 elif status == "submitted":
732 run_job_with_status_submitted(job=self)
~/dev_sam/pyiron_base/pyiron_base/job/generic.py in _run_if_created(self)
1264 int: Queue ID - if the job was send to the queue
1265 """
-> 1266 return run_job_with_status_created(job=self)
1267
1268 def _run_if_repair(self):
~/dev_sam/pyiron_base/pyiron_base/job/runfunction.py in run_job_with_status_created(job)
107 job.run_if_scheduler()
108 elif job.server.run_mode.interactive:
--> 109 job.run_if_interactive()
110 elif job.server.run_mode.interactive_non_modal:
111 job.run_if_interactive_non_modal()
~/dev_sam/pyiron_base/pyiron_base/master/parallel.py in run_if_interactive(self)
701 for parameter in self._job_generator.parameter_list:
702 self._job_generator.modify_job(job=self.ref_job, parameter=parameter)
--> 703 self.ref_job.run()
704 self.ref_job.interactive_close()
705 else:
~/dev_sam/pyiron_base/pyiron_base/generic/util.py in decorated(*args, **kwargs)
211 stacklevel=2,
212 )
--> 213 return function(*args, **kwargs)
214
215 return decorated
~/dev_sam/pyiron_base/pyiron_base/job/generic.py in run(self, delete_existing_job, repair, debug, run_mode, run_again)
726 self._run_if_repair()
727 elif status == "initialized":
--> 728 self._run_if_new(debug=debug)
729 elif status == "created":
730 self._run_if_created()
~/dev_sam/pyiron_base/pyiron_base/job/generic.py in _run_if_new(self, debug)
1253 debug (bool): Debug Mode
1254 """
-> 1255 run_job_with_status_initialized(job=self, debug=debug)
1256
1257 def _run_if_created(self):
~/dev_sam/pyiron_base/pyiron_base/job/runfunction.py in run_job_with_status_initialized(job, debug)
74 else:
75 job.save()
---> 76 job.run()
77
78
~/dev_sam/pyiron_base/pyiron_base/generic/util.py in decorated(*args, **kwargs)
211 stacklevel=2,
212 )
--> 213 return function(*args, **kwargs)
214
215 return decorated
~/dev_sam/pyiron_base/pyiron_base/job/generic.py in run(self, delete_existing_job, repair, debug, run_mode, run_again)
728 self._run_if_new(debug=debug)
729 elif status == "created":
--> 730 self._run_if_created()
731 elif status == "submitted":
732 run_job_with_status_submitted(job=self)
~/dev_sam/pyiron_base/pyiron_base/job/generic.py in _run_if_created(self)
1264 int: Queue ID - if the job was send to the queue
1265 """
-> 1266 return run_job_with_status_created(job=self)
1267
1268 def _run_if_repair(self):
~/dev_sam/pyiron_base/pyiron_base/job/runfunction.py in run_job_with_status_created(job)
107 job.run_if_scheduler()
108 elif job.server.run_mode.interactive:
--> 109 job.run_if_interactive()
110 elif job.server.run_mode.interactive_non_modal:
111 job.run_if_interactive_non_modal()
~/dev_sam/pyiron_atomistics/pyiron_atomistics/lammps/interactive.py in run_if_interactive(self)
434
435 else:
--> 436 super(LammpsInteractive, self).run_if_interactive()
437 self.interactive_execute()
438 self.interactive_collect()
~/dev_sam/pyiron_atomistics/pyiron_atomistics/atomistics/job/interactive.py in run_if_interactive(self)
123 raise ValueError("Input structure not set. Use method set_structure()")
124 if not self.interactive_is_activated():
--> 125 self.interactive_initialize_interface()
126 if self._structure_previous is None:
127 self._structure_previous = self.structure.copy()
~/dev_sam/pyiron_atomistics/pyiron_atomistics/lammps/interactive.py in interactive_initialize_interface(self)
233 if self._log_file is None:
234 self._log_file = os.path.join(self.working_directory, "log.lammps")
--> 235 self._interactive_library = lammps(
236 cmdargs=["-screen", "none", "-log", self._log_file]
237 )
/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packages/lammps/core.py in __init__(self, name, cmdargs, ptr, comm)
145 else:
146 libpath = "liblammps" + lib_ext
--> 147 self.lib = CDLL(libpath,RTLD_GLOBAL)
148
149 # declare all argument and return types for all library methods here.
/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/ctypes/__init__.py in __init__(self, name, mode, handle, use_errno, use_last_error, winmode)
371
372 if handle is None:
--> 373 self._handle = _dlopen(self._name, mode)
374 else:
375 self._handle = handle
OSError: libmpi.so.40: cannot open shared object file: No such file or directory
I tried to find libmpi.so.40 with find -depth -name libmpi.so.40, but it is not there, and I have no idea why, as it should be there.
Probably it is an issue with the dependencies.
@pmrv Do you have any idea?
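(A sketch of how this can be checked, assuming the conda prefix from the traceback above and the liblammps.so name loaded by lammps/core.py:)

```bash
# prefix taken from the traceback above; adjust if the environment lives elsewhere
PREFIX=/u/system/SLES12/soft/pyiron/dev/anaconda3

# is the MPI runtime anywhere inside the environment?
find "$PREFIX/lib" -maxdepth 1 -name 'libmpi.so*'

# which shared libraries does the LAMMPS library actually request?
ldd "$PREFIX/lib/liblammps.so" | grep -i mpi
```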
I actually cannot find any MPI-related files, so I'm guessing conda/mamba messed something up, because it is still listed in mamba list. I've tried to install it again with mamba install -c conda-forge mpi=1.0=openmpi, but it fails because the file is owned by @tnecnivkcots. getfacl reveals that the pyiron group has rw access, but also that the mpd group only has r. I'm guessing this somehow takes precedence.
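(For reference, roughly how the permissions can be inspected and, where appropriate, extended; the setfacl usage is a sketch and the concrete cache file path is only illustrative:)

```bash
# inspect the ACL of a file in the package cache (illustrative path)
getfacl /u/system/SLES12/soft/pyiron/dev/anaconda3/pkgs/cache/2ce54b42.json

# if appropriate, grant the pyiron group read/write access (requires ownership or root)
setfacl -m g:pyiron:rw /u/system/SLES12/soft/pyiron/dev/anaconda3/pkgs/cache/2ce54b42.json
```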
Which file exactly are you talking about with these ACL settings?
If I try to execute mamba install -c conda-forge mpi=1.0=openmpi, I get the following error:
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packag es/conda/exceptions.py", line 1114, in __call__
return func(*args, **kwargs)
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packag es/mamba/mamba.py", line 935, in exception_converter
raise e
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packag es/mamba/mamba.py", line 929, in exception_converter
exit_code = _wrapped_main(*args, **kwargs)
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packag es/mamba/mamba.py", line 887, in _wrapped_main
result = do_call(args, p)
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packag es/mamba/mamba.py", line 750, in do_call
exit_code = install(args, parser, "install")
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packag es/mamba/mamba.py", line 497, in install
index = load_channels(pool, channels, repos)
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packag es/mamba/utils.py", line 129, in load_channels
index = get_index(
File "/u/system/SLES12/soft/pyiron/dev/anaconda3/lib/python3.8/site-packag es/mamba/utils.py", line 110, in get_index
is_downloaded = dlist.download(api.MAMBA_DOWNLOAD_FAILFAST)
RuntimeError: Operation not permitted: '/u/system/SLES12/soft/pyiron/dev/ana conda3/pkgs/cache/2ce54b42.json'
`$ /u/system/SLES12/soft/pyiron/dev/anaconda3//bin/mamba install -c conda-forge mpi=1.0=openmpi`
environment variables:
CIO_TEST=<not set>
CONDA_PREFIX=/u/system/SLES12/soft/pyiron/dev/anaconda3/
CONDA_ROOT=/u/system/SLES12/soft/pyiron/dev/anaconda3
CURL_CA_BUNDLE=<not set>
GPAW_SETUP_PATH=/u/system/SLES12/soft/pyiron/dev/pyiron-resources-cmmc/gpaw/potentials/gpaw-setups-0.9.20000
MANPATH=/mpcdf/soft/SLE_15/packages/x86_64/Modules/5.0.1/share/man:/usr/local/man:/usr/share/man
MODULEPATH=/mpcdf/soft/SLE_15/modules/third-party-compilers:/mpcdf/soft/SLE_15/modules/java:/mpcdf/soft/SLE_15/modules/visualization:/mpcdf/soft/SLE_15/modules/gpu:/mpcdf/soft/SLE_15/modules/ml:/mpcdf/soft/SLE_15/modules/applications:/mpcdf/soft/SLE_15/modules/compilers:/mpcdf/soft/SLE_15/modules/python:/mpcdf/soft/SLE_15/modules/libs:/mpcdf/soft/SLE_15/modules/tools:/cmmc/system_sle15_sp1/modules.addon/CMMC
PATH=/u/system/SLES12/soft/pyiron/dev/anaconda3//bin:/mpcdf/soft/SLE_15/packages/x86_64/Modules/5.0.1/bin:/u/vistock/bin:/usr/local/bin:/usr/bin:/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin:/afs/ipp/amd64_sles15/bin:/mpcdf/soft/SLE_15/packages/x86_64/find-module/1.0/bin
PYTHONPATH=/u/system/SLES12/soft/pyiron/dev/pyiron_mpie/pyiron:/u/system/SLES12/soft/pyiron/dev/pyiron_mpie:/u/system/SLES12/soft/pyiron/dev/pyiron_backwards
REQUESTS_CA_BUNDLE=<not set>
SSL_CERT_FILE=<not set>
XNLSPATH=/usr/X11R6/lib/X11/nls
__MODULES_SHARE_MODULEPATH=/mpcdf/soft/SLE_15/modules/ml:2:/mpcdf/soft/SLE_15/modules/tools:2:/mpcdf/soft/SLE_15/modules/java:2:/mpcdf/soft/SLE_15/modules/visualization:2:/mpcdf/soft/SLE_15/modules/compilers:2:/mpcdf/soft/SLE_15/modules/python:2:/mpcdf/soft/SLE_15/modules/libs:2:/mpcdf/soft/SLE_15/modules/third-party-compilers:2:/mpcdf/soft/SLE_15/modules/applications:2:/mpcdf/soft/SLE_15/modules/gpu:2
active environment : base
active env location : /u/system/SLES12/soft/pyiron/dev/anaconda3/
user config file : /u/vistock/.condarc
populated config files : /u/system/SLES12/soft/pyiron/dev/anaconda3/.condarc
/u/vistock/.condarc
conda version : 4.13.0
conda-build version : 3.21.9
python version : 3.8.13.final.0
virtual packages : __linux=5.3.18=0
__glibc=2.31=0
__unix=0=0
__archspec=1=x86_64
base environment : /u/system/SLES12/soft/pyiron/dev/anaconda3 (writable)
conda av data dir : /u/system/SLES12/soft/pyiron/dev/anaconda3/etc/conda
conda av metadata url : None
channel URLs : https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
https://conda.anaconda.org/intel/linux-64
https://conda.anaconda.org/intel/noarch
package cache : /u/system/SLES12/soft/pyiron/dev/anaconda3/pkgs
/u/vistock/.conda/pkgs
envs directories : /u/system/SLES12/soft/pyiron/dev/anaconda3/envs
/u/vistock/.conda/envs
platform : linux-64
user-agent : conda/4.13.0 requests/2.28.1 CPython/3.8.13 Linux/5.3.18-150300.59.60-default sles/15.3 glibc/2.31
UID:GID : 35139:12500
netrc file : None
offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
And if I look at the ACL of the .json file mentioned in the error, I get:
getfacl: Removing leading '/' from absolute path names
# file: u/system/SLES12/soft/pyiron/dev/anaconda3/pkgs/cache/47929eba.json
# owner: zora
# group: mpd
user::rw-
group::r-x #effective:r--
group:pyiron:rwx #effective:rw-
mask::rw-
other::r--
The problem is that, effectively, no group has execute permissions.
It was a file owned by you in the same folder. Execute permissions should not matter for reading files, right? Anyway, I've deleted the files owned by me; maybe it will work for you now.
Yeah, thank you for cleaning up. But now I get the message that everything is already installed:
Looking for: ['mpi==1.0=openmpi']
pkgs/main/noarch 811.3kB @ 2.4MB/s 0.3s
pkgs/r/linux-64 1.4MB @ 3.0MB/s 0.5s
pkgs/r/noarch 1.3MB @ 2.1MB/s 0.3s
intel/noarch No change
pkgs/main/linux-64 4.6MB @ 4.6MB/s 1.0s
intel/linux-64 No change
conda-forge/noarch 8.7MB @ 4.3MB/s 2.1s
conda-forge/linux-64 24.2MB @ 4.6MB/s 5.5s
Pinned packages:
- python 3.8.*
- libblas * *mkl
- blas * *mkl
- jupyterhub 2.0.0.*
Transaction
Prefix: /u/system/SLES12/soft/pyiron/dev/anaconda3/
All requested packages already installed
Have you tried it with --force-reinstall?
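E.g. roughly like this (same package spec as before; just a sketch):

```bash
mamba install -c conda-forge "mpi=1.0=openmpi" --force-reinstall
```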
No, but I uninstalled it and installed it again, together with the other packages that had been uninstalled. Now I'm executing the compile script.
Ok, should be installed, compiled and so on now. @samwaseda can you try it another time?
I've checked just now, but the libmpi.so.40 file is still not present.
Now I have executed the mamba install mpi ... --force-reinstall and compiled it again. The file is still not there. (I expected it in /u/system/soft/pyiron/dev/anaconda3/lib.)
We can try to download it from pkgs.org and put it into the expected directory. Does it have to be linked somewhere?
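(One way to check whether the dynamic loader can resolve it at all, mirroring the failing call from lammps/core.py in the traceback above; the library name is the one from the error, the rest is only a sketch:)

```bash
# reproduce the failing lookup directly (same CDLL call as in the traceback)
python -c "from ctypes import CDLL, RTLD_GLOBAL; CDLL('libmpi.so.40', mode=RTLD_GLOBAL)"

# and see which libmpi files the environment actually provides
ls -l "$CONDA_PREFIX"/lib/libmpi.so* 2>/dev/null || echo "no libmpi.so* in $CONDA_PREFIX/lib"
```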
I am somewhat glad I tried to update now, while nobody is on holiday...
I am still afraid, every time, of destroying this conda environment, which seems to be fragile.
I just had a look at the diff between the environments, and yesterday (with my update) openmpi changed:
< - openmpi=4.1.4=external_0
---
> - openmpi=4.1.4=ha1ae619_100
I am of course not sure if that is related...
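(For reference, the before/after files come from the update script; such a comparison can be produced roughly like this — a sketch, with the file names used further below:)

```bash
# compare the exported environment files from before and after an update
diff env_before.yml env_after.yml | grep openmpi
```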
I think it is not related to the fact that you made the update. I am sure it is related to doing updates in general.
The external_0 build is also there:
conda list openmpi
# packages in environment at /u/system/SLES12/soft/pyiron/dev/anaconda3/:
#
# Name Version Build Channel
openmpi 4.1.4 external_0 conda-forge
The build of openmpi from last week was also ha1ae619_100.
All the more surprising that the file disappeared. I expect LAMMPS worked before yesterday. Am I right?
> The build of openmpi from last week was also ha1ae619_100.

That's my point!
22-06-29/env_before.yml: - openmpi=4.1.4=ha1ae619_100
22-06-29/env_after.yml: - openmpi=4.1.4=ha1ae619_100
22-07-06/env_before.yml: - openmpi=4.1.4=ha1ae619_100
22-07-06/env_after.yml: - openmpi=4.1.4=ha1ae619_100
22-07-13/env_before.yml: - openmpi=4.1.4=ha1ae619_100
22-07-13/env_after.yml: - openmpi=4.1.4=external_0
22-07-14/env_after.yml: - openmpi=4.1.4=external_0
22-07-14/env_before.yml: - openmpi=4.1.4=external_0
right now: - openmpi=4.1.4=external_0
I.e. with the build number ha1ae619_100 it all seems to be OK?! (Presumably the external_0 build expects an MPI provided outside of conda, so it does not ship libmpi.so itself.)
I will now change that.
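(I.e. something like the following; the build string is the one from the working environments above, and the exact command is only a sketch:)

```bash
mamba install -c conda-forge "openmpi=4.1.4=ha1ae619_100"
```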
Ahhh, ok now I understand.
Seems to be fixed! I will now write some cluster unit tests to verify the environment after each update...
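(A minimal smoke test could look roughly like this; it just mirrors the import and constructor call from the traceback above, and the version()/close() calls are an assumption about the lammps Python API:)

```bash
# minimal sketch of an environment smoke test after an update
python - <<'EOF'
from lammps import lammps                    # same import path as in the failing call above

lmp = lammps(cmdargs=["-screen", "none"])    # fails if liblammps/libmpi cannot be loaded
print("LAMMPS loaded, version:", lmp.version())
lmp.close()
EOF
```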
OK, in this case it would be very efficient to have some tests after every update. Thank you @niklassiemer.
Hey it worked! Thanks!
Sorry, it's an MPIE problem, but LAMMPS doesn't seem to be able to find the shared library for the interactive job: