Closed totaibi closed 2 years ago
Hi @totaibi,
I believe the issue is arising due to recent changes in how passive tracers are created and then accessed in UWGeodynamics.
Instead of:
surface_tracers = Model.add_passive_tracers(name="Surface", vertices=coords)
it should be:
Model.add_passive_tracers(name="Surface", vertices=coords)
Then to track different fields it should be:
Model.Surface_tracers.add_tracked_field(Model.pressureField,
                                        name="tracers_press",
                                        units=u.megapascal,
                                        dataType="float")
This is outlined in the newest documentation (https://github.com/underworldcode/UWGeodynamics/blob/485ad28cee2b9373a2c9673587e49a96c2e14150/docs/readthedocs/src/UserGuide.rst)
So your code will look like this:
Model.add_passive_tracers(name="Surface", vertices=coords)
coords[:,2] = GEO.nd(-20.*u.km)
Model.add_passive_tracers(name="BD", vertices=coords)
coords[:,2] = GEO.nd(-35.*u.km)
Model.add_passive_tracers(name="Moho", vertices=coords)
coords[:,2] = GEO.nd(-90.*u.km)
Model.add_passive_tracers(name="Litho", vertices=coords)
Model.Surface_tracers.add_tracked_field(Model.strainRateField,
                                        name="surface_strainRate",
                                        units=1.0/u.second, dataType="float")
Model.BD_tracers.add_tracked_field(Model.strainRateField,
                                   name="BD_strainRate",
                                   units=1.0/u.second, dataType="float")
Model.Moho_tracers.add_tracked_field(Model.strainRateField,
                                     name="moho_strainRate",
                                     units=1.0/u.second, dataType="float")
Model.Litho_tracers.add_tracked_field(Model.strainRateField,
                                      name="LM_strainRate",
                                      units=1.0/u.second, dataType="float")
Let me know how you get on.
Cheers
Hi @bknight1,
Thank you for your support. The passive tracers problem has been resolved based on your comment. I'm still getting the same message for checkpoint:
Traceback (most recent call last):
File "Harrat.py", line 527, in <module>
Model.run_for(40.0 * u.megayear, checkpoint_interval=1. * u.megayear)
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_model.py", line 1613, in run_for
checkpointer = _CheckpointFunction(
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_model.py", line 2335, in __init__
self.checkpoint_all()
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_model.py", line 2416, in checkpoint_all
self.checkpoint_tracers(tracers, checkpointID, time, outputDir)
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_model.py", line 2612, in checkpoint_tracers
item.save(outputDir, checkpointID, time)
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_utils.py", line 191, in save
with h5py.File(name=swarm_fpath, mode="r") as h5f:
File "/opt/venv/lib/python3.8/site-packages/h5py/_hl/files.py", line 444, in __init__
fid = make_fid(name, mode, userblock_size,
File "/opt/venv/lib/python3.8/site-packages/h5py/_hl/files.py", line 199, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 100, in h5py.h5f.open
OSError: Unable to open file (file signature not found)
Do you have any suggestions?
Thamer
Hi @totaibi
Which versions of UW and UWGeodynamics are you using? There might be an issue if you're using UW v2.11 with UWGeodynamics v2.10.x.
I installed the latest version of both packages a week ago via Docker.
To make sure, you can check using the following steps on the HPC system.
module load singularity
singularity shell uwgeodynamics_latest.sif (change to your singularity image name) - this will open the singularity container
python3 - this will load a python interactive shell
import UWGeodynamics as GEO
GEO.__version__ - this prints the UWGeodynamics version
import underworld as UW
UW.__version__ - this prints the Underworld version
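The interactive steps above can also be collapsed into one small script and run inside the container. This is only a minimal sketch: the helper name `report_versions` is made up for illustration, and it assumes both packages expose `__version__` as in the steps above.

```python
import importlib


def report_versions(package_names=("UWGeodynamics", "underworld")):
    """Return a dict mapping each package name to its version string.

    A package that cannot be imported maps to None; a package without a
    __version__ attribute maps to "unknown".
    """
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        shown = version if version is not None else "not installed"
        print(f"{name}: {shown}")
```

Saved as, say, `check_versions.py` (a hypothetical file name), it could be run with `python3 check_versions.py` from the Singularity shell, so the HPC and local results can be compared side by side.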
Thanks @bknight1
The UWGeodynamics version: 2.11.0-dev-485ad28(master)
The UW version: 2.11.0b
I am using the same version on my local machine through Docker and it works fine for running 2D and 3D models, and it saves the passive tracers without problems. I'd recommend trying the script on a local machine using the same version from Docker to see whether it is an issue with Singularity. If the same issue occurs when running via Docker locally, the only other cause I can think of is a duplicate variable name.
Thanks again @bknight1 for your feedback and suggestion. I realized that either the file is corrupted or not in HDF5 format. Do you have any suggestions?
The loaded modules are singularity, python, and HDF5.
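One way to narrow this down is to look at the checkpoint files directly. HDF5 reports "file signature not found" when it cannot find the 8-byte format signature that every HDF5 file carries. The sketch below uses only the Python standard library to check that signature for every `.h5` file in a directory. The function names are made up for illustration, and it assumes the files have no user block (with a user block the signature sits at a later offset such as 512 or 1024, which this check does not cover).

```python
import glob
import os

# The 8-byte signature that starts every HDF5 file without a user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def has_hdf5_signature(path):
    """Return True if the file starts with the HDF5 signature at offset 0."""
    with open(path, "rb") as handle:
        return handle.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


def scan_directory(output_dir):
    """Map each *.h5 file in output_dir to (size in bytes, signature found)."""
    report = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "*.h5"))):
        report[path] = (os.path.getsize(path), has_hdf5_signature(path))
    return report


if __name__ == "__main__":
    # "outputs" is a placeholder; use the directory set as Model.outputDir.
    for path, (size, ok) in scan_directory("outputs").items():
        status = "ok" if ok else "NO HDF5 SIGNATURE"
        print(f"{path}: {size} bytes, {status}")
```

A file that is empty or much smaller than its counterparts on a working machine would suggest the write never completed, rather than a problem with the reading step.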
I'm not sure why the file is becoming corrupted @totaibi.
Did you try it on a local machine to see if you could recreate the issue?
Both UWGeodynamics and UW (the same version as in the HPC) are working fine on my local machine
Could we be missing one of the modules that needs to be loaded?
One additional note: our HPC nodes are mounted using Lustre with the flock option.
Okay, it sounds like a Singularity issue then. Unfortunately I don't have much experience with Singularity, so I'll have to pass the issue on to @julesghub or @rbeucher.
Thanks @bknight1
Here is the error message for the corrupted file:
Traceback (most recent call last):
File "Harrat.py", line 526, in <module>
Model.run_for(40.0*u.megayears, checkpoint_interval=1.0*u.megayears)
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_model.py", line 1613, in run_for
checkpointer = _CheckpointFunction(
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_model.py", line 2335, in __init__
self.checkpoint_all()
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_model.py", line 2416, in checkpoint_all
self.checkpoint_tracers(tracers, checkpointID, time, outputDir)
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_model.py", line 2612, in checkpoint_tracers
item.save(outputDir, checkpointID, time)
File "/opt/venv/lib/python3.8/site-packages/UWGeodynamics/_utils.py", line 191, in save
with h5py.File(name=swarm_fpath, mode="r") as h5f:
File "/opt/venv/lib/python3.8/site-packages/h5py/_hl/files.py", line 444, in __init__
fid = make_fid(name, mode, userblock_size,
File "/opt/venv/lib/python3.8/site-packages/h5py/_hl/files.py", line 199, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 100, in h5py.h5f.open
OSError: Unable to open file (corrupt object header - incorrect # of messages)
I want to follow up. The latest version was re-installed today on our HPC; unfortunately we are still getting the same issue.
During the installation I got the following messages, which I thought might guide us to the solution:
info unpack layer: sha256:c549ccf8d472c3bce9ce02e49c62b8f6cbc736ea2b8ba812a1ae9390c69d0b71
warn xattr{etc/gshadow} ignoring ENOTSUP on setxattr "user.rootlesscontainers"
warn xattr{/tmp/build-temp-473332333/rootfs/etc/gshadow} destination filesystem does not support xattrs, further warnings will be suppressed
Any suggestions?
Thamer
I'm not sure what the error is unfortunately.
I had a thought though, where are you trying to store the data? e.g. what is the Model.outputDir directory? This may cause an issue if you can't access the path from within docker.
This appears to be an h5py issue when writing swarm variables to disk.
Good suggestion @bknight1 about checking whether the outputDir path is accessible.
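A quick way to test this from inside the container is to try writing a small file into the output directory and reading it back. The sketch below is only an illustration: `can_write_to` is a made-up helper name, and it checks plain file access from a single process, so it says nothing about parallel writes from several nodes.

```python
import os
import tempfile


def can_write_to(directory):
    """Return True if a small probe file can be written to and read back
    from the given directory; the probe is deleted automatically."""
    if not os.path.isdir(directory):
        return False
    try:
        with tempfile.NamedTemporaryFile(dir=directory, suffix=".probe") as probe:
            probe.write(b"probe")
            probe.flush()
            probe.seek(0)
            return probe.read() == b"probe"
    except OSError:
        return False


if __name__ == "__main__":
    # "outputs" is a placeholder; in a model script, pass Model.outputDir.
    print(can_write_to("outputs"))
```

Running it both through `docker`/`singularity` and directly on the host would show whether the path is mounted and writable inside the container.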
The path is accessible. We figured out that the problem may be caused by the parallel computing, as the code does run on a single node. We are using srun at this stage and will keep you posted.
Thanks @bknight1 @julesghub for your continued support
Hi @totaibi, any news on this issue?
Hi @julesghub, thanks for following up. The problem is not solved yet. The code works fine on a single node, but it fails across nodes or racks with the file signature error reported earlier.
The HPC support team has no idea what is causing this issue.
The problem could be with the current version of UWGeodynamics. How can I install an earlier version? I need the tag name to use with the docker pull command.
Hi @totaibi,
To get a previous version of the UWGeodynamics Docker (Singularity) image, use:
docker pull underworldcode/uwgeodynamics:v2.10.2
You can see all available Docker images at this link:
https://hub.docker.com/repository/registry-1.docker.io/underworldcode/uwgeodynamics/tags?page=1&ordering=last_updated&name=v2
I understand you're using Singularity on an HPC? Which HPC? Do you have a link? As previously mentioned, the Singularity + h5py combination seems to be the problem, a filesystem issue that I've not seen before. A likely workaround would be to install the code "bare metal" on the HPC. I can help out with that if you like, but Singularity is preferred.
Hi @julesghub,
Thank you for following up
I used an earlier version and got the same issue.
We are using Singularity on the SANAM HPC, which belongs to KACST (a Saudi research institution). Unfortunately I could not find a link for it. The HPC is in an expansion stage, so I may raise another issue about setting up the bare-metal installation on it.
Thank you again, and thanks to the rest of the UWGeodynamics team.
Regards
Hello UWGeodynamics team,
I have installed the latest version of UWGeodynamics (2.11) on our HPC. Our code runs on earlier versions of UWGeodynamics; however, we now get error messages for passive tracers and checkpoint_interval, as follows:
The command used for passive tracers:
The corresponding error:
The command used for checkpoint_interval:
Model.run_for(40.0 * u.megayear, checkpoint_interval=1. * u.megayear)
The corresponding error:
Do you have any suggestions?
Thanks, Thamer