underworldcode / underworld2

underworld2: A parallel, particle-in-cell, finite element code for Geodynamics.
http://www.underworldcode.org/
Other
176 stars 59 forks source link

An unidentified bug occurs during mesh deformation at high resolution in parallel in UWGeo. #704

Open gthyagi opened 4 weeks ago

gthyagi commented 4 weeks ago

Hi,

I'm encountering an out-of-RAM issue while performing mesh deformation in UWGeo in parallel. The code works fine with a resolution of 64x64x48 on 128 CPUs, but when I increase the resolution to 128x128x48 with 528 CPUs, I run into the out-of-RAM problem at the Model.mesh.deform_mesh step. Below is the code to replicate the issue.

Thanks for the help.

# In[1]:

from underworld import UWGeodynamics as GEO
import numpy as np
import os

# In[2]:

# output dir
outputPath = './output/Deformed_3D_Mesh/'

if not os.path.exists(outputPath):
    os.makedirs(outputPath)

# In[3]:

# units system
u = GEO.UnitRegistry
model_length = 1024e3 * u.meter
KL = model_length
GEO.scaling_coefficients["[length]"] = KL

# In[4]:

# model parameters
xres = 4 # 128
yres = 4 # 128
zres = 48
xmin, xmax = 0., 1024.
ymin, ymax = 0., 1024.
zmin, zmax = 0., -600.
xnodes, ynodes, znodes = xres+1, yres+1, zres+1

# In[5]:

Model = GEO.Model(elementRes=(xres, yres, zres),
                  minCoord=(xmin * u.kilometer, ymin * u.kilometer, zmax * u.kilometer),
                  maxCoord=(xmax * u.kilometer, ymax * u.kilometer, zmin * u.kilometer),
                  gravity=(0.0, 0.0, -9.81 * u.meter / u.second**2))

# In[6]:

# create refine depth array
depth1 = -40
res_in_depth1 = 5
nodes_in_depth1 = np.int32(np.abs(depth1 - Model.top.m)/res_in_depth1) + 1

depth2 = -120
res_in_depth2 = 10
nodes_in_depth2 = np.int32(np.abs(depth2 - depth1)/res_in_depth2) + 1

if nodes_in_depth1 >= znodes: # safety check
    raise ValueError('Refinement exceeds the number of nodes in z')

depth_arr1 = np.linspace(Model.top.m/xmax, depth1/xmax, nodes_in_depth1, endpoint=True)
depth_arr2 = np.linspace(depth1/xmax, depth2/xmax, nodes_in_depth2, endpoint=True)
depth_arr3 = np.linspace(depth2/xmax, Model.bottom.m/xmax,
                         (znodes-nodes_in_depth1-nodes_in_depth2)+2, endpoint=True)
ref_depth_arr = np.concatenate((depth_arr1, depth_arr2[1:-1], depth_arr3))

# In[7]:

# Create a mesh grid to repeat the z-coordinates across all (x, y) pairs
z_repeated = np.repeat(ref_depth_arr, (xnodes * ynodes) )[::-1]

# In[8]:

# deforming the mesh
if GEO.rank == 0:
    print('starting mesh deformation')

with Model.mesh.deform_mesh():
    Model.mesh.data[:,2] = z_repeated[Model.mesh.data_nodegId]

if GEO.rank == 0:
    print('ending mesh deformation')

# In[9]:

# save mesh and meshvariable for visz
MeshHnd = Model.mesh.save(outputPath+'DeformedMesh_'+str(xres)+str(yres)+str(zres)+'_'+str(int(-zmax))+'.h5')
bknight1 commented 4 weeks ago

You said you also had the same issue with loading the mesh with UWGeo but it worked with UW? Did you try the mesh deform in UW?

gthyagi commented 4 weeks ago

yes, in UW things are working fine. The issue is only in UWGeo.

gthyagi commented 4 weeks ago

These are the UW numbers and everything working fine.

        Global element size: 128x128x48
        Local offset of rank 0: 0x0x0
        Local range of rank 0: 12x11x12
starting mesh deformation
ending mesh deformation

======================================================================================
                  Resource Usage on 2024-10-25 12:34:00:
   Job Id:             127653244.gadi-pbs
   Project:            n69
   Exit Status:        0
   Service Units:      3.23
   NCPUs Requested:    528                    NCPUs Used: 528
                                           CPU Time Used: 00:52:13
   Memory Requested:   2.06TB                Memory Used: 36.97GB
   Walltime requested: 03:00:00            Walltime Used: 00:00:11
   JobFS requested:    1024.0MB               JobFS used: 0B
======================================================================================