pecos / tps

Torch Plasma Simulator
BSD 3-Clause "New" or "Revised" License

GPU spongezone issues detected on Lassen #140

Status: Closed (koomie closed 2 years ago)

koomie commented 2 years ago

With the sponge zone enabled, I have encountered cases where the current GPU code fails.

Reproducer input file

[solver]
type = flow

[flow]
mesh = cold-flow-spongezone3.c1.msh
order = 1
integrationRule = 0
basisType = 0
maxIters = 50
#maxIters = 2
outputFreq = 500000
useRoe = 0
enableSummationByParts = 0
fluid = dry_air
viscosityMultiplier = 1.
equation_system     = navier-stokes
refLength           = 0.1
timingFreq          = 5

[io]
outdirBase = output
#enableRestart = True
restartMode = singleFileReadWrite
#restartMode = singleFileWrite

[jobManagement]
enableAutoRestart = True
timeThreshold = 1500
checkFreq = 25000

[time]
#cfl = 0.3
enableConstantTimestep = True
cfl = 0.14  # p3
integrator = rk3

[viscosityMultiplierFunction]
isEnabled = False
norm = '-1 0 0'
p0 = '0.44041 0 0'
pInit = '0.40 0 0'
viscosityRatio = 75.

[initialConditions]
rho = 1.2
#rhoU = 0.2697
rhoU = 0.1
rhoV = 0.
rhoW = 0.
pressure = 101300

[boundaryConditions/inlet1]
# +z direction
patch = 1
type = subsonic
density = 1.2
uvw = '0 -55.330087 -48.097711'

[boundaryConditions/inlet2]
# -z direction
patch = 2
type = subsonic
density = 1.2
uvw = '0 55.330087 48.097711'

[boundaryConditions/inlet3]
# +y direction
patch = 3
type = subsonic
density = 1.2
uvw = '0 -48.097711 55.330087'

[boundaryConditions/inlet4]
# -y direction
patch = 4
type = subsonic
density = 1.2
uvw = '0 48.097711 -55.330087'

[boundaryConditions/outlet1]
patch = 6
type = subsonicPressure
pressure = 101300

# torch wall
[boundaryConditions/wall1]
patch = 5
type = viscous_isothermal
temperature = 300

# jet (outer cylinder wall)
[boundaryConditions/wall2]
patch = 7
#type = inviscid
type = viscous_isothermal
temperature = 300

# jet (lower jet exit)
[boundaryConditions/wall3]
patch = 8
#type = viscous_adiabatic
type = viscous_isothermal
temperature = 300

[boundaryConditions]
numWalls = 3
numInlets = 4
numOutlets = 1

[spongezone]
numSpongeZones = 1

[spongezone1]
isEnabled = False
normal = '-1 0 0'
p0 = '0.52 0 0'
pInit = '0.43 0 0'
type = planar
targetSolType = userDef
density = 1.2
uvw = '0.070151 0 0'
pressure = 101300

[gpu]
numGpusPerRank = 4

@trevilo has a copy of the mesh file referenced above. With this input, a case using 8 MPI ranks (8 GPUs) fails on Lassen. If you comment out the sponge-zone-related inputs, it runs fine.
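For anyone wanting to repeat the with/without comparison, here is a minimal sketch of how the sponge zone sections could be commented out automatically. It assumes the input file uses plain `[section]` headers and `#` comments, as in the file above; the helper name `comment_out_sections` and the `spongezone` prefix match are illustrative and not part of TPS. Whether disabling both `[spongezone]` and `[spongezone1]` matches exactly what was commented out in the original test is also an assumption.

```python
import re

# Matches an INI-style section header such as "[spongezone1]".
SECTION_RE = re.compile(r"^\s*\[([^\]]+)\]\s*$")


def comment_out_sections(text, prefix="spongezone"):
    """Return `text` with every section whose name starts with `prefix`
    commented out using '#'. Hypothetical helper, not part of TPS."""
    out = []
    in_target = False
    for line in text.splitlines():
        match = SECTION_RE.match(line)
        if match:
            # Each section header decides whether the lines that follow
            # belong to a section we want to disable.
            in_target = match.group(1).strip().lower().startswith(prefix)
        already_comment = line.lstrip().startswith("#")
        if in_target and line.strip() and not already_comment:
            out.append("#" + line)
        else:
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "[flow]\norder = 1\n\n"
        "[spongezone]\nnumSpongeZones = 1\n\n"
        "[spongezone1]\nisEnabled = False\n\n"
        "[gpu]\nnumGpusPerRank = 4\n"
    )
    print(comment_out_sections(sample), end="")
```

Running the two variants (original and stripped) with the same rank count would isolate whether the sponge zone inputs are what triggers the failure.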

dreamer2368 commented 2 years ago

Is this specific to the 8-GPU case, or does it also fail with other numbers of GPUs?

koomie commented 2 years ago

It is sensitive to the number of GPUs. I believe this case runs OK with 4 GPUs, but not with 16.

trevilo commented 2 years ago

I haven't been able to reproduce this behavior.

I tried the input file given above with the current HEAD of main (9dd74fa) and with the version merged on June 9 (7ce3582), when the problem was reported, using both 8 and 16 GPUs. All combinations ran to completion (50 steps) without any problem. I have also recently run other multi-GPU (up to 128 GPUs) sponge zone cases without incident.

So... I'm going to close this. Of course, I will reopen it if the problem pops up again.