omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License
312 stars 31 forks source link

Handling boundaries #29

Closed ChrisRackauckas closed 3 years ago

ChrisRackauckas commented 3 years ago

For fun I was putting together some method of lines demos. Here's ParallelStencil.jl used with Runge-Kutta Chebyshev methods:

const USE_GPU = true
using ParallelStencil, OrdinaryDiffEq
using ParallelStencil.FiniteDifferences3D
@static if USE_GPU
    @init_parallel_stencil(CUDA, Float64, 3);
else
    @init_parallel_stencil(Threads, Float64, 3);
end

@parallel function diffusion3D_step!(T2, T, Ci, lam, dx, dy, dz)
    @inn(T2) = lam*@inn(Ci)*(@d2_xi(T)/dx^2 + @d2_yi(T)/dy^2 + @d2_zi(T)/dz^2);
    return
end

function diffusion3D(alg)
    # Physics
    lam        = 1.0;                                        # Thermal conductivity
    cp_min     = 1.0;                                        # Minimal heat capacity
    lx, ly, lz = 10.0, 10.0, 10.0;                           # Length of domain in dimensions x, y and z.

    # Numerics
    nx, ny, nz = 256, 256, 256;                              # Number of gridpoints dimensions x, y and z.
    nt         = 100;                                        # Number of time steps
    dx         = lx/(nx-1);                                  # Space step in x-dimension
    dy         = ly/(ny-1);                                  # Space step in y-dimension
    dz         = lz/(nz-1);                                  # Space step in z-dimension

    # Array initializations
    T   = @zeros(nx, ny, nz);
    T2  = @zeros(nx, ny, nz);
    Ci  = @zeros(nx, ny, nz);

    # Initial conditions (heat capacity and temperature with two Gaussian anomalies each)
    Ci .= 1.0./( cp_min .+ Data.Array([5*exp(-(((ix-1)*dx-lx/1.5))^2-(((iy-1)*dy-ly/2))^2-(((iz-1)*dz-lz/1.5))^2) +
                                       5*exp(-(((ix-1)*dx-lx/3.0))^2-(((iy-1)*dy-ly/2))^2-(((iz-1)*dz-lz/1.5))^2) for ix=1:size(T,1), iy=1:size(T,2), iz=1:size(T,3)]) )
    T  .= Data.Array([100*exp(-(((ix-1)*dx-lx/2)/2)^2-(((iy-1)*dy-ly/2)/2)^2-(((iz-1)*dz-lz/3.0)/2)^2) +
                       50*exp(-(((ix-1)*dx-lx/2)/2)^2-(((iy-1)*dy-ly/2)/2)^2-(((iz-1)*dz-lz/1.5)/2)^2) for ix=1:size(T,1), iy=1:size(T,2), iz=1:size(T,3)])
    T2 .= T;                                                 # Assign also T2 to get correct boundary conditions.
    function f(du,u,p,t)
        @show t
        @parallel diffusion3D_step!(du, u, Ci, lam, dx, dy, dz);
    end
    prob = ODEProblem(f, T, (0.0,1.0))
    sol = solve(prob, alg, save_everystep = false, save_start = false)
end

sol = diffusion3D(ROCK2())
using Plots
heatmap(Array(sol[end]))

However, I couldn't figure out the API for specifying BCs. Is there a way to specifically mention the boundary stencil or do you have to just write a loop after the interior calculation?

luraess commented 3 years ago

Thanks @ChrisRackauckas for reporting and happy to see you could combine ParallelStencil to OrdinaryDiffEq! Regarding the BCs, there is indeed no specific API available in ParallelStencil. The current example assumes implicitly that T on the boundaries is kept to its initial value, i.e. 0. Imposing Dirichlet BCs in x-direction and Neumann (no flux) in y-direction, leaving the z-direction to 0, one could write

@parallel diffusion3D_step!(du, u, Ci, lam, dx, dy, dz)
T[1  ,:  ,:  ] .= 4 # Dirichlet with value = 4
T[end,:  ,:  ] .= 3 # Dirichlet with value = 3
T[:  ,end,:  ] .= T[:,end-1,:] # Dirichlet
T[:  ,1  ,:  ] .= T[:,2,:] # Neumann no flux

or combine them in a more compact way (and eventually put them into kernels - currently needed for to enable @hide_communication to work with multi-GPUs).

-- EDIT -- When performance is relevant, BCs should be wrapped into a @parallel_indices call:

@parallel_indices (iy,iz) function bc_x!(A::Data.Array)
    A[1  , iy, iz] = 4
    A[end, iy, iz] = 3
    return
end

@parallel_indices (ix,iz) function bc_y!(A::Data.Array)
    A[ix, 1  , iz] = A[ix, 2    , iz]
    A[ix, end, iz] = A[ix, end-1, iz]
    return
end

and called with the according range:

@parallel diffusion3D_step!(du, u, Ci, lam, dx, dy, dz)
@parallel (1:size(du,2), 1:size(du,3)) bc_x!(du)
@parallel (1:size(du,1), 1:size(du,3)) bc_y!(du)

as in e.g. this 3D example.

ChrisRackauckas commented 3 years ago

Okay cool, thought that might be the case.

luraess commented 3 years ago

My example was 2D while yours 3D - see EDITs, also for the kernel version.

ChrisRackauckas commented 3 years ago

Ahh yes parallel_indices is the kind of primitive I was looking for, thanks.