smartalecH opened this issue 2 years ago
Hi @smartalecH, the issue you see passing a struct to a @parallel kernel relates to a compliance limitation from CUDA.jl when it comes to passing structs to GPU kernels. (From the module documentation, callable from the Julia REPL: kernels declared using @parallel accept Data.Array and Data.Number arguments, and should contain only stencil computations to be performed with one of the submodules ParallelStencil.FiniteDifferences{1D|2D|3D}.)
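For context, a minimal kernel complying with these requirements could look as follows; this is a sketch along the lines of the 2-D heat diffusion example from the ParallelStencil documentation, assuming the Threads (CPU) backend and made-up array sizes and coefficients:

```julia
using ParallelStencil
using ParallelStencil.FiniteDifferences2D
@init_parallel_stencil(Threads, Float64, 2)

# One explicit diffusion step: only Data.Array / Data.Number arguments,
# only FiniteDifferences2D stencil macros in the kernel body.
@parallel function diffusion2D_step!(T2, T, lam, dt, dx, dy)
    @inn(T2) = @inn(T) + dt * lam * (@d2_xi(T) / dx^2 + @d2_yi(T) / dy^2)
    return
end

T  = @rand(64, 64)
T2 = @zeros(64, 64)
@parallel diffusion2D_step!(T2, T, 1.0, 0.1, 1.0, 1.0)
```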
We are currently developing proper support for different kinds of data types within ParallelStencil, notably for logical arrays of structs (or of small arrays). These can be laid out in memory either as arrays of structs or as structs of arrays, accounting for the fact that each of these allocation approaches has use cases where it performs best.
The following shows, e.g., how an array of small arrays can be created in the future using a purely declarative allocation:
julia> using ParallelStencil
julia> @init_parallel_stencil(package=CUDA, ndims=2)
julia> A = @rand(2, 2, numbertype=Float16, celldims=(2,3))
2×2 CUDA.CuArray{StaticArrays.SMatrix{2, 3, Float16, 6}, 2}:
[0.877 0.7266 0.007324; 0.5176 0.3496 0.541] … [0.8716 0.7246 0.4375; 0.662 0.4355 0.2075]
[0.5244 0.2056 0.1436; 0.891 0.0757 0.882] [0.9214 0.5864 0.206; 0.2124 0.1855 0.4756]
julia> A[2,1]
2×3 StaticArrays.SMatrix{2, 3, Float16, 6} with indices SOneTo(2)×SOneTo(3):
0.5244 0.2056 0.1436
0.891 0.0757 0.882
julia> A[2,1][2]
Float16(0.891)
Logical arrays of structs will be allocatable in a similar way. Note that this is not yet merged into the master branch of ParallelStencil, but should be ready within a few weeks. In any case, we plan to present this at JuliaCon 2022.
Meanwhile, you will need to manually make sure to comply with the requirements of CUDA.jl. The following documentation explains one solution: https://cuda.juliagpu.org/stable/tutorials/custom_structs/. A MWE implementing it in ParallelStencil using a @parallel_indices function declaration can be found in this gist.
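For illustration (this is not the gist itself), the approach from the linked CUDA.jl tutorial boils down to making the struct parametric in its field types and registering it with Adapt.jl, so that CUDA.jl can convert CuArray fields to device arrays at kernel launch. A sketch with hypothetical names:

```julia
using CUDA, Adapt

# Parametric field type: on the host A is a CuArray, on the device it
# becomes a CuDeviceArray after conversion by Adapt.
struct Params{A}
    u::A
    f::A
end
Adapt.@adapt_structure Params   # teach Adapt to convert the fields

function kernel!(p)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(p.u)
        @inbounds p.u[i] += p.f[i]
    end
    return nothing
end

p = Params(CUDA.zeros(16), CUDA.ones(16))
@cuda threads=16 kernel!(p)     # cudaconvert adapts the struct recursively
```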
Regarding the @within macro, the REPL-callable doc returns:
help?> @within
@within(macroname::String, A::Symbol)
Return an expression that evaluates to true if the indices generated by @parallel (module ParallelStencil) point to elements in bounds
of the selection of A by macroname.
│ Warning
│
│ This macro is not intended for explicit manual usage. Calls to it are automatically added by @parallel where required.
Hope this info helps.
Wow, thank you for the very comprehensive response!
relates to compliance limitation from CUDA.jl
This makes sense, thanks.
We are currently developing proper support for different kind of data types within ParallelStencil
Interesting. Let me know if I can help at all. Happy to contribute.
In any case, we plan to present this at JuliaCon 2022.
Very cool. If you get a chance, please post a link to the corresponding paper (and youtube talk once available).
A MWE implementing it in ParallelStencil using @parallel_indices function declaration can be found in this gist.
Thanks for pulling this together. Very helpful!
Hope this info helps.
Very much so, thanks! I'll keep this issue open then at least until the change you mention is merged into master.
kernels declared using @parallel accept Data.Array and Data.Number arguments, and should contain only stencil computations to be performed with one of the submodules ParallelStencil.FiniteDifferences{1D|2D|3D}.
On a similar note, suppose I want to initialize a "geometry" array (e.g. if I were to solve the Poisson PDE, ∇⋅c∇u = f, where c describes a piecewise-constant "geometry"). The GeometryPrimitives package is great for this, and allows me to describe rather sophisticated geometries.
Typically (in C/C++) I try to initialize such arrays using the same formalism as the code that operates on them, so that they get allocated nearest to the respective core (minimizing any broadcasting and communication between cores or processes). Ideally, I would also use a @parallel construct to allocate my c array, but that would require passing non-ParallelStencil data structures (in this case, a geometry "tree" with geometry objects) to that function, which obviously won't work with the current codebase as stated above.
What do you guys typically do in these situations? Have you tried interfacing directly with external geometry libraries, or do you typically start with a native CPU array and broadcast when needed (as done here)?
Thanks for all of the help!
What do you guys typically do in these situations?
Up to now, our approach has been to do most of the initialisation using native CPU arrays as you are referring to, broadcasting the final result to the Data.Array that will actually be used in the performance-critical parts. Some initialisation could also be performed in a @parallel function, and should be done as such if it becomes a performance bottleneck. But most often one may want to prioritise flexibility over performance for initialisation.
Ultimately, every user is free to pick their preferred workflow, as the final result can always be broadcast to the desired type and backend.
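A sketch of this CPU-initialise-then-convert workflow (array sizes and the inclusion geometry are made up for illustration, and the CUDA backend is assumed):

```julia
using ParallelStencil
@init_parallel_stencil(CUDA, Float64, 2)

nx, ny = 64, 64
c_cpu  = ones(nx, ny)                   # flexible host-side initialisation
c_cpu[nx÷4:3nx÷4, ny÷4:3ny÷4] .= 10.0   # e.g. a piecewise-constant inclusion
c      = Data.Array(c_cpu)              # convert to the selected backend's array type
```

The host-side step can use any Julia package (e.g. GeometryPrimitives) with no GPU-compatibility constraints; only the final copy touches the backend.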
Have you tried interfacing directly with external geometry libraries
Sometimes, but in the fashion described above.
I have several @parallel functions that each have several arguments (of type Data.Array). To make the code cleaner, I (naively) tried passing a single struct containing all of these arguments. Unfortunately, I get the following error:
Fundamentally, I'm assuming this has something to do with how the struct lives in memory and isn't compatible with the way ParallelStencil distributes Data.Arrays. Is there a way around this? Perhaps there's a more "canonical" way to accomplish this?
Could you also explain the role of the @within macro, so that I can better understand the limitations of @parallel functions (and how to properly use them)?
Thanks for the great package!
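A minimal sketch of the simplest workaround discussed in this thread: keep the struct on the host and pass only its Data.Array fields to the kernel. Names are hypothetical and the Threads backend is assumed:

```julia
using ParallelStencil
using ParallelStencil.FiniteDifferences2D
@init_parallel_stencil(Threads, Float64, 2)

struct Fields          # host-side container only; never passed to the kernel
    A::Data.Array
    B::Data.Array
end

@parallel function step!(A, B)
    @all(A) = @all(B)
    return
end

f = Fields(@zeros(8, 8), @ones(8, 8))
@parallel step!(f.A, f.B)   # unpack the fields at the call site
```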