omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

Error copying CUDA array to Array when using GPU #51

Closed. Eure-L closed this issue 2 years ago.

Eure-L commented 2 years ago

Hi! First, I'll thank you all for this amazing module, which I just discovered and enjoy very much. I am very new to Julia and started to play around with some of the easy GPU computing that ParallelStencil.jl offers, but I came across an error when running one of the provided examples (examples/diffusion3D_multigpucpu_hidecomm.jl):

    T_nohalo .= T[2:end-1,2:end-1,2:end-1];                                           # Copy data to CPU removing the halo.

where T is a CUDA array (selected by ParallelStencil) and T_nohalo a "standard" Array,

causing the following error:

    ERROR: LoadError: This object is not a GPU array
    Stacktrace:
      [1] error(s::String)
        @ Base ./error.jl:33
      [2] backend(#unused#::Type)
        @ GPUArrays ~/.julia/packages/GPUArrays/VNhDf/src/device/execution.jl:15
      [3] backend(x::Array{Float64, 3})
        @ GPUArrays ~/.julia/packages/GPUArrays/VNhDf/src/device/execution.jl:16
      [4] _copyto!
        @ ~/.julia/packages/GPUArrays/VNhDf/src/host/broadcast.jl:73 [inlined]
      [5] materialize!
        @ ~/.julia/packages/GPUArrays/VNhDf/src/host/broadcast.jl:51 [inlined]
      [6] materialize!(dest::Array{Float64, 3}, bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{3}, Nothing, typeof(identity), Tuple{CuArray{Float64, 3, CUDA.Mem.DeviceBuffer}}})
        @ Base.Broadcast ./broadcast.jl:868
      [7] diffusion3D()
        @ Main /.../diffusion3D_multigpucpu_hidecomm.jl:60
      [8] top-level scope
        @ /.../diffusion3D_multigpucpu_hidecomm.jl:84
    in expression starting at /.../diffusion3D_multigpucpu_hidecomm.jl:84

From my basic understanding, it appears that ParallelStencil does not allow broadcasting between CUDA arrays and standard ones. Is that no longer supported? Sorry in advance if this is a dumb issue; I have yet to find a workaround.
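
For reference, here is a minimal sketch that reproduces the same error with plain CUDA.jl, outside of ParallelStencil (array sizes are just for illustration):

    using CUDA

    T        = CUDA.zeros(Float64, 8, 8, 8)   # device array
    T_nohalo = zeros(Float64, 6, 6, 6)        # host array

    # Indexing with ranges returns another CuArray, so this broadcast tries to
    # write device memory into a host array and throws the error above.
    T_nohalo .= T[2:end-1, 2:end-1, 2:end-1]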

luraess commented 2 years ago

Thanks for pointing this out @Eure-L. Indeed, recent updates in CUDA.jl now require explicitly converting the GPU array before broadcasting it into a CPU array. This will work:

T_nohalo .= Array(T[2:end-1,2:end-1,2:end-1]);
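
For context, T[2:end-1,2:end-1,2:end-1] first allocates a new CuArray on the device; wrapping it in Array(...) copies that result to host memory, so the broadcast then runs entirely on the CPU. A minimal sketch of the fix and an equivalent alternative (sizes illustrative, assuming CUDA.jl is loaded):

    using CUDA

    T        = CUDA.rand(Float64, 8, 8, 8)    # device array
    T_nohalo = zeros(Float64, 6, 6, 6)        # host array

    # The fix above: materialize the interior on the device, then copy to host.
    T_nohalo .= Array(T[2:end-1, 2:end-1, 2:end-1])

    # An alternative that skips the broadcast: copy device-to-host directly.
    copyto!(T_nohalo, T[2:end-1, 2:end-1, 2:end-1])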

Thanks for reporting, and we will update the examples for the next release.