ahojukka5 opened 1 month ago
Without the stack trace, I'm guessing the error comes from this line:
in the PencilArrays.jl package. The easy solution would be to remove the `own = false`, which is not supported by `ROCArray` and is the default anyway for other array types. The problem is that, according to the AMDGPU docs, one would also need to pass the `lock = false` keyword, since we want to wrap an array that is already on the GPU. I guess this could be solved with a package extension that explicitly handles the `ROCArray` case.
I did a simple modification:
```julia
function unsafe_as_array(::Type{T}, x::AbstractVector{UInt8}, dims) where {T}
    p = typeof_ptr(x){T}(pointer(x))
    unsafe_wrap(typeof_array(x), p, dims, lock=false)
end
```
At least the error message is now different. Here's the stack trace:
ERROR: LoadError: AssertionError: Base.mightalias(u_prev, u)
Stacktrace:
[1] _apply_plans_in_place!(dir::Val{-1}, full_plan::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, u_prev::PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, pair::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, 
ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, next_pairs::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}})
@ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:281
[2] _apply_plans_in_place!(::Val{-1}, ::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, ::Nothing, ::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, 
Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ::Vararg{Any})
@ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:291
[3] _apply_plans!
@ ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:184 [inlined]
[4] macro expansion
@ ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:29 [inlined]
[5] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
[6] mul!(dst::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, p::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, 
AMDGPU.Runtime.Mem.HIPBuffer}}, src::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}})
@ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:27
[7] *(p::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, src::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}})
@ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:56
[8] top-level scope
@ /pfs/lustrep2/users/juaho/dev/julia-pencil-rocm/rocm.jl:24
in expression starting at /pfs/lustrep2/users/juaho/dev/julia-pencil-rocm/rocm.jl:24
srun: error: nid005115: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=7391317.0
This error comes from the multiplication `plan * u`. Does this give any idea of how this could be fixed?
Thanks for testing. The `mightalias` assertion that failed checks that the two arrays point to the same data, which should be the case here since you're doing an in-place transform.
More precisely, for in-place transforms, we reuse a single `Array`/`ROCArray`/... buffer at the different stages of an FFT, during which the domain decomposition (the `Pencil`) changes. So we create multiple `PencilArray`s wrapping views of the same array. This is the `ManyPencilArray` type defined in PencilArrays.jl. One should expect these `PencilArray`s to be aliased to each other, i.e. they share the same data, so calling `Base.mightalias(u, v)` should return `true`.
One reason for this failure would be that AMDGPU.jl is missing a definition of `Base.dataids(::ROCArray)`, which is used by `mightalias`. If that's the case, then that should be corrected in AMDGPU.jl.
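A minimal CPU-only sketch of why a missing `dataids` method makes such an assertion fail. `RawBufferArray` is a hypothetical wrapper type (not part of AMDGPU.jl or PencilArrays.jl) that, like a GPU array, refers to its memory through a raw pointer. Without its own `Base.dataids` method, Base falls back to the object identity of each wrapper, so two wrappers of the same buffer are reported as non-aliasing; defining `dataids` from the start address, as CUDA.jl does for `CuArray`, changes that. Whether `ROCArray` hits exactly this fallback is an assumption here.

```julia
# Hypothetical stand-in for a device array type: it wraps a raw pointer into a
# buffer it does not own. It is a mutable struct, so its instances are not
# `isbits` (mightalias always returns false for isbits arguments).
mutable struct RawBufferArray{T,N} <: AbstractArray{T,N}
    ptr::Ptr{T}
    dims::NTuple{N,Int}
    owner::Vector{T}   # keeps the underlying buffer reachable by the GC
end

Base.size(A::RawBufferArray) = A.dims
Base.IndexStyle(::Type{<:RawBufferArray}) = IndexLinear()
Base.getindex(A::RawBufferArray, i::Int) = unsafe_load(A.ptr, i)
Base.setindex!(A::RawBufferArray, v, i::Int) = unsafe_store!(A.ptr, v, i)

buf = zeros(Float64, 200)
u = RawBufferArray(pointer(buf), (20, 5), buf)
v = RawBufferArray(pointer(buf), (10, 4), buf)

# No specialised method yet: Base.dataids falls back to objectid of each
# wrapper, so the two wrappers look unrelated.
before = Base.mightalias(u, v)   # false

# Same idea as the CuArray definition: identify the data by its start address.
Base.dataids(A::RawBufferArray) = (UInt(A.ptr),)
after = Base.mightalias(u, v)    # true
```

With this toy definition, two wrappers that overlap but start at different addresses would still be reported as non-aliasing, so it only illustrates the idea rather than a complete aliasing check.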
You can also try the following script (replacing `Array` with `ROCArray`), which is basically what is done in the `ManyPencilArray` type:
```julia
data = Array{Float64}(undef, 200)
u = reshape(view(data, 1:100), 20, 5)
v = reshape(view(data, 1:40), 10, 4)
Base.mightalias(u, v)  # true
Base.dataids(u) == Base.dataids(v)  # true
```
If the last line gives `false`, it's likely because AMDGPU.jl is missing a `dataids` definition.
You might also get away with removing the failing `@assert`, but it would be nice for things to work with `ROCArray` out of the box.
I used this test script:
```julia
using AMDGPU

function test(SomeArray)
    data = SomeArray{Float64}(undef, 200)
    u = reshape(view(data, 1:100), 20, 5)
    v = reshape(view(data, 1:40), 10, 4)
    @show Base.mightalias(u, v)
    @show Base.dataids(u) == Base.dataids(v)
end

test(Array)
test(ROCArray)
```
Results (the first pair is for `Array`, the second for `ROCArray`):
```
Base.mightalias(u, v) = true
Base.dataids(u) == Base.dataids(v) = true
Base.mightalias(u, v) = false
Base.dataids(u) == Base.dataids(v) = false
```
Yes, it seems that AMDGPU.jl is missing the definition found in [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl/blob/e1e5be2b6bf17f03a367cebeb18c4645e593f80d/src/array.jl#L99):
```julia
Base.dataids(A::CuArray) = (UInt(pointer(A)),)
```
A modified test script:
```julia
using AMDGPU

Base.dataids(A::ROCArray) = (UInt(pointer(A)),)

function test(SomeArray)
    data = SomeArray{Float64}(undef, 200)
    u = reshape(view(data, 1:100), 20, 5)
    v = reshape(view(data, 1:40), 10, 4)
    @show Base.mightalias(u, v)
    @show Base.dataids(u) == Base.dataids(v)
end

test(Array)
test(ROCArray)
```
This now shows `true`, but unfortunately applying the plan still does not work:
ERROR: LoadError: MethodError: no method matching unsafe_wrap(::Type{ROCArray}, ::Ptr{ComplexF64}, ::Int64; lock::Bool)
Closest candidates are:
unsafe_wrap(::Type{<:ROCArray}, ::Ptr{T}, !Matched::Tuple{Vararg{var"#s1154", N}} where var"#s1154"<:Integer; lock) where {T, N}
@ AMDGPU ~/.julia/packages/AMDGPU/gtxsf/src/array.jl:191
unsafe_wrap(!Matched::Type{ROCArray{T}}, ::Ptr, ::Any; kwargs...) where T
@ AMDGPU ~/.julia/packages/AMDGPU/gtxsf/src/array.jl:203
unsafe_wrap(!Matched::Union{Type{Array}, Type{Array{T}}, Type{Vector{T}}}, ::Ptr{T}, ::Integer; own) where T got unsupported keyword argument "lock"
@ Base pointer.jl:90
...
Stacktrace:
[1] unsafe_as_array(::Type{ComplexF64}, x::ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}, dims::Int64)
@ PencilArrays.Transpositions ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:209
[2] transpose_impl!(R::Int64, t::PencilArrays.Transpositions.Transposition{ComplexF64, 3, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArrays.Transpositions.PointToPoint})
@ PencilArrays.Transpositions ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:314
[3] macro expansion
@ ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:173 [inlined]
[4] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
[5] transpose!(t::PencilArrays.Transpositions.Transposition{ComplexF64, 3, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArrays.Transpositions.PointToPoint}; waitall::Bool)
@ PencilArrays.Transpositions ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:172
[6] transpose!
@ ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:170 [inlined]
[7] _apply_plans_in_place!(dir::Val{-1}, full_plan::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, u_prev::PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, pair::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, 
ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, next_pairs::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}})
@ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:284
[8] _apply_plans_in_place!(::Val{-1}, ::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, ::Nothing, ::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, 
Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ::Vararg{Any})
@ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:291
[9] _apply_plans!
@ ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:184 [inlined]
[10] macro expansion
@ ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:29 [inlined]
[11] macro expansion
@ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
[12] mul!(dst::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, p::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, 
AMDGPU.Runtime.Mem.HIPBuffer}}, src::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}})
@ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:27
[13] *(p::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, src::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}})
@ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:56
[14] top-level scope
@ /pfs/lustrep2/users/juaho/dev/julia-pencil-rocm/rocm.jl:26
in expression starting at /pfs/lustrep2/users/juaho/dev/julia-pencil-rocm/rocm.jl:26
Again, line 26 is `plan * u`.
Thanks, it looks like the `dims` argument of `unsafe_wrap` must be a `Tuple` of integers for `ROCArray`s.
Could you try replacing your first modification to `unsafe_as_array` with the following two definitions?
```julia
function unsafe_as_array(::Type{T}, x::ROCVector{UInt8}, dims::Tuple) where {T}
    p = typeof_ptr(x){T}(pointer(x))
    unsafe_wrap(typeof_array(x), p, dims, lock=false)
end

unsafe_as_array(::Type{T}, x::ROCVector{UInt8}, N::Integer) where {T} = unsafe_as_array(T, x, (N,))
```
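The two definitions above ensure that every call reaches `unsafe_wrap` with a `Tuple` of dimensions, forwarding an integer length as a 1-tuple. The same dispatch pattern can be tried on the CPU with `Base.Array`, which is handy for checking the logic without a GPU. This is a sketch under assumptions: `unsafe_as_array_sketch` is a hypothetical name, and `Ptr{T}` and `Array` stand in for the `typeof_ptr`/`typeof_array` helpers of PencilArrays.jl.

```julia
# Reinterpret the start of a UInt8 buffer as an array with element type T.
# This method only ever sees a Tuple of dimensions.
function unsafe_as_array_sketch(::Type{T}, x::Vector{UInt8},
                                dims::Tuple{Vararg{Integer}}) where {T}
    # The buffer must be large enough for the requested array.
    @assert prod(dims) * sizeof(T) <= length(x)
    p = Ptr{T}(pointer(x))
    # own = false is the default: the result neither frees nor keeps `x` alive,
    # so the caller must preserve `x` while the wrapped array is in use.
    return unsafe_wrap(Array, p, map(Int, dims))
end

# An integer length is forwarded as a 1-tuple, mirroring the two-method fix.
unsafe_as_array_sketch(::Type{T}, x::Vector{UInt8}, N::Integer) where {T} =
    unsafe_as_array_sketch(T, x, (N,))

buf = zeros(UInt8, 64)
shared = GC.@preserve buf begin
    a = unsafe_as_array_sketch(Float64, buf, 4)        # 4-element Vector{Float64}
    b = unsafe_as_array_sketch(Float64, buf, (2, 2))   # 2×2 Matrix{Float64}
    a[1] = 1.5
    # Both arrays wrap the same memory, so the write is visible through `b`.
    b[1, 1] == 1.5 && size(a) == (4,) && size(b) == (2, 2)
end
```

Unlike `reinterpret`, which returns a lazy wrapper, `unsafe_wrap` returns a plain `Array`, but it does not keep the parent buffer alive, hence the `GC.@preserve`.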
```julia
# Reinterpret UInt8 vector as a different type of array.
# The input array should have enough space for the reinterpreted array with the
# given dimensions.
# This is a workaround to the performance issues when using `reinterpret`.
# See for instance:
# - https://discourse.julialang.org/t/big-overhead-with-the-new-lazy-reshape-reinterpret/7635
# - https://github.com/JuliaLang/julia/issues/28980
#=
function unsafe_as_array(::Type{T}, x::AbstractVector{UInt8}, dims) where {T}
    p = typeof_ptr(x){T}(pointer(x))
    unsafe_wrap(typeof_array(x), p, dims, lock=false)
end
=#
using AMDGPU
function unsafe_as_array(::Type{T}, x::ROCVector{UInt8}, dims::Tuple) where {T}
    p = typeof_ptr(x){T}(pointer(x))
    unsafe_wrap(typeof_array(x), p, dims, lock=false)
end
unsafe_as_array(::Type{T}, x::ROCVector{UInt8}, N::Integer) where {T} = unsafe_as_array(T, x, (N,))
```
And the modification:
```julia
Base.dataids(A::ROCArray) = (UInt(pointer(A)),)
```
Now:
```
rank:0GPU:1
has-cuda:false
data size:(1024, 32, 32)
Start data allocationg
BenchmarkTools.Trial: 100 samples with 1 evaluation.
Range (min … max): 213.333 μs … 508.765 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 220.537 μs ┊ GC (median): 0.00%
Time (mean ± σ): 290.258 μs ± 108.539 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
▆█▃ ▁
███▅▁▅▅▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇▅▆▁▁▁▅▅█▇▅▇▁▁▁▆▇ ▅
213 μs Histogram: log(frequency) by time 498 μs <
Memory estimate: 30.39 KiB, allocs estimate: 364.
```
So these changes are enough to make things work.
Awesome!
I think it would make sense to contribute the `dataids` definition to the AMDGPU.jl package, since otherwise we're committing type piracy, which is not very cool. I'll probably submit a PR myself. I see you've already done it, great :)
As for the `unsafe_wrap` issue, I'll try to fix it using a package extension dealing with the specific `ROCArray` case (but unfortunately I can't test it), so that it works out of the box in the future.
There's a version conflict preventing me from trying this:
```
ERROR: Unsatisfiable requirements detected for package MPI [da04e1cc]:
MPI [da04e1cc] log:
├─possible versions are: 0.7.0-0.20.19 or uninstalled
├─restricted to versions * by RocmTest [17e98292], leaving only versions: 0.7.0-0.20.19
│ └─RocmTest [17e98292] log:
│ ├─possible versions are: 0.1.0 or uninstalled
│ └─RocmTest [17e98292] is fixed to version 0.1.0
├─restricted to versions 0.20 by PencilFFTs [4a48f351], leaving only versions: 0.20.0-0.20.19
│ └─PencilFFTs [4a48f351] log:
│ ├─possible versions are: 0.15.1 or uninstalled
│ └─PencilFFTs [4a48f351] is fixed to version 0.15.1
├─restricted by compatibility requirements with AMDGPU [21141c5a] to versions: 0.7.0-0.20.8 or uninstalled, leaving only versions: 0.20.0-0.20.8
│ └─AMDGPU [21141c5a] log:
│ ├─possible versions are: 0.9.5 or uninstalled
│ └─AMDGPU [21141c5a] is fixed to version 0.9.5
└─restricted by compatibility requirements with PencilArrays [0e08944d] to versions: 0.20.16-0.20.19 — no versions left
└─PencilArrays [0e08944d] log:
├─possible versions are: 0.1.0-0.19.5 or uninstalled
├─restricted to versions 0.18-0.19 by PencilFFTs [4a48f351], leaving only versions: 0.18.0-0.19.5
│ └─PencilFFTs [4a48f351] log: see above
└─restricted by compatibility requirements with Adapt [79e6a3ab] to versions: [0.1.0-0.16.0, 0.19.3-0.19.5] or uninstalled, leaving only versions: 0.19.3-0.19.5
└─Adapt [79e6a3ab] log:
├─possible versions are: 0.3.0-4.0.4 or uninstalled
└─restricted to versions 4 by AMDGPU [21141c5a], leaving only versions: 4.0.0-4.0.4
└─AMDGPU [21141c5a] log: see above
```
Any tips on where I should start to fix this?