jipolanco / PencilFFTs.jl

Fast Fourier transforms of MPI-distributed Julia arrays
https://jipolanco.github.io/PencilFFTs.jl/dev/
MIT License

Unable to create PencilFFTPlan using ROCArray #74

Open ahojukka5 opened 1 month ago

ahojukka5 commented 1 month ago
ERROR: LoadError: MethodError: no method matching unsafe_wrap(::Type{ROCArray}, ::Ptr{ComplexF64}, ::Tuple{Int64, Int64, Int64}; own::Bool)

Closest candidates are:
  unsafe_wrap(::Type{<:ROCArray}, ::Ptr{T}, ::Tuple{Vararg{var"#s1154", N}} where var"#s1154"<:Integer; lock) where {T, N} got unsupported keyword argument "own"
   @ AMDGPU ~/.julia/packages/AMDGPU/gtxsf/src/array.jl:191

Any tips on where I should start to fix this?

jipolanco commented 1 month ago

Without the stack trace, I'm guessing the error comes from this line:

https://github.com/jipolanco/PencilArrays.jl/blob/ea2975f0fd1eec0d9d90dba8ac241ec97a71024b/src/Transpositions/Transpositions.jl#L209

in the PencilArrays.jl package. The easy solution would be to remove the own = false keyword, which is not supported by ROCArray and is the default anyway for other array types. The problem is that, according to the AMDGPU docs, one would also need to pass the lock = false keyword, since we want to wrap an array that is already on the GPU. I guess this could be solved with a package extension that explicitly takes care of the ROCArray case.
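To illustrate the remark about own on plain CPU arrays: according to the Base documentation, unsafe_wrap(Array, ...) uses own = false when the keyword is omitted, so dropping it should not change behaviour there. The following is a minimal CPU-only sketch; the function name default_own_demo is made up for this example, and it says nothing about how ROCArray handles lock.

```julia
# CPU-only sketch: wrapping the same memory with and without `own = false`.
# `default_own_demo` is a hypothetical name used only for this illustration.
function default_own_demo()
    buf = zeros(UInt8, 8 * sizeof(Float64))   # raw byte buffer, 8 Float64 slots
    GC.@preserve buf begin
        p = Ptr{Float64}(pointer(buf))
        a = unsafe_wrap(Array, p, (2, 4))               # `own` omitted
        b = unsafe_wrap(Array, p, (2, 4); own = false)  # `own` spelled out
        a[1, 1] = 1.5                                   # write through one wrapper
        return b[1, 1], size(a) == size(b)              # read through the other
    end
end

result = default_own_demo()
```

Both wrappers are non-owning views of the same bytes, so a write through one is visible through the other.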

ahojukka5 commented 1 month ago

I did a simple modification:

function unsafe_as_array(::Type{T}, x::AbstractVector{UInt8}, dims) where {T}
    p = typeof_ptr(x){T}(pointer(x))
    unsafe_wrap(typeof_array(x), p, dims, lock=false)
end

At least the error message is now different, here's the stack trace:

ERROR: LoadError: AssertionError: Base.mightalias(u_prev, u)
Stacktrace:
 [1] _apply_plans_in_place!(dir::Val{-1}, full_plan::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, u_prev::PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, pair::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, 
ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, next_pairs::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}})
   @ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:281
 [2] _apply_plans_in_place!(::Val{-1}, ::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, ::Nothing, ::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 
2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ::Vararg{Any})
   @ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:291
 [3] _apply_plans!
   @ ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:184 [inlined]
 [4] macro expansion
   @ ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:29 [inlined]
 [5] macro expansion
   @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
 [6] mul!(dst::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, p::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, 
AMDGPU.Runtime.Mem.HIPBuffer}}, src::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}})
   @ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:27
 [7] *(p::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, src::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}})
   @ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:56
 [8] top-level scope
   @ /pfs/lustrep2/users/juaho/dev/julia-pencil-rocm/rocm.jl:24
in expression starting at /pfs/lustrep2/users/juaho/dev/julia-pencil-rocm/rocm.jl:24
srun: error: nid005115: task 0: Exited with exit code 1
srun: launch/slurm: _step_signal: Terminating StepId=7391317.0

This error comes from the multiplication plan * u. Does this give any idea of how it could be fixed?

jipolanco commented 1 month ago

Thanks for testing. The mightalias assertion that failed checks that the two arrays point to the same data, which should be the case here since you're doing an in-place transform.

More precisely, for in-place transforms, we reuse a single Array/ROCArray/... buffer at the different stages of an FFT, during which the domain decomposition (the Pencil) changes. So we create multiple PencilArrays wrapping views of the same array. This is the ManyPencilArray type defined in PencilArrays.jl. One should expect these PencilArrays to be aliased to each other, i.e. they share the same data, so calling Base.mightalias(u, v) should return true.

One reason for this failure would be that AMDGPU.jl is missing a definition of Base.dataids(::ROCArray), which is used by mightalias. If that's the case, then that should be corrected in AMDGPU.jl.

You can also try the following script (replacing Array with ROCArray), which is basically what is done in the ManyPencilArray type:

data = Array{Float64}(undef, 200)

u = reshape(view(data, 1:100), 20, 5)
v = reshape(view(data, 1:40), 10, 4)

Base.mightalias(u, v)  # true
Base.dataids(u) == Base.dataids(v)  # true

If the last line gives false, it's likely because AMDGPU.jl is missing a dataids definition.
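The mechanism can also be reproduced without a GPU. The sketch below assumes a hypothetical mutable array type, ToyDeviceArray, standing in for a device array (it is not how AMDGPU.jl implements ROCArray). Without its own dataids method, Base falls back to an identifier derived from the object itself, so two wrappers over the same buffer are reported as non-aliasing; defining dataids from the underlying pointer changes that.

```julia
# Hypothetical stand-in for a device array: a mutable wrapper around a shared buffer.
mutable struct ToyDeviceArray{T} <: AbstractVector{T}
    buf::Vector{T}
    offset::Int
    len::Int
end

Base.size(A::ToyDeviceArray) = (A.len,)
Base.getindex(A::ToyDeviceArray, i::Int) = A.buf[A.offset + i]
Base.setindex!(A::ToyDeviceArray, v, i::Int) = (A.buf[A.offset + i] = v)

buf = zeros(Float64, 200)
u = ToyDeviceArray(buf, 0, 100)   # first 100 elements
v = ToyDeviceArray(buf, 0, 40)    # first 40 elements of the same buffer

# Fallback dataids is based on the wrapper object, so the overlap is not detected.
before = Base.mightalias(u, v)

# Identify the data by the address of its first element instead.
Base.dataids(A::ToyDeviceArray) = (UInt(pointer(A.buf, A.offset + 1)),)

after = Base.mightalias(u, v)
```

Here before is false and after is true, which mirrors the false/true difference one would expect between an array type without and with a dataids definition.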

You might also get away with removing the failing @assert, but it would be nice for things to work with ROCArray out of the box.

ahojukka5 commented 1 month ago

I used this test script:

using AMDGPU

function test(SomeArray)
    data = SomeArray{Float64}(undef, 200)
    u = reshape(view(data, 1:100), 20, 5)
    v = reshape(view(data, 1:40), 10, 4)
    @show Base.mightalias(u, v)
    @show Base.dataids(u) == Base.dataids(v)
end

test(Array)
test(ROCArray)

Results:

Base.mightalias(u, v) = true
Base.dataids(u) == Base.dataids(v) = true
Base.mightalias(u, v) = false
Base.dataids(u) == Base.dataids(v) = false

Yes, it seems that AMDGPU.jl is missing the definition found in [CUDA.jl](https://github.com/JuliaGPU/CUDA.jl/blob/e1e5be2b6bf17f03a367cebeb18c4645e593f80d/src/array.jl#L99):

Base.dataids(A::CuArray) = (UInt(pointer(A)),)

A modified test script:

using AMDGPU

Base.dataids(A::ROCArray) = (UInt(pointer(A)),)

function test(SomeArray)
    data = SomeArray{Float64}(undef, 200)
    u = reshape(view(data, 1:100), 20, 5)
    v = reshape(view(data, 1:40), 10, 4)
    @show Base.mightalias(u, v)
    @show Base.dataids(u) == Base.dataids(v)
end

test(Array)
test(ROCArray)

This now shows true for ROCArray as well, but unfortunately using the plan still does not work:

ERROR: LoadError: MethodError: no method matching unsafe_wrap(::Type{ROCArray}, ::Ptr{ComplexF64}, ::Int64; lock::Bool)

Closest candidates are:
  unsafe_wrap(::Type{<:ROCArray}, ::Ptr{T}, !Matched::Tuple{Vararg{var"#s1154", N}} where var"#s1154"<:Integer; lock) where {T, N}
   @ AMDGPU ~/.julia/packages/AMDGPU/gtxsf/src/array.jl:191
  unsafe_wrap(!Matched::Type{ROCArray{T}}, ::Ptr, ::Any; kwargs...) where T
   @ AMDGPU ~/.julia/packages/AMDGPU/gtxsf/src/array.jl:203
  unsafe_wrap(!Matched::Union{Type{Array}, Type{Array{T}}, Type{Vector{T}}}, ::Ptr{T}, ::Integer; own) where T got unsupported keyword argument "lock"
   @ Base pointer.jl:90
  ...

Stacktrace:
  [1] unsafe_as_array(::Type{ComplexF64}, x::ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}, dims::Int64)
    @ PencilArrays.Transpositions ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:209
  [2] transpose_impl!(R::Int64, t::PencilArrays.Transpositions.Transposition{ComplexF64, 3, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArrays.Transpositions.PointToPoint})
    @ PencilArrays.Transpositions ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:314
  [3] macro expansion
    @ ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:173 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
  [5] transpose!(t::PencilArrays.Transpositions.Transposition{ComplexF64, 3, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArrays.Transpositions.PointToPoint}; waitall::Bool)
    @ PencilArrays.Transpositions ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:172
  [6] transpose!
    @ ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:170 [inlined]
  [7] _apply_plans_in_place!(dir::Val{-1}, full_plan::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, u_prev::PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, pair::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, 
ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, next_pairs::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}})
    @ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:284
  [8] _apply_plans_in_place!(::Val{-1}, ::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, ::Nothing, ::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ::Pair{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 
2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ::Vararg{Any})
    @ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:291
  [9] _apply_plans!
    @ ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:184 [inlined]
 [10] macro expansion
    @ ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:29 [inlined]
 [11] macro expansion
    @ ~/.julia/packages/TimerOutputs/Lw5SP/src/TimerOutput.jl:253 [inlined]
 [12] mul!(dst::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, p::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, 
AMDGPU.Runtime.Mem.HIPBuffer}}, src::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, AMDGPU.Runtime.Mem.HIPBuffer}})
    @ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:27
 [13] *(p::PencilFFTPlan{ComplexF64, 3, true, 3, 2, 0, PencilFFTs.GlobalFFTParams{Float64, 3, true, Tuple{PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!, PencilFFTs.Transforms.FFT!}}, Tuple{PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}, PencilFFTs.PencilPlan1D{ComplexF64, ComplexF64, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, PencilFFTs.Transforms.FFT!, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, true, true, 3}, AMDGPU.rocFFT.cROCFFTPlan{ComplexF64, false, true, 3}}}, PencilArrays.Transpositions.PointToPoint, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}, src::ManyPencilArray{ComplexF64, 3, 3, Tuple{PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, NoPermutation, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(2, 1, 3), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}, PencilArray{ComplexF64, 3, ROCArray{ComplexF64, 3, AMDGPU.Runtime.Mem.HIPBuffer}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}, ROCArray{UInt8, 1, AMDGPU.Runtime.Mem.HIPBuffer}}}}, ROCArray{ComplexF64, 1, 
AMDGPU.Runtime.Mem.HIPBuffer}})
    @ PencilFFTs ~/.julia/packages/PencilFFTs/7eqXu/src/operations.jl:56
 [14] top-level scope
    @ /pfs/lustrep2/users/juaho/dev/julia-pencil-rocm/rocm.jl:26
in expression starting at /pfs/lustrep2/users/juaho/dev/julia-pencil-rocm/rocm.jl:26

Again, line 26 being plan*u.

jipolanco commented 1 month ago

Thanks, it looks like the dims argument of unsafe_wrap must be a Tuple of integers for ROCArrays.
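For comparison, Base's unsafe_wrap for plain Array accepts either an Integer or a Tuple as dims (the Integer method is listed among the candidates in the error message above), which is presumably why the generic code could pass a bare length. A CPU-only sketch, with the made-up function name dims_demo:

```julia
# CPU-only sketch: `unsafe_wrap(Array, ...)` with Integer vs. Tuple dims.
# `dims_demo` is a hypothetical name used only for this illustration.
function dims_demo()
    buf = zeros(UInt8, 6 * sizeof(Float64))   # room for 6 Float64 values
    GC.@preserve buf begin
        p = Ptr{Float64}(pointer(buf))
        v = unsafe_wrap(Array, p, 6)        # Integer length -> Vector
        w = unsafe_wrap(Array, p, (6,))     # 1-tuple -> Vector as well
        m = unsafe_wrap(Array, p, (2, 3))   # 2-tuple -> Matrix
        return size(v), size(w), size(m)
    end
end

sizes = dims_demo()
```

On the CPU, passing N and passing (N,) give arrays of the same shape, so converting an Integer into a 1-tuple before the call should be a harmless normalisation.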

Could you try replacing your first modification to unsafe_as_array with the following two definitions?

function unsafe_as_array(::Type{T}, x::ROCVector{UInt8}, dims::Tuple) where {T}
    p = typeof_ptr(x){T}(pointer(x))
    unsafe_wrap(typeof_array(x), p, dims, lock=false)
end

unsafe_as_array(::Type{T}, x::ROCVector{UInt8}, N::Integer) where {T} = unsafe_as_array(T, x, (N,))

ahojukka5 commented 1 month ago

# Reinterpret UInt8 vector as a different type of array.
# The input array should have enough space for the reinterpreted array with the
# given dimensions.
# This is a workaround to the performance issues when using `reinterpret`.
# See for instance:
# - https://discourse.julialang.org/t/big-overhead-with-the-new-lazy-reshape-reinterpret/7635
# - https://github.com/JuliaLang/julia/issues/28980
#=
function unsafe_as_array(::Type{T}, x::AbstractVector{UInt8}, dims) where {T}
    p = typeof_ptr(x){T}(pointer(x))
    unsafe_wrap(typeof_array(x), p, dims, lock=false)
end
=#

using AMDGPU

function unsafe_as_array(::Type{T}, x::ROCVector{UInt8}, dims::Tuple) where {T}
    p = typeof_ptr(x){T}(pointer(x))
    unsafe_wrap(typeof_array(x), p, dims, lock=false)
end

unsafe_as_array(::Type{T}, x::ROCVector{UInt8}, N::Integer) where {T} = unsafe_as_array(T, x, (N,))

And the modification

Base.dataids(A::ROCArray) = (UInt(pointer(A)),)

Now:

rank:0GPU:1
has-cuda:false
data size:(1024, 32, 32)
Start data allocationg
BenchmarkTools.Trial: 100 samples with 1 evaluation.
 Range (min … max):  213.333 μs … 508.765 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     220.537 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   290.258 μs ± 108.539 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▃                                                 ▁          
  ███▅▁▅▅▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▁▁▁▁▁▁▁▁▁▇▇▅▆▁▁▁▅▅█▇▅▇▁▁▁▆▇ ▅
  213 μs        Histogram: log(frequency) by time        498 μs <

 Memory estimate: 30.39 KiB, allocs estimate: 364.

So these changes are enough to make things work.

jipolanco commented 1 month ago

Awesome!

I think it would make sense to contribute the dataids definition to the AMDGPU.jl package, since otherwise we're committing type piracy, which is not very cool. I was going to submit a PR myself, but I see you've already done it, great :)

As for the unsafe_wrap issue, I'll try to fix it using a package extension dealing with the specific ROCArray case (but unfortunately I can't test it), so that it works out of the box in the future.

ahojukka5 commented 4 weeks ago

There's a version conflict preventing me from trying this:

ERROR: Unsatisfiable requirements detected for package MPI [da04e1cc]:
 MPI [da04e1cc] log:
 ├─possible versions are: 0.7.0-0.20.19 or uninstalled
 ├─restricted to versions * by RocmTest [17e98292], leaving only versions: 0.7.0-0.20.19
 │ └─RocmTest [17e98292] log:
 │   ├─possible versions are: 0.1.0 or uninstalled
 │   └─RocmTest [17e98292] is fixed to version 0.1.0
 ├─restricted to versions 0.20 by PencilFFTs [4a48f351], leaving only versions: 0.20.0-0.20.19
 │ └─PencilFFTs [4a48f351] log:
 │   ├─possible versions are: 0.15.1 or uninstalled
 │   └─PencilFFTs [4a48f351] is fixed to version 0.15.1
 ├─restricted by compatibility requirements with AMDGPU [21141c5a] to versions: 0.7.0-0.20.8 or uninstalled, leaving only versions: 0.20.0-0.20.8
 │ └─AMDGPU [21141c5a] log:
 │   ├─possible versions are: 0.9.5 or uninstalled
 │   └─AMDGPU [21141c5a] is fixed to version 0.9.5
 └─restricted by compatibility requirements with PencilArrays [0e08944d] to versions: 0.20.16-0.20.19 — no versions left
   └─PencilArrays [0e08944d] log:
     ├─possible versions are: 0.1.0-0.19.5 or uninstalled
     ├─restricted to versions 0.18-0.19 by PencilFFTs [4a48f351], leaving only versions: 0.18.0-0.19.5
     │ └─PencilFFTs [4a48f351] log: see above
     └─restricted by compatibility requirements with Adapt [79e6a3ab] to versions: [0.1.0-0.16.0, 0.19.3-0.19.5] or uninstalled, leaving only versions: 0.19.3-0.19.5
       └─Adapt [79e6a3ab] log:
         ├─possible versions are: 0.3.0-4.0.4 or uninstalled
         └─restricted to versions 4 by AMDGPU [21141c5a], leaving only versions: 4.0.0-4.0.4
           └─AMDGPU [21141c5a] log: see above

jipolanco commented 4 weeks ago

It's likely a compatibility issue between MPI.jl and AMDGPU.jl. It may be fixed by adding 0.9 here.

Besides, I think AMDGPU.jl needs a version bump to include your dataids implementation (it's not yet included in the current v0.9.5).
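
The resolver log above boils down to two constraints on MPI.jl that do not overlap: AMDGPU 0.9.5 leaves only 0.20.0 to 0.20.8, while the remaining PencilArrays versions require 0.20.16 to 0.20.19. A small sketch of that intersection, with the version bounds copied from the log and helper names invented for the example:

```julia
# MPI.jl 0.20.x candidates left after the PencilFFTs restriction (0.20.0-0.20.19).
mpi_candidates = [VersionNumber(0, 20, patch) for patch in 0:19]

# Bounds copied from the resolver log; the function names are hypothetical.
allowed_by_amdgpu(v) = v <= v"0.20.8"
allowed_by_pencilarrays(v) = v"0.20.16" <= v <= v"0.20.19"

# Versions that satisfy both constraints at once.
survivors = filter(v -> allowed_by_amdgpu(v) && allowed_by_pencilarrays(v), mpi_candidates)
```

The survivors list is empty, which is the "no versions left" reported by the resolver.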