jipolanco / PencilArrays.jl

Distributed Julia arrays using the MPI protocol
https://jipolanco.github.io/PencilArrays.jl/dev/
MIT License

Problem with CuPtr and MPI #47

Open · corentin-dev opened this issue 2 years ago

corentin-dev commented 2 years ago

Not yet finished with CUDA, I guess ... Transpositions work on a single process, but when switching to multiple processes there are still some errors.

Closest candidates are:
  convert(::Type{T}, !Matched::T) where T at /usr/share/julia/base/essentials.jl:218
Stacktrace:
  [1] cconvert(T::Type, x::CUDA.CuPtr{ComplexF64})
    @ Base ./essentials.jl:417
  [2] Isend(buf::MPI.Buffer{CUDA.CuPtr{ComplexF64}}, dest::Int64, tag::Int64, comm::MPI.Comm)
    @ MPI ~/.julia/packages/MPI/08SPr/src/pointtopoint.jl:229
  [3] transpose_send_other!(#unused#::PencilArrays.Transpositions.PointToPoint, buf_info::NamedTuple{(:send_ptr, :recv_ptr), Tuple{Base.RefValue{CUDA.CuPtr{ComplexF64}}, Base.RefValue{CUDA.CuPtr{ComplexF64}}}}, ::Tuple{Int64, Int64}, n::Int64, ::Tuple{Vector{MPI.Request}, Vector{MPI.Request}}, ::Tuple{Int64, MPI.Comm}, #unused#::Type{ComplexF64})
    @ PencilArrays.Transpositions ~/.julia/dev/PencilArrays/src/Transpositions/Transpositions.jl:442
  [4] transpose_send!(::Tuple{CUDA.CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}, CUDA.CuArray{ComplexF64, 1, CUDA.Mem.DeviceBuffer}}, recv_offsets::Vector{Int64}, requests::Tuple{Vector{MPI.Request}, Vector{MPI.Request}}, length_self::Int64, remote_inds::CartesianIndices{2, Tuple{UnitRange{Int64}, UnitRange{Int64}}}, ::Tuple{MPI.Comm, Vector{Int64}, Int64}, Ao::PencilArrays.PencilArray{ComplexF64, 3, CUDA.CuArray{ComplexF64, 3, CUDA.Mem.DeviceBuffer}, 3, 0, PencilArrays.Pencils.Pencil{3, 2, StaticPermutations.Permutation{(2, 1, 3), 3}, CUDA.CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}}}, ...)
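For reference, here is a minimal sketch (not taken from the report above) of the kind of multi-process transposition that reaches the failing `MPI.Isend` call. The grid size, variable names and the explicit `PointToPoint` method are assumptions; it needs a CUDA-capable node and an MPI launch with at least two processes, e.g. `mpiexecjl -n 2 julia repro.jl`.

```julia
# Hypothetical reproducer sketch: transpose a CuArray-backed PencilArray
# between two pencil decompositions across several MPI processes.
using MPI
using CUDA
using PencilArrays
using LinearAlgebra: transpose!

MPI.Init()
comm = MPI.COMM_WORLD

dims = (16, 16, 16)                          # global grid size (assumed)
pen_x = Pencil(CuArray, dims, comm)          # x-pencil decomposition backed by CuArray
pen_y = Pencil(pen_x; decomp_dims = (1, 3))  # y-pencil decomposition sharing the same data layout

ux = PencilArray{ComplexF64}(undef, pen_x)
uy = PencilArray{ComplexF64}(undef, pen_y)
fill!(parent(ux), 1)                         # fill the underlying CuArray (avoids scalar indexing)

# Point-to-point transposition: with more than one MPI process this reaches
# MPI.Isend with a CuPtr-backed buffer, which is where the MethodError appears.
t = Transpositions.Transposition(uy, ux; method = Transpositions.PointToPoint())
transpose!(t)

MPI.Finalize()
```

On a single process the send/receive path is skipped (local copy only), which would be consistent with the report that single-process transpositions already work.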

This issue is open so that I don't forget it exists; I'll come back with more information when I start working on it again. Corentin