under-Peter / OMEinsum.jl

One More Einsum for Julia! With runtime order-specification and high-level adjoints for AD
https://under-peter.github.io/OMEinsum.jl/dev/
MIT License

Errors when using permutedims on CuArray #127

Closed: ChenZhao44 closed this issue 2 years ago

ChenZhao44 commented 2 years ago

My CUDA.jl version is 3.5.0.

Here is an MWE. It looks like an issue in CUDA.jl.

julia> using CUDA
julia> c = CUDA.rand(4, [2 for _ = 2:18]...);
julia> permutedims(c, 18:-1:1)
ERROR: InvalidIRError: compiling kernel permutedims_kernel(CUDA.CuKernelContext, CuDeviceArray{Float32, 18, 1}, CuDeviceArray{Float32, 18, 1}, Val{(18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}) resulted in invalid LLVM IR
Reason: unsupported call through a literal pointer (call to )
Stacktrace:
 [1] Array
   @ ./boot.jl:448
 [2] map
   @ ./tuple.jl:224
 [3] axes
   @ ./abstractarray.jl:89
 [4] CartesianIndices
   @ ./multidimensional.jl:279
 [5] macro expansion
   @ ~/.julia/packages/GPUArrays/3sW6s/src/device/indexing.jl:81
 [6] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:188
Reason: unsupported call to the Julia runtime (call to jl_f__apply_iterate)
Stacktrace:
 [1] map
   @ ./tuple.jl:228
 [2] axes
   @ ./abstractarray.jl:89
 [3] CartesianIndices
   @ ./multidimensional.jl:279
 [4] macro expansion
   @ ~/.julia/packages/GPUArrays/3sW6s/src/device/indexing.jl:81
 [5] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:188
Reason: unsupported dynamic function invocation (call to CartesianIndices)
Stacktrace:
 [1] CartesianIndices
   @ ./multidimensional.jl:279
 [2] macro expansion
   @ ~/.julia/packages/GPUArrays/3sW6s/src/device/indexing.jl:81
 [3] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:188
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] macro expansion
   @ ~/.julia/packages/GPUArrays/3sW6s/src/device/indexing.jl:81
 [2] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:188
Reason: unsupported call to the Julia runtime (call to jl_f_apply_type)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:190
Reason: unsupported call to the Julia runtime (call to jl_new_structv)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:190
Reason: unsupported dynamic function invocation (call to map)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:190
Reason: unsupported dynamic function invocation (call to CartesianIndex)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:190
Reason: unsupported dynamic function invocation (call to getindex)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:191
Reason: unsupported dynamic function invocation (call to setindex!)
Stacktrace:
 [1] permutedims_kernel
   @ ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:191
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams, GPUCompiler.FunctionSpec{GPUArrays.var"#permutedims_kernel#47", Tuple{CUDA.CuKernelContext, CuDeviceArray{Float32, 18, 1}, CuDeviceArray{Float32, 18, 1}, Val{(18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/AJD5L/src/validation.jl:111
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/AJD5L/src/driver.jl:333 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/SSeq1/src/TimerOutput.jl:252 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/AJD5L/src/driver.jl:331 [inlined]
  [5] emit_asm(job::GPUCompiler.CompilerJob, ir::LLVM.Module; strip::Bool, validate::Bool, format::LLVM.API.LLVMCodeGenFileType)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/AJD5L/src/utils.jl:62
  [6] cufunction_compile(job::GPUCompiler.CompilerJob)
    @ CUDA ~/.julia/packages/CUDA/YpW0k/src/compiler/execution.jl:326
  [7] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(CUDA.cufunction_compile), linker::typeof(CUDA.cufunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/AJD5L/src/cache.jl:89
  [8] cufunction(f::GPUArrays.var"#permutedims_kernel#47", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceArray{Float32, 18, 1}, CuDeviceArray{Float32, 18, 1}, Val{(18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}}}; name::Nothing, kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/YpW0k/src/compiler/execution.jl:297
  [9] cufunction
    @ ~/.julia/packages/CUDA/YpW0k/src/compiler/execution.jl:291 [inlined]
 [10] macro expansion
    @ ~/.julia/packages/CUDA/YpW0k/src/compiler/execution.jl:102 [inlined]
 [11] #launch_heuristic#234
    @ ~/.julia/packages/CUDA/YpW0k/src/gpuarrays.jl:17 [inlined]
 [12] gpu_call(::GPUArrays.var"#permutedims_kernel#47", ::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, ::Val{(18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)}; target::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, elements::Nothing, threads::Nothing, blocks::Nothing, name::Nothing)
    @ GPUArrays ~/.julia/packages/GPUArrays/3sW6s/src/device/execution.jl:68
 [13] gpu_call(::GPUArrays.var"#permutedims_kernel#47", ::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, ::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, ::Val{(18, 17, 16, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1)})
    @ GPUArrays ~/.julia/packages/GPUArrays/3sW6s/src/device/execution.jl:48
 [14] permutedims!(dest::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, src::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, perm::NTuple{18, Int64})
    @ GPUArrays ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:195
 [15] permutedims!(dest::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, src::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, perm::StepRange{Int64, Int64})
    @ GPUArrays ~/.julia/packages/GPUArrays/3sW6s/src/host/linalg.jl:200
 [16] permutedims(B::CuArray{Float32, 18, CUDA.Mem.DeviceBuffer}, perm::StepRange{Int64, Int64})
    @ Base ./multidimensional.jl:1494
 [17] top-level scope
    @ REPL[48]:1
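For comparison (added for illustration, not part of the original report), the same array shape and the same reversing permutation succeed on a plain CPU Array, which shows the failure is specific to GPU kernel compilation:

```julia
# CPU analogue of the MWE above: an 18-dimensional 4×2×…×2 array
# and the same dimension-reversing permutation.
a = rand(Float32, 4, fill(2, 17)...)
b = permutedims(a, 18:-1:1)
size(b) == reverse(size(a))   # true: the dimensions come back reversed
```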
GiggleLiu commented 2 years ago

Thanks for the issue; this is caused by https://github.com/JuliaGPU/GPUArrays.jl/issues/340. The permutedims in GPUArrays performs very poorly for high-dimensional tensors, and there are related issues such as https://github.com/JuliaGPU/GPUArrays.jl/issues/375.

So I rewrote the permutedims kernel in PR #128.
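To illustrate the general idea of such a rewrite (this is a minimal CPU sketch under my own assumptions, not the actual PR #128 code): each destination linear index is decomposed into a multi-index with plain integer arithmetic and remapped through the permutation, avoiding the dynamic CartesianIndices construction that broke GPU compilation in the error above.

```julia
# Column-major strides for a size tuple: stride(1) = 1,
# stride(d) = product of all earlier dimension lengths.
colmajor_strides(sz::NTuple{N,Int}) where {N} =
    ntuple(d -> d == 1 ? 1 : prod(sz[1:d-1]), N)

# Stride-based permutedims sketch (hypothetical helper, illustrative only):
# pure tuple/integer arithmetic, no dynamic CartesianIndices, so the same
# per-element body could compile cleanly as a GPU kernel.
function permutedims_strided!(dest::AbstractArray{T,N}, src::AbstractArray{T,N},
                              perm::NTuple{N,Int}) where {T,N}
    size(dest) == ntuple(d -> size(src, perm[d]), N) || throw(DimensionMismatch())
    sstr = colmajor_strides(size(src))
    for li in 0:length(dest)-1
        r = li          # 0-based destination linear index being decomposed
        s = 0           # accumulated 0-based source linear index
        for d in 1:N
            len = size(dest, d)
            s += (r % len) * sstr[perm[d]]   # map dest dim d to src dim perm[d]
            r ÷= len
        end
        dest[li + 1] = src[s + 1]
    end
    return dest
end
```

On a CPU array this matches Base: `permutedims_strided!(similar(a, (4, 3, 2)), a, (3, 2, 1))` equals `permutedims(a, (3, 2, 1))` for a 2×3×4 array `a`.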

ChenZhao44 commented 2 years ago

Thanks for your quick fix! However, the new branch doesn't fix the issue for me.

GiggleLiu commented 2 years ago

Sorry, it should be fine now. FYI, this bug is fixed in Julia 1.7.

GiggleLiu commented 2 years ago

You will probably be interested in this patch, which makes your code work properly on Julia 1.6: https://github.com/JuliaGPU/GPUArrays.jl/pull/334/files

ChenZhao44 commented 2 years ago

Thanks, Julia 1.7 works fine. I will use 1.7.