trixi-gpu / TrixiCUDA.jl

CUDA acceleration for Trixi.jl
https://trixi-gpu.github.io/TrixiCUDA.jl/
MIT License

Tuple indexing for 2D and 3D (`boundary_condition_periodic`) #8

Closed huiyuxie closed 1 year ago

huiyuxie commented 1 year ago

This issue continues #6: the original approach fails in both 2D and 3D when `boundary_conditions` contains `boundary_condition_periodic`. I will update this issue later.

huiyuxie commented 1 year ago

I left some comments in the original issue.

huiyuxie commented 1 year ago

Good reference: https://discourse.julialang.org/t/is-there-any-good-way-to-call-functions-from-a-set-of-functions-in-a-cuda-kernel/102051/3

huiyuxie commented 1 year ago

First approach (failed). Filter all indices `i` for which `boundary_conditions[i] != boundary_condition_periodic`, map them to the `firsts` and `lasts` arrays to get `boundary_arr`, and then launch kernel threads only for `boundaries[k]` in `boundary_arr` (i.e., skip all `boundary_condition_periodic` function calls within a single kernel call). This approach failed with the same error as before:

ERROR: InvalidIRError: compiling MethodInstance for boundary_flux_kernel!(::CuDeviceArray{Float32, 5, 1}, ::CuDeviceArray{Float32, 5, 1}, ::CuDeviceArray{Float32, 4, 1}, ::Float64, ::CuDeviceVector{Int32, 1}, ::CuDeviceVector{Int32, 1}, ::CuDeviceVector{Int32, 1}, ::CuDeviceVector{Int32, 1}, ::CuDeviceVector{Int32, 1}, ::NamedTuple{(:x_neg, :x_pos, :y_neg, :y_pos, :z_neg, :z_pos), Tuple{typeof(boundary_condition_poisson_nonperiodic), typeof(boundary_condition_poisson_nonperiodic), Vararg{BoundaryConditionPeriodic, 4}}}, ::HyperbolicDiffusionEquations3D{Float32}, ::FluxLaxFriedrichs{typeof(max_abs_speed_naive)}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to ijl_get_nth_field_checked)
Stacktrace:
 [1] getindex
   @ ./namedtuple.jl:136
 [2] boundary_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:477
Reason: unsupported dynamic function invocation
Stacktrace:
 [1] boundary_flux_kernel!
   @ ~/trixi_cuda/cuda_dg_3d.jl:478
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/validation.jl:149
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:411 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/RsWnF/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:410 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:89
  [6] emit_llvm
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/utils.jl:83 [inlined]
  [7] codegen(output::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, parent_job::Nothing, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:118
  [8] codegen
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:92 [inlined]
  [9] compile(target::Symbol, job::GPUCompiler.CompilerJob; libraries::Bool, toplevel::Bool, optimize::Bool, cleanup::Bool, strip::Bool, validate::Bool, only_entry::Bool, ctx::LLVM.ThreadSafeContext)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:88
 [10] compile
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:79 [inlined]
 [11] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.ThreadSafeContext)
    @ CUDA ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:125
 [12] #1032
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:120 [inlined]
 [13] LLVM.ThreadSafeContext(f::CUDA.var"#1032#1033"{GPUCompiler.CompilerJob{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}})
    @ LLVM ~/.julia/packages/LLVM/5aiiG/src/executionengine/ts_module.jl:14
 [14] JuliaContext
    @ ~/.julia/packages/GPUCompiler/NVLGB/src/driver.jl:35 [inlined]
 [15] compile
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/compilation.jl:119 [inlined]
 [16] actual_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:125
 [17] cached_compilation(cache::Dict{Any, Any}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/NVLGB/src/execution.jl:103
 [18] macro expansion
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:318 [inlined]
 [19] macro expansion
    @ ./lock.jl:267 [inlined]
 [20] cufunction(f::typeof(boundary_flux_kernel!), tt::Type{Tuple{CuDeviceArray{Float32, 5, 1}, CuDeviceArray{Float32, 5, 1}, CuDeviceArray{Float32, 4, 1}, Float64, CuDeviceVector{Int32, 1}, CuDeviceVector{Int32, 1}, CuDeviceVector{Int32, 1}, CuDeviceVector{Int32, 1}, CuDeviceVector{Int32, 1}, NamedTuple{(:x_neg, :x_pos, :y_neg, :y_pos, :z_neg, :z_pos), Tuple{typeof(boundary_condition_poisson_nonperiodic), typeof(boundary_condition_poisson_nonperiodic), Vararg{BoundaryConditionPeriodic, 4}}}, HyperbolicDiffusionEquations3D{Float32}, FluxLaxFriedrichs{typeof(max_abs_speed_naive)}}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ CUDA ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:313
 [21] cufunction
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:310 [inlined]
 [22] macro expansion
    @ ~/.julia/packages/CUDA/pCcGc/src/compiler/execution.jl:104 [inlined]
 [23] cuda_boundary_flux!(t::Float64, mesh::TreeMesh{3, SerialTree{3}}, boundary_conditions::NamedTuple{(:x_neg, :x_pos, :y_neg, :y_pos, :z_neg, :z_pos), Tuple{typeof(boundary_condition_poisson_nonperiodic), typeof(boundary_condition_poisson_nonperiodic), Vararg{BoundaryConditionPeriodic, 4}}}, equations::HyperbolicDiffusionEquations3D{Float32}, dg::DGSEM{LobattoLegendreBasis{Float64, 5, SVector{5, Float64}, Matrix{Float64}, Matrix{Float64}, Matrix{Float64}}, LobattoLegendreMortarL2{Float64, 5, Matrix{Float64}, Matrix{Float64}}, SurfaceIntegralWeakForm{FluxLaxFriedrichs{typeof(max_abs_speed_naive)}}, VolumeIntegralWeakForm}, cache::NamedTuple{(:elements, :interfaces, :boundaries, :mortars, :fstar_upper_left_threaded, :fstar_upper_right_threaded, :fstar_lower_left_threaded, :fstar_lower_right_threaded, :fstar_tmp1_threaded), Tuple{ElementContainer3D{Float64, Float64}, InterfaceContainer3D{Float64}, BoundaryContainer3D{Float64, Float64}, L2MortarContainer3D{Float64}, Vararg{Vector{Array{Float64, 3}}, 5}}})
    @ Main ~/trixi_cuda/cuda_dg_3d.jl:523
 [24] top-level scope
    @ ~/trixi_cuda/cuda_dg_3d.jl:675

huiyuxie commented 1 year ago

Second approach (succeeded). Apply a helper function (the same for 2D and 3D), shown at https://github.com/huiyuxie/trixi_cuda/blob/98deae15181dd2050f0c71c1448a32bbce7bc6da/cuda_dg_3d.jl#L68-L77, to stabilize the kernel call. With it, `boundary_flux` can be computed in a single kernel; see https://github.com/huiyuxie/trixi_cuda/blob/fb81195ed862a3248d6123fa9fd7c37ab4bc8339/cuda_dg_2d.jl#L478-L512 and https://github.com/huiyuxie/trixi_cuda/blob/fb81195ed862a3248d6123fa9fd7c37ab4bc8339/cuda_dg_3d.jl#L506-L540. With the stable helper function, there is no need to skip the function call to `boundary_condition_periodic`.

Reference: https://github.com/leios/simuleios/blob/master/test/tuple_test_2.jl Thanks to @leios from the community for sharing the idea!
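
For readers who do not want to follow the links, here is a minimal CPU-only sketch of one common way to make indexing into a heterogeneous tuple of callables stable: peel the tuple recursively so that every call site sees a concrete element type. This is an illustration only and not the helper from the linked files. The names `apply_nth`, `bc_a`, `bc_b`, and `PeriodicBC` are hypothetical, and the single-argument call signature is a simplification of the real boundary condition signature.

```julia
# Hypothetical stand-ins for boundary conditions (not Trixi.jl's real ones).
bc_a(u) = u + 1
bc_b(u) = 2u
struct PeriodicBC end          # callable struct, loosely mimicking BoundaryConditionPeriodic
(::PeriodicBC)(u) = u

# Fallback for an exhausted tuple; only reached if i exceeds the tuple length.
@inline apply_nth(::Tuple{}, i, u) = u

# Peel one element per recursion level. At each level the tuple type is
# concrete, so first(fs) has a known type and the call can be resolved statically
# instead of going through a runtime-indexed getindex.
@inline function apply_nth(fs::Tuple, i, u)
    i == 1 && return first(fs)(u)
    return apply_nth(Base.tail(fs), i - 1, u)
end

# A NamedTuple with mixed element types, similar in shape to boundary_conditions.
bcs = (x_neg = bc_a, x_pos = bc_b, y_neg = PeriodicBC(), y_pos = PeriodicBC())

# values(bcs) yields a plain Tuple; the index may be a runtime value.
results = [apply_nth(values(bcs), i, 3) for i in 1:4]
```

In this sketch `results` is `[4, 6, 3, 3]`. Because the periodic entry is just another element handled by the same recursion, it does not need to be filtered out beforehand, which matches the observation above that skipping `boundary_condition_periodic` is no longer necessary.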

huiyuxie commented 1 year ago

Sorry, I'm feeling depressed today, and I simply applied someone else's idea here. This is the best I can do right now to improve this kernel. @ranocha, if you have any better ideas, I'm open to them. Thanks!