omlins / ParallelStencil.jl

Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

CUDA Crash with julia 1.9.0 #91

Closed LaurentPlagne closed 1 year ago

LaurentPlagne commented 1 year ago

Hi, I tried to run the acoustic3D.jl miniapp using CUDA (USE_GPU = true). Everything is fine with Julia 1.8.5, but it crashes with Julia 1.9.

julia> include("acoustic3D.jl")
┌ Warning: ParallelStencil has already been initialized, with the same arguments. If you are using ParallelStencil interactively in the REPL, then you can ignore this message. If you are using ParallelStencil non-interactively, then you are likely using ParallelStencil in an inconsistent way: @init_parallel_stencil should only be called once, right after 'using ParallelStencil'.
└ @ ParallelStencil ~/.julia/packages/ParallelStencil/fQa5L/src/init_parallel_stencil.jl:73
┌ Warning: Module Data from previous module initialization found in caller module (Main); module Data not created. If you are working interactively in the REPL, then you can ignore this message.
└ @ ParallelStencil.ParallelKernel ~/.julia/packages/ParallelStencil/fQa5L/src/ParallelKernel/init_parallel_kernel.jl:33
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:89
  [2] macro expansion
    @ ~/.julia/packages/CUDA/BbliS/lib/cudadrv/error.jl:97 [inlined]
  [3] cuMemAllocAsync(dptr::Base.RefValue{CUDA.CuPtr{Nothing}}, bytesize::Int64, hStream::CUDA.CuStream)
    @ CUDA ~/.julia/packages/CUDA/BbliS/lib/utils/call.jl:26
  [4] #alloc#1

...

(test_pstencil) pkg> status
Status `~/temp/test_pstencil/Project.toml`
⌅ [052768ef] CUDA v3.13.1
  [94395366] ParallelStencil v0.6.1
  [91a5bcdd] Plots v1.38.11
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated`

julia> versioninfo()
Julia Version 1.9.0
Commit 8e630552924 (2023-05-07 11:25 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900K
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, goldmont)
  Threads: 8 on 32 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 8
  JULIA_IMAGE_THREADS = 1

julia> using CUDA

julia> CUDA.versioninfo()
CUDA toolkit 11.7, artifact installation
NVIDIA driver 530.41.3, for CUDA 12.1
CUDA driver 12.1

Libraries: 
- CUBLAS: 11.10.1
- CURAND: 10.2.10
- CUFFT: 10.7.2
- CUSOLVER: 11.3.5
- CUSPARSE: 11.7.3
- CUPTI: 17.0.0
- NVML: 12.0.0+530.41.3
- CUDNN: 8.30.2 (for CUDA 11.5.0)
- CUTENSOR: 1.4.0 (for CUDA 11.5.0)

Toolchain:
- Julia: 1.9.0
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 4070 (sm_89, 9.942 GiB / 11.994 GiB available)

julia> Unhandled Task ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:

omlins commented 1 year ago

@LaurentPlagne: thanks for reporting this; we will let you know when we have this fixed.

LaurentPlagne commented 1 year ago

Thank you for your prompt answer. I wonder if the constraint on the CUDA version (imposed by ParallelStencil.jl?) could cause the problem? It works fine with Julia 1.8.5, so it is not a blocking issue for me.

omlins commented 1 year ago

I wonder if the constraint on the CUDA version (imposed by ParallelStencil.jl?) could cause the problem?

Hopefully it is just this! We will soon release a version of ParallelStencil that is compatible with the latest CUDA version. Julia 1.9 was released earlier than I expected; in recent years the release has come closer to the time of the conference. So it caught us by surprise and we are not ready yet.

omlins commented 1 year ago

@LaurentPlagne: can you test whether you still get the error with the main branch (] add ParallelStencil#main)?
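
For anyone who prefers a script over the Pkg REPL, the same test can be set up with the Pkg API. This is only a sketch: the temporary environment is an assumption made here so that an existing project is not modified, and CUDA is added explicitly only so that the resolved version shows up in the status output.

```julia
using Pkg

# Assumption: work in a throwaway environment so the current project stays untouched.
Pkg.activate(temp=true)

# Equivalent of `] add ParallelStencil#main`: track the main branch of the package.
Pkg.add(name="ParallelStencil", rev="main")

# Add CUDA as a direct dependency so its resolved version is listed below.
Pkg.add("CUDA")

# Show what was resolved, including any packages held back by compat constraints.
Pkg.status()
```

After this, running include("acoustic3D.jl") in the same session exercises the main branch.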

LaurentPlagne commented 1 year ago

It works with Julia 1.9.0, but I had to manually install CellArrays.jl and StaticArrays.jl. However, I can't run the miniapp acoustic_waves_multixpu: it seems that ImplicitGlobalGrid.jl prevents using recent CUDA.jl versions...
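
The manual installation mentioned above can be sketched as follows. It assumes the two packages are added to whichever environment is currently active (for example the test_pstencil project from the first post).

```julia
using Pkg

# Add the two packages that had to be installed by hand to the active environment.
Pkg.add(["CellArrays", "StaticArrays"])

# List direct dependencies with outdated versions and the compat constraints holding them back.
Pkg.status(outdated=true)
```

The outdated listing is also a quick way to see whether a package is capping the CUDA.jl version.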

luraess commented 1 year ago

We are in the process of updating IGG to run with the latest MPI.jl and to support GPU-aware operation with the AMDGPU.jl backend.

omlins commented 1 year ago

Fixed in #81.

omlins commented 1 year ago

@LaurentPlagne: thanks for testing!

It works with Julia 1.9.0, but I had to manually install CellArrays.jl and StaticArrays.jl

Solved here: Remove need to have any packaged pre installed #95