jipolanco / PencilFFTs.jl

Fast Fourier transforms of MPI-distributed Julia arrays
https://jipolanco.github.io/PencilFFTs.jl/dev/
MIT License

@benchmark hangs when the number of MPI processes is large #51

Closed. Lightup1 closed this issue 2 years ago.

Lightup1 commented 2 years ago

cpubench.jl:

using MPI
using PencilFFTs
using FFTW
using Random
using BenchmarkTools

MPI.Init(threadlevel=:funneled)
comm = MPI.COMM_WORLD

FFTW.set_num_threads(Threads.nthreads())

rank=MPI.Comm_rank(comm)

# Input data dimensions (Nx × Ny × Nz)
dims = (5120, 32, 32)
pen = Pencil(dims, comm)
transform=Transforms.FFT!()  # in-place complex-to-complex FFT

if rank == 0
    print("Threads:",Threads.nthreads(),"\n")
    print("data size:",dims,"\n")
    print("Start data allocationg\n")
end
plan = PencilFFTPlan(pen, transform)
u = allocate_input(plan)
if rank == 0
    print("Complete data allocationg\n")
end

if rank == 0
    print("Start randn data \n")
end
randn!(first(u))
if rank == 0
    print("Complete randn data \n")
end

if rank == 0
    print("Start benchmark \n")
end
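# Time the forward transform: 1 evaluation per sample, stopping at 100 samples
# or after 30 s, with an MPI.Barrier after every sample.
b = @benchmark $plan*$u evals=1 samples=100 seconds=30 teardown=(MPI.Barrier(comm))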
if rank == 0
    print("Complete benchmark \n")
end

if rank == 0
    io = IOBuffer()
    show(io, "text/plain", b)
    s = String(take!(io))
    println(s)
end

cpu_bench.sh:

#!/bin/bash
#SBATCH -N 8
#SBATCH --ntasks-per-node=36
#SBATCH -J cpuN8t1Pen        # N nodes p process t threads
#SBATCH --cpus-per-task=1       # 36 cpus per node
#SBATCH --time=00:5:00
#SBATCH -p work
#SBATCH --output=slurm-%x-%j.out
#SBATCH --error=slurm-%x-%j.err
srun julia -t1 cpubench.jl

jipolanco commented 2 years ago

Thanks! I'll see if I can reproduce this.

Lightup1 commented 2 years ago

out:

Threads:1
data size:(5120, 32, 32)
Start data allocationg
Complete data allocationg
Start randn data 
Complete randn data 
Start benchmark 

err:

srun: Job step aborted: Waiting up to 62 seconds for job step to finish.
slurmstepd: error: *** STEP 2099131.0 ON cpn4 CANCELLED AT 2022-06-22T20:43:50 DUE TO TIME LIMIT ***
slurmstepd: error: *** JOB 2099131 ON cpn4 CANCELLED AT 2022-06-22T20:43:50 DUE TO TIME LIMIT ***

signal (15): Terminated
in expression starting at /GPUFS/hust_jmcai_2/YuBY/LargeDipole/PencilFFTstest/cpubench.jl:42

signal (15): Terminated
in expression starting at /GPUFS/hust_jmcai_2/YuBY/LargeDipole/PencilFFTstest/cpubench.jl:42

etc...

The job was killed when it reached the 5-minute time limit; the benchmark never completed.

Lightup1 commented 2 years ago

The hang went away after I switched to a newer MPI.