m3g / CellListMap.jl

Flexible implementation of cell lists to map the calculations of particle-pair dependent functions, such as forces, energies, neighbor lists, etc.
https://m3g.github.io/CellListMap.jl/
MIT License
87 stars 4 forks source link

CellListMap hangs #92

Closed dforero0896 closed 1 year ago

dforero0896 commented 1 year ago

Hi, I've been using CellListMap to compute some quick two-point functions (2pf). At first it seemed to work well but after some processing of my data, trying to recompute the 2pf hangs at about 70% eternally (apparently, at least more than one night). This is odd since before the processing, for the same number of points the 2pf calculation takes about 3s. After I Ctrl+C the calculation, it seems all Julia is stuck since everything is stuck until I relaunch the REPL. Here I add the relevant parts of my code:

using CellListMap
using StaticArrays
using NPZ

function build_histogram!(d2,hist, bin_edges)
    ibin = searchsortedlast(bin_edges, sqrt(d2))
    #ibin = floor(Int,d) + 1
    if (ibin > 0) && ibin <= length(bin_edges)
        hist[ibin] += 1
    end #if
    return hist
end;

function compute_2pcf(catalog, box_size, bin_edges)
    sides = [eltype(catalog)(b) for b in box_size]
    #cutoff = box_size[1] / size(counts_ref, 1)
    cutoff = eltype(catalog)(bin_edges[end])
    box = Box(sides, cutoff)

    cl = CellList(catalog, box)

    # Initialize (and preallocate) the histogram
    hist = zeros(Int,size(bin_edges,1)-1);
    println("Counting pairs...")
    # Run calculation
    map_pairwise!(
        (x,y,i,j,d2,hist) -> build_histogram!(d2,hist, bin_edges),
        hist,box,cl; show_progress = true
    )
    println("Done")
    #hist = hist  / (length(satellites[1]) * (length(centrals[1])))
    N = size(catalog,1)
    hist = hist  / (N * (N - 1))
    norm = @. (4/3) * π * (bin_edges[2:end]^3 -bin_edges[1:end-1]^3) / (box_size[1] * box_size[2] * box_size[3])
    hist ./= norm
    centers = bin_edges[2:end] - bin_edges[1:end-1]
    centers, hist
end #

compute_2pcf(x, y, z, box_size, bin_edges) = compute_2pcf(Matrix(hcat(x,y,z)'), box_size, bin_edges)

box_size = @SVector [2000f0 for _ in 1:3]
bin_edges = 10 .^range(-2, stop = log10(50), length=11)
before  = npzread("before.npy")
after  = npzread("after.npy")

s, xi = compute_2pcf(before, box_size, bin_edges)                                                                                                                           
s, xi = compute_2pcf(after, box_size, bin_edges)   

output:

Counting pairs...                                                                                                                                                                  
Progress: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| Time: 0:00:06         
Done                                                                                                                                                                               
([0.013436729115920996, 0.03149129804938488, 0.07380530218921652, 0.17297548747273575, 0.4053979643592893, 0.9501202274834459, 2.226771039908675, 5.218822966551732, 12.23121401710
201, 28.665964967767582], [0.0, 1.6660056393630336e13, 2.038290235445229e13, 1.9276573437228812e13, 1.88991060147404e13, 1.8194857516864344e13, 1.4969815963181014e13, 7.5421693681
98844e12, 3.7787949312305874e12, 3.0175790787229033e12])                                                                                                                           

Counting pairs...                                                                                                                                                                  
Progress:  71%|███████████████████████████████████████████████████████████████████████████████████████████████████▊                                        |  ETA: 0:00:00^CERROR: 
InterruptException:                                                                                                                                                                
Stacktrace:                                                                                                                                                                        
  [1] poptask(W::Base.IntrusiveLinkedListSynchronized{Task})                                                                                                                       
   @ Base ./task.jl:974                                                                                                                                                           
  [2] wait()
    @ Base ./task.jl:983
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock}; first::Bool)
    @ Base ./condition.jl:130
  [4] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
    @ Base ./condition.jl:125
  [5] _wait(t::Task)
    @ Base ./task.jl:308
  [6] sync_end(c::Channel{Any})
    @ Base ./task.jl:404 [7] macro expansion                                                                                                                                                     [35/1946]
    @ ./task.jl:477 [inlined]
  [8] map_pairwise_parallel!(f::var"#4#6"{Vector{Float64}}, output::Vector{Int64}, box::Box{OrthorhombicCell, 3, Float64, Float64, 9, Float64}, cl::CellList{3, Float64}; output_th
readed::Nothing, reduce::typeof(CellListMap.reduce), show_progress::Bool)
    @ CellListMap ~/.julia/packages/CellListMap/CQJ0D/src/CoreComputing.jl:148
  [9] map_pairwise_parallel!
    @ ~/.julia/packages/CellListMap/CQJ0D/src/CoreComputing.jl:135 [inlined]
 [10] #map_pairwise!#109
    @ ~/.julia/packages/CellListMap/CQJ0D/src/CellListMap.jl:98 [inlined]
 [11] compute_2pcf(catalog::Matrix{Float64}, box_size::SVector{3, Float32}, bin_edges::Vector{Float64})
    @ Main ./REPL[6]:13
 [12] top-level scope
    @ REPL[18]:1

I've checked there is no NaN or infinities in the after array. I've also removed threads and it didn't even show the progress bar. When sending Ctrl+C nothing happened until I had to kill the whole process so no traceback there.

Thanks in advance for your help!

lmiq commented 1 year ago

Do you have the before.npy and after.npy files to share?

I don´t see either where bin_edges is defined.

dforero0896 commented 1 year ago

Hi! thanks for the quick answer. I knew I forgot something. I've added the definition of bin_edges. As for the files, they are quite big. But since the issue seems quite specific to the data I wouldn't know how to change them. I am downloading them and will upload somewhere ASAP.

dforero0896 commented 1 year ago

Here you may find the data https://drive.google.com/drive/folders/14zU_YqpMIthmFba3GvIGx-PayKmmq6cW?usp=sharing

lmiq commented 1 year ago

Can you check which version of CellListMap are you using ? (or perhaps just dump a julia> st output). Just to anticipate some possible questions.

dforero0896 commented 1 year ago

Here

Status `~/projects/desi-patchy/Project.toml`
  [69e1c6dd] CellListMap v0.8.22
  [b37e267f] CosmoCorr v0.1.0 `https://github.com/dforero0896/CosmoCorr.jl.git#main`
  [a93c6f00] DataFrames v1.6.1
  [31c24e10] Distributions v0.25.100
  [525bcba6] FITSIO v0.17.1
  [f6369f11] ForwardDiff v0.10.36
  [bdcacae8] LoopVectorization v0.12.165
  [15e1cf62] NPZ v0.4.3
  [91a5bcdd] Plots v1.38.17
  [37e2e3b7] ReverseDiff v1.15.1
  [90137ffa] StaticArrays v1.6.2
  [2913bbd2] StatsBase v0.34.0
  [e88e6eb3] Zygote v0.6.63
lmiq commented 1 year ago

I can reproduce the issue.

dforero0896 commented 1 year ago

I think I found the reason. Doing a histogram of the positions it looks like most of them are 0, which is not expected. This would mean there would just be too many pairs.

lmiq commented 1 year ago

Yes, exactly, that's the issue:

(i, cellᵢ.n_particles) = (1, 2579779)

That's the number of particles in the first cell. I don´t think there is any issue specifically besides that.

dforero0896 commented 1 year ago

Thanks a lot!