rusandris / StateTransitionNetworks.jl

Toolkit for dynamics on state transition networks
2 stars 0 forks source link

Incremental sparse construction using `COO` format #12

Open rusandris opened 8 months ago

rusandris commented 8 months ago

We should use the COO sparse format when we construct a sparse matrix in several steps (add each transition at a time). In the COO format (see SparseMatrixCOO from LuxurySparse.jl ) this is more efficient than in the regular CSC format used by SparseArrays.jl. However, the CSC format is more efficient and more suitable for linear algebra computations, so once the matrix is constructed in COO, it should be converted to CSC.

Already implemented in:

Yet to be implemented in:

rusandris commented 8 months ago

Difference between different conversion methods (from COO to CSC): 100×100 SparseMatrix with 280 stored entries

@time sparse(trs.Q)
 11.526183 seconds (12 allocations: 16.016 KiB)

@time SparseMatrixCSC(trs.Q)
  0.016892 seconds (13 allocations: 15.266 MiB)
rusandris commented 8 months ago

COO always makes fewer number of allocations, and it can be much faster when dealing with bigger matrices. The reason for this is stated in the documentation of SparseArrays:

The compressed sparse column storage makes it easy and quick to access the elements in the column of a sparse matrix, whereas accessing the sparse matrix by rows is considerably slower. Operations such as insertion of previously unstored entries one at a time in the CSC structure tend to be slow. This is because all elements of the sparse matrix that are beyond the point of insertion have to be moved one place over.

Tests:

#construct sparse matrices by adding random elements at random indices
function assemble_sparse_CSC(N,row_idcs,col_idcs)

    M = spzeros(N,N)
    n = length(row_idcs)

    for e in 1:n
        i = row_idcs[e]
        j = col_idcs[e]
        M[i,j] = rand()
    end

end

function assemble_sparse_COO(N,row_idcs,col_idcs)

    M = SparseMatrixCOO{Float64, Int64}(Int64[], Int64[], Float64[],N,N)
    n = length(row_idcs)

    for e in 1:n
        i = row_idcs[e]
        j = col_idcs[e]
        M[i,j] = rand()
    end

end
begin 
  N = 20
  n = 100000
  row_idcs = rand(1:N,n)
  col_idcs = rand(1:N,n)
  @time assemble_sparse_CSC(N,row_idcs,col_idcs)
  @time assemble_sparse_COO(N,row_idcs,col_idcs)
end
  0.006734 seconds (200.01 k allocations: 6.119 MiB)
  0.002575 seconds (34 allocations: 5.483 MiB)
begin 
   N = 1000
   n = 100000
   row_idcs = rand(1:N,n)
   col_idcs = rand(1:N,n)
   @time assemble_sparse_CSC(N,row_idcs,col_idcs)
   @time assemble_sparse_COO(N,row_idcs,col_idcs)
end
  0.341956 seconds (298.00 k allocations: 11.262 MiB)
  0.006214 seconds (34 allocations: 5.483 MiB, 37.49% gc time)
rusandris commented 8 months ago

Corrections.

These benchmarks are closer to what the code actually does when building a transition matrix:

#addition instead of overwrite
function assemble_sparse_CSC(N,row_idcs,col_idcs)

  M = spzeros(N,N)
  n = length(row_idcs)

  for e in 1:n
     i = row_idcs[e]
     j = col_idcs[e]
     M[i,j] += 1.0    
  end
end

assemble_sparse_CSC (generic function with 1 method)

function assemble_sparse_COO(N,row_idcs,col_idcs)

  M = SparseMatrixCOO{Float64, Int64}(Int64[], Int64[], Float64[],N,N)
  n = length(row_idcs)

  for e in 1:n
     i = row_idcs[e]
     j = col_idcs[e]
     M[i,j] = 1.0
  end
end

Interesting observation: benchmarks with non-random indices (keeping the time correlation, successive states) show a different picture:

orbit,t = trajectory(system,10^7;Ttr=10^3)
sts = timeseries_to_grid(orbit, 10)

 begin 
  N = 10
  n = length(sts)
  row_idcs = sts[1:end-1]
  col_idcs = sts[2:end]
  @btime assemble_sparse_CSC(N,row_idcs,col_idcs)
  @btime assemble_sparse_COO(N,row_idcs,col_idcs)
 end
  201.066 ms (7 allocations: 1.08 KiB)
  515.448 ms (55 allocations: 440.49 MiB)

sts = timeseries_to_grid(orbit, 1000)

begin 
  N = 1000
  n = length(sts)
  row_idcs = sts[1:end-1]
  col_idcs = sts[2:end]
  @btime assemble_sparse_CSC(N,row_idcs,col_idcs)
  @btime assemble_sparse_COO(N,row_idcs,col_idcs)
end
  262.285 ms (15 allocations: 115.84 KiB)
  486.418 ms (55 allocations: 440.49 MiB)