mashu / LineageCollapse.jl

High-performance Julia package for performing lineage collapsing on immune repertoire sequencing data.
MIT License

Out of memory issue with many sequences #14

Open mashu opened 7 hours ago

mashu commented 7 hours ago

The stack trace is:

OutOfMemoryError()                                                                                                                                                               
Stacktrace:                                                                                                                                                                     
  [1] GenericMemory                                                                                                                                                             
    @ ./boot.jl:516 [inlined]                                                                                                                                                   
  [2] new_as_memoryref                                                                                                                                                          
    @ ./boot.jl:535 [inlined]                                                                                                                                                   
  [3] Array                                                                                                                                                                     
    @ ./boot.jl:582 [inlined]                                                                                                                                                   
  [4] Array                                                                                                                                                                     
    @ ./boot.jl:592 [inlined]                                                                                                                                                   
  [5] zeros                                                                                                                                                                     
    @ ./array.jl:578 [inlined]                                                                                                                                                  
  [6] zeros                                                                                                                                                                     
    @ ./array.jl:574 [inlined]                                                                                                                                                  
  [7] compute_pairwise_distance(metric::NormalizedHammingDistance, sequences::Vector{BioSequences.LongSequence{BioSequences.DNAAlphabet{4}}}) 

This happened after some time while processing one of the bigger libraries, presumably when it hit a large cluster.
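For a sense of scale, here is a rough estimate of what a dense n×n distance matrix costs. It is only a sketch: the element type used inside `compute_pairwise_distance` is not visible in the trace, so `Float64` (the default for `zeros`) is an assumption, and `dense_bytes` and `upper_pairs` are hypothetical helper names used for illustration.

```julia
# Bytes needed for a dense n×n matrix with element type T.
dense_bytes(n, T) = n^2 * sizeof(T)

# Number of distinct pairs (i < j), i.e. entries strictly above the diagonal.
upper_pairs(n) = n * (n - 1) ÷ 2

for n in (10_000, 100_000)
    gib64 = dense_bytes(n, Float64) / 2^30   # assumed current element type
    gib32 = dense_bytes(n, Float32) / 2^30   # element type proposed in this issue
    println("n = $n: Float64 ≈ $(round(gib64, digits=1)) GiB, ",
            "Float32 ≈ $(round(gib32, digits=1)) GiB, ",
            "pairs above diagonal = $(upper_pairs(n))")
end
```

The cost grows quadratically, so a single cluster of 100,000 sequences would already need tens of GiB in dense form. A sparse matrix also stores a row index for every stored entry, so the actual saving depends on how many distances are kept.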

The idea is to use a sparse Float32 array that stores values only in the upper triangle and leaves the lower triangle as zeros. The code below works just fine:

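using SparseArrays
using LinearAlgebra
using Clustering

# Store distances only in the upper triangle (i < j), as Float32
dist = spzeros(Float32, 10, 10)
for i in 1:10
    for j in i+1:10
        dist[i, j] = 1
    end
end
# Symmetric reads the upper triangle by default, so sdist[j, i] == dist[i, j]
sdist = LinearAlgebra.Symmetric(dist)
hclusters = hclust(sdist, linkage=:single)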

The Symmetric wrapper from LinearAlgebra returns the value stored in the upper triangle whenever an element of the lower triangle is accessed, so the lower triangle never needs to be filled.
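A minimal check of that behaviour, using only the standard libraries. The matrix size and the placeholder distances (`j - i`) are made up for illustration and are not taken from the package.

```julia
using SparseArrays
using LinearAlgebra

n = 4
upper = spzeros(Float32, n, n)
for i in 1:n, j in i+1:n
    upper[i, j] = Float32(j - i)   # placeholder distance, upper triangle only
end

sym = Symmetric(upper)             # uses the upper triangle by default (:U)

@assert upper[3, 1] == 0                 # lower triangle is not stored
@assert sym[3, 1] == upper[1, 3]         # wrapper mirrors the upper value
@assert sym[1, 3] == upper[1, 3]         # upper access is unchanged
@assert nnz(upper) == n * (n - 1) ÷ 2    # only i < j entries are stored
```

The wrapper does not copy the data; it only redirects lower-triangle indexing to the stored upper-triangle entries.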

mashu commented 6 hours ago

Help needed to fix the test in fix_memory_usage, @mchernys.