sourmash-bio / sourmash_plugin_branchwater

fast, multithreaded sourmash operations: search, compare, and gather.
GNU Affero General Public License v3.0

fastmultigather gets killed because of memory usage #273

Open AnneliektH opened 4 months ago

AnneliektH commented 4 months ago

I'm trying to run fastmultigather with viral sequences (~56,000 FASTA sequences) against the GenBank viral database. With a single FASTA sequence it works and does not run out of memory, but I want to run it against the whole zip file of sequences. I tried with up to 250 GB of memory, but the job keeps getting OOM-killed.

running this in: /group/ctbrowngrp2/scratch/annie/2023-swine-sra/sourmash/viral_taxonomy/genbank

command:

/usr/bin/time -v sourmash scripts fastmultigather \
    vOTUs.k21.s100.zip \
    genbank.2023-05.viral.dna-k21-sc100.rocksdb \
    -c 4 -k 21 -t 300 -s 100 -o votus.x.genbank.csv

output:

== This is sourmash version 4.8.6. ==
== Please cite Brown and Irber (2016), doi:10.21105/joss.00027. ==

=> sourmash_plugin_branchwater 0.9.1; cite Irber et al., doi: 10.1101/2022.11.02.514947

ksize: 21 / scaled: 100 / moltype: DNA / threshold bp: 300.0
gathering all sketches in 'vOTUs.k21.s100.zip' against 'genbank.2023-05.viral.dna-k21-sc100.rocksdb' using 4 threads
Loaded DB
Reading query(s) from: 'vOTUs.k21.s100.zip'
Loaded 56816 query signature(s)
Command terminated by signal 9
        Command being timed: "sourmash scripts fastmultigather vOTUs.k21.s100.zip genbank.2023-05.viral.dna-k21-sc100.rocksdb -c 4 -k 21 -t 300 -s 100 -o votus.x.genbank.csv"
        User time (seconds): 1917.82
        System time (seconds): 79.46
        Percent of CPU this job got: 388%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 8:34.61
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 52269600
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 77016
        Minor (reclaiming a frame) page faults: 13193732
        Voluntary context switches: 79240
        Involuntary context switches: 57217
        Swaps: 0
        File system inputs: 5196344
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0
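
To put the peak memory figure in the output above into more familiar units, here is a small Python sketch that pulls the `Maximum resident set size` line out of `/usr/bin/time -v` output and converts it to GiB. It assumes GNU time reports the value in KiB (1024-byte units), as the `(kbytes)` label suggests; the function name `peak_rss_gib` is made up for illustration.

```python
import re


def peak_rss_gib(time_output: str) -> float:
    """Return peak RSS in GiB from GNU `time -v` output.

    Assumption: the reported value is in KiB (1024-byte units).
    """
    match = re.search(r"Maximum resident set size \(kbytes\):\s*(\d+)", time_output)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    # KiB -> GiB is a division by 1024 * 1024.
    return int(match.group(1)) / 1024**2


# The relevant line copied from the output above.
sample = "        Maximum resident set size (kbytes): 52269600\n"
print(f"{peak_rss_gib(sample):.1f} GiB")  # prints "49.8 GiB"
```

For this run, that is a recorded peak of roughly 49.8 GiB of resident memory before the process received signal 9.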

ctb commented 4 months ago

@mr-eyes reported something similar in #268. I wonder if maybe we are loading all the queries into memory?
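
To make that hypothesis concrete, here is a minimal Python sketch of the difference between loading every query up front and streaming them one at a time. This is not the plugin's code: `load_query`, `gather_eager`, `iter_queries`, and `gather_streaming` are hypothetical stand-ins, and each "sketch" is just a list of integers.

```python
from typing import Callable, Iterable, Iterator, List


def load_query(name: str) -> List[int]:
    """Hypothetical loader: stands in for reading one sketch from disk."""
    return list(range(1000))


def gather_eager(names: Iterable[str], search: Callable[[List[int]], int]) -> List[int]:
    # Every query is materialized before any search runs,
    # so peak memory grows with the number of queries.
    queries = [load_query(n) for n in names]
    return [search(q) for q in queries]


def iter_queries(names: Iterable[str]) -> Iterator[List[int]]:
    # Generator: yields one query at a time, so each one can be
    # freed once its search has finished.
    for n in names:
        yield load_query(n)


def gather_streaming(names: Iterable[str], search: Callable[[List[int]], int]) -> List[int]:
    return [search(q) for q in iter_queries(names)]


names = [f"query{i}" for i in range(5)]
# Both strategies produce the same results; only peak memory differs.
assert gather_eager(names, len) == gather_streaming(names, len)
```

Both functions return identical results; the eager version simply keeps all loaded queries alive at once. Whether fastmultigather actually behaves like the eager version is the open question here, not something this sketch establishes.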