[Open] damiankao opened this issue 3 years ago
Hi Damian,
Happy to see you're using BURST! Running normally in DNA mode, BURST requires roughly 10x the size of the input FASTA file in RAM -- with your 134 GB input, that's about 1.3 TB, well beyond your 500 GB server. (It "bursts" from about 1x the size of the FASTA to ~9x right after it reads and parses it, so you won't see RAM usage climb gradually; it's an all-or-nothing allocation, and BURST checks that enough memory is available before allocating, which is the failure you're seeing here.)
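If you want to sanity-check that rule of thumb before launching a build, a quick Linux check works (this assumes GNU stat and procps free; the 10x factor is just the DNA-mode estimate above):

fa_bytes=$(stat -c %s refseq.fa)                 # input FASTA size in bytes
need_gb=$(( fa_bytes * 10 / 1024**3 ))           # ~10x the FASTA size for DNA mode
avail_gb=$(free -g | awk '/^Mem:/ {print $7}')   # "available" column of free
echo "need ~${need_gb} GB, have ${avail_gb} GB available"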
You can reduce this requirement by splitting the database into more "partitions", e.g. "-dp 5" for 5 partitions, or by using QUICK mode instead of DNA mode. QUICK should not be a problem if your sequences are reasonably distinct from one another (i.e. species- or subspecies-level "representative" genomes).
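For example, your build command (quoted below) with partitioning added; only -dp is new, everything else is unchanged:

./burst15 -r refseq.fa -o refseq.02082020.edx -a refseq.02082020.acx -d DNA 320 -i 0.95 -t 64 -s 1500 -dp 5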
That said, BURST will still require at minimum 5x the size of the input FASTA to run alignments (5 x 134 GB is roughly 670 GB), so that would exceed your memory capacity anyway. Your best option is to split the database by microbial family, run your queries through each family's database, and then merge the resulting .b6 files (just concatenate them).
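A rough sketch of that split-and-merge workflow, assuming you've already written one linearized FASTA per family under families/ (hypothetical paths), and assuming burst15 takes -q for the query FASTA when aligning against a prebuilt .edx/.acx pair -- double-check the alignment flags against burst15's help, since only the build flags appear in your transcript:

for fam in families/*.fa; do
  base=${fam%.fa}
  # build a per-family database with the same flags as your full-size build
  ./burst15 -r "$fam" -o "$base.edx" -a "$base.acx" -d DNA 320 -i 0.95 -t 64 -s 1500
  # align the queries against this family's database (flags assumed, see above)
  ./burst15 -r "$base.edx" -a "$base.acx" -q queries.fa -o "$base.b6" -i 0.95 -t 64
done
# merging is just concatenation
cat families/*.b6 > all_families.b6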
Cheerio, Gabe
On Mon, Feb 8, 2021 at 11:06 PM Damian Kao wrote:
I am trying to create a BURST database from RefSeq genomes (a 134 GB FASTA file). The FASTA file is linearized and formatted with the accession ID in the header and the sequence on the next line:

>NZ_CP053296.1
GTGTCACTTTCGCTTTGGCA....
I ran this command:
./burst15 -r refseq.fa -o refseq.02082020.edx -a refseq.02082020.acx -d DNA 320 -i 0.95 -t 64 -s 1500
This is the output:

This is BURST [v1.0 DB 15]
 --> Using accelerator file refseq.02082020.acx
 --> Creating DNA database (assuming max query length 320)
 --> Setting identity threshold to 0.950000
 --> Setting threads to 64
 --> Shearing references longer than 1500
Using up to AVX-128 with 64 threads.
Parsed 25947 references.
Initiating database shearing procedure [shear 1500, ov 336].
Using compressive optimization (1 partitions; ~25947 refs).
[0] First pass: populated bin counters [782.536257]
 --> Out of the 143567861906 original places, 141063566310 are eligible.
OOM:Ptrs_X
I assume OOM:Ptrs_X is an out-of-memory error.
I am running database creation on a server with 500 GB of memory and 64 CPUs. The memory logs show that a maximum of 135 GB of memory was used, with ~360 GB free.
What could be the problem here? Is BURST pre-calculating how much memory it will need and killing the process before allocating it, because 360 GB of free memory would not be enough?