GraphAligner fails with large number of threads

TanyaDvorkina commented 1 year ago

Hi Mikko,

Thank you for the GraphAligner tool!

I ran GraphAligner (bioconda 1.0.16-) with 32 threads and it failed with the following error:

Backtrace from 10 highest scoring local maxima per cluster
write alignments to /Poppy/tdvorkina/nanoLJA/chrX_idealHiFi_LJA/k5001/nanosim_chm13_hg002_ont_simulated_35_aligned_reads_dbg_32.gaf
Align
Signal 11. Read: chrX_133969436;aligned_8_R_86_103548_3. Seed: 0+,0,0,0
Command terminated by signal 6

After that I ran GraphAligner on the same input with 8 threads and it worked well (some reads failed, but the run as a whole looked fine).

Could you please help?

Input files can be found here. Additional options that I used: -x dbg -t 8 --discard-cigar

Thank you, Tatiana

clemgoub commented 1 year ago

Hello there,

I would like to add my own experience to this thread. I have also seen variability in both the results and the success rate when running GraphAligner with multiple threads.

I started my runs with -t 40 for ~30 ONT datasets.

Most alignments went well, though I noticed small variations in the .gam file length and alignment statistics between identical runs. Example below:

run 1:

GraphAligner bioconda 1.0.13-
GraphAligner bioconda 1.0.13-
Load graph from index/index.vg
Build minimizer seeder from the graph
Minimizer seeds, length 15, window size 20, density 10
Seed cluster size 1
Alignment bandwidth 10
Clip alignment ends with identity < 66%
X-drop DP score cutoff 14705
write alignments to LUN-007.gam
Align
Alignment finished
Input reads: 1842184 (11569963743bp)
Seeds found: 25755096326
Seeds extended: 46666185 <===
Reads with a seed: 1842184 (11569963743bp)
Reads with an alignment: 1842125 (11186592967bp)
Alignments: 20474168 (40481406758bp) (26192125 additional alignments discarded) <===
End-to-end alignments: 9029 (15274439bp) <===

run 2:

GraphAligner bioconda 1.0.13-
GraphAligner bioconda 1.0.13-
Load graph from index/index.vg
Build minimizer seeder from the graph
Minimizer seeds, length 15, window size 20, density 10
Seed cluster size 1
Alignment bandwidth 10
Clip alignment ends with identity < 66%
X-drop DP score cutoff 14705
write alignments to LUN-007.gam
Align
Alignment finished
Input reads: 1842184 (11569963743bp)
Seeds found: 25755096326
Seeds extended: 46666024 <===
Reads with a seed: 1842184 (11569963743bp)
Reads with an alignment: 1842125 (11186477125bp)
Alignments: 20474562 (40482282695bp) (26191570 additional alignments discarded) <===
End-to-end alignments: 9029 (15250445bp) <===
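The differences between the two summaries above can be picked out mechanically. Here is a minimal sketch, assuming each run's summary has been saved as text. `parse_stats` and `diff_stats` are illustrative helper names, not part of GraphAligner, and the sample values are copied from the two runs above.

```python
import re

# Matches summary lines such as "Seeds extended: 46666185" and captures the
# label and everything after the colon, provided the value starts with a digit.
STAT_LINE = re.compile(r"^([^:]+):\s+(\d.*)$")

def parse_stats(log_text):
    """Return {label: tuple of all integers on the line} for each summary line."""
    stats = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            numbers = re.findall(r"\d+", match.group(2))
            stats[match.group(1)] = tuple(int(n) for n in numbers)
    return stats

def diff_stats(log_a, log_b):
    """Return {label: (values_a, values_b)} for labels whose values differ."""
    a, b = parse_stats(log_a), parse_stats(log_b)
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}

RUN1 = """Input reads: 1842184 (11569963743bp)
Seeds extended: 46666185
Reads with an alignment: 1842125 (11186592967bp)
Alignments: 20474168 (40481406758bp) (26192125 additional alignments discarded)
End-to-end alignments: 9029 (15274439bp)"""

RUN2 = """Input reads: 1842184 (11569963743bp)
Seeds extended: 46666024
Reads with an alignment: 1842125 (11186477125bp)
Alignments: 20474562 (40482282695bp) (26191570 additional alignments discarded)
End-to-end alignments: 9029 (15250445bp)"""

for label, (first, second) in diff_stats(RUN1, RUN2).items():
    print(f"{label}: {first} vs {second}")
```

Comparing every integer on a line, rather than only the leading count, also catches lines where the read count matches but the base-pair total does not.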

Then I had a few (3) samples that consistently ended with:
```
Signal 11. Read: SRR9951107.105681 105681/1. Seed: 250734-,13199,15,823
.command.sh: line 2: 89337 Aborted                 (core dumped) GraphAligner -t 40 -x vg -g index/index.vg -f SRR9951107_1.fastq.gz_filtered.fastq.gz -a AKA-017.gam
```
Removing the offending read did not solve the problem, as the error reappeared on another read. After reading through the issues on this page, I retried with -t 8 and it went through seamlessly.

So there is definitely something weird there, though workarounds are possible. -t 8 seems to be safe for now!
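The workaround can be automated in a pipeline. Here is a minimal sketch, assuming GraphAligner is on the PATH and that retrying with fewer threads is acceptable. The helper names (`build_command`, `describe_exit`, `align_with_fallback`) and the default thread counts are illustrative, and only flags already shown in this thread are used.

```python
import signal
import subprocess

def build_command(threads, graph, reads, out, preset="vg"):
    """Assemble the GraphAligner call using only flags seen in this thread."""
    return ["GraphAligner", "-t", str(threads), "-x", preset,
            "-g", graph, "-f", reads, "-a", out]

def describe_exit(returncode):
    """Name the signal when subprocess reports death by signal (negative code)."""
    if returncode < 0:
        return signal.Signals(-returncode).name
    return f"exit code {returncode}"

def align_with_fallback(graph, reads, out, thread_counts=(40, 8),
                        run=subprocess.run):
    """Try each thread count in turn; return the first one that exits cleanly."""
    for threads in thread_counts:
        result = run(build_command(threads, graph, reads, out))
        if result.returncode == 0:
            return threads
        print(f"-t {threads} failed ({describe_exit(result.returncode)}), retrying")
    raise RuntimeError("GraphAligner failed at every thread count tried")
```

The `run` parameter exists only so the retry logic can be exercised without GraphAligner installed. In real use, a call such as `align_with_fallback("index/index.vg", "reads.fastq.gz", "out.gam")` would invoke the actual binary (the read and output file names here are placeholders).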

Thanks for your input, and thank you for this great tool! Nevertheless, we get very good benchmark performance compared to other graph aligners!

Cheers,

Clément

clemgoub commented 1 year ago

Update: I also got the same error with 8 CPUs. The cause was too little RAM allocated on the worker node. Although the job did not use more than ~5 GB at peak (as reported by my cluster), it must have requested more than 20 GB at some point, causing the segfault. When I requested 50 GB of RAM, the error went away and the job finished, yet the reported maximum RAM used was still below 20 GB.
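One possible explanation, not confirmed in this thread, is that the cluster's accounting samples memory at intervals and misses short spikes. Here is a minimal sketch for checking the kernel-recorded peak directly on a Unix system, using Python's standard `resource` module. `run_and_report_peak` is an illustrative name, and the child command is a stand-in for the real GraphAligner call.

```python
import resource
import subprocess
import sys

def run_and_report_peak(cmd):
    """Run cmd, then return (exit code, peak RSS of the largest child so far)."""
    result = subprocess.run(cmd)
    # ru_maxrss is the high-water mark across all waited-for children of this
    # process; Linux reports it in kilobytes, macOS in bytes.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return result.returncode, peak

if __name__ == "__main__":
    # Stand-in child process; replace with the real GraphAligner command line.
    code, peak = run_and_report_peak(
        [sys.executable, "-c", "x = bytearray(50_000_000)"]
    )
    print(f"exit code {code}, peak RSS {peak} (kilobytes on Linux)")
```

Note that peak resident memory will not show an allocation that was refused outright, so a low reading does not rule out a failed large request.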

clemgoub commented 1 month ago

Hi @maickrau,

I was wondering if you had any insight into the error mentioned in this thread. We have integrated GraphAligner into our pipeline, GraffiTE, and more users (including us) are sometimes encountering this error.

Thanks a lot,

Clément

maickrau / GraphAligner

GraphAligner fails with large number of threads #68