vpc-ccg / sedef

Identification of segmental duplications in the genome
MIT License

Error: launched 1000 jobs but completed only 998 jobs; exiting... Error: vector::_M_default_append #25

Closed. erfrancoeur closed this issue 2 months ago.

erfrancoeur commented 3 months ago

I have been trying to run SEDEF on the mouse reference genome (mm10), and 2 jobs consistently fail in the alignment stage. I am running SEDEF 1.1-35-g5acd139.

I know BISER is the preferred tool for annotating segmental duplications (SDs), but we want to investigate how the SD breakpoints called by BISER and SEDEF differ.

Others have reported the same issue in #17 (Align buckets failing with "Error: vector::_M_default_append") and #22 (Error with number of jobs completed (zebrafish genome)). However, as far as I can tell I am using the most recent version, and #17 appears to have been resolved.

We have run SEDEF on both mm10 and mm39, soft-masked and unmasked, and keep getting the same error:

Start: Tue Jun 11 10:54:04 AM EDT 2024
SEDEF: FASTA=/projects/beck-lab/reference/mm10/mm10_canonical.fa; output=sedef_mm10; jobs=60; force=n
************************************************************************
Running SD seeding...
SD seeding done: done running 462 jobs!
Single-core running time: 10 hours (39461.7 seconds)
Memory used: 2700 MB
************************************************************************
Running SD alignment...
************************************************************************
Running SD alignment...
SD alignment done: finished 1000 jobs!
Error: launched 1000 jobs but completed only 998 jobs; exiting...
done
Tue Jun 11 01:06:33 PM EDT 2024

In the log files, this error always appears in two consecutive buckets:

🍁  🐚    SEDEF 1.1-35-g5acd139; arguments:  (SSE4.1) sedef align generate -k 11 /projects/beck-lab/reference/mm10/mm10.fa sedef_mm10/align/bucket_0010
Read 3180 alignments in sedef_mm10/align/bucket_0010
Read total 3180 alignments
Using k-mer size 11
 Processing 3180 out of 3180 (100.0%, len 43,931,015 to 34,211,335)Error: vector::_M_default_append
Command exited with non-zero status 1
TIMING: 1170.84 5886324

🍁  🐚    SEDEF 1.1-35-g5acd139; arguments:  (SSE4.1) sedef align generate -k 11 /projects/beck-lab/reference/mm10/mm10.fa sedef_mm10/align/bucket_0011
Read 3180 alignments in sedef_mm10/align/bucket_0011
Read total 3180 alignments
Using k-mer size 11
 Processing 3180 out of 3180 (100.0%, len 34,312,746 to 44,768,046)Error: vector::_M_default_append
Command exited with non-zero status 1
TIMING: 1139.76 58578292
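With 1000 alignment jobs, a small script can pick out the failed buckets instead of opening every log by hand. This is a minimal sketch, assuming each bucket writes its own text log and that a failed job contains the line `Command exited with non-zero status`, as in the excerpts above. The function name `failed_buckets` and the default `*.log` glob are hypothetical; adjust the pattern to however your run names its log files.

```python
import re
from pathlib import Path

# Line printed in the log excerpts above when a job fails.
FAILURE_MARKER = "Command exited with non-zero status"
# Matches the bucket argument on the SEDEF command line, e.g. ".../align/bucket_0010".
BUCKET_RE = re.compile(r"(bucket_\d+)")


def failed_buckets(log_dir, pattern="*.log"):
    """Return (log file name, bucket name, first error line) for each failed log.

    `pattern` is a hypothetical default glob; change it to match your log names.
    """
    failures = []
    for log_path in sorted(Path(log_dir).glob(pattern)):
        text = log_path.read_text(errors="replace")
        if FAILURE_MARKER not in text:
            continue  # job finished normally
        bucket_match = BUCKET_RE.search(text)
        bucket = bucket_match.group(1) if bucket_match else None
        # The first line mentioning "Error:" serves as a one-line summary.
        error_line = next(
            (line.strip() for line in text.splitlines() if "Error:" in line), None
        )
        failures.append((log_path.name, bucket, error_line))
    return failures
```

Calling `failed_buckets()` on the directory that holds the per-bucket logs returns one tuple per failed job, which makes it easy to confirm that the same buckets (here `bucket_0010` and `bucket_0011`) fail on every run.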

Any help would be greatly appreciated.

erfrancoeur commented 2 months ago

"That error means that SEDEF finds a region with too many seeds and crashes due to memory overload. There is no automatic way to handle that issue; before BISER, we would manually remove those regions from the bucket (typically last 2-3 items in bucket BED) and either discard them or align them separately with different parameters. BISER should be able to handle those cases."
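The manual workaround in the quote (set aside the last 2-3 items of a failing bucket, then discard them or align them separately) can be scripted. This is a minimal sketch, assuming the bucket file is plain text with one record per line and no header. The logs are consistent with this: each bucket reads 3180 alignments, and the crash happens on the final, very large item (len 43,931,015 to 34,211,335). The bucket format itself is an assumption, not something documented here. The helper name `split_off_tail` and the `.tail` suffix are hypothetical. The script rewrites the bucket in place, so back it up first.

```python
from pathlib import Path


def split_off_tail(bucket_path, n_tail=3, tail_suffix=".tail"):
    """Move the last `n_tail` non-empty records of a bucket file into a side file.

    Assumes one record per line (hypothetical format). Remaining records are
    written back to `bucket_path`; removed ones go to `<bucket>` + `tail_suffix`
    so they can be discarded or aligned separately with different parameters.
    Returns (records kept, records moved).
    """
    bucket = Path(bucket_path)
    records = [line for line in bucket.read_text().splitlines() if line.strip()]
    # Clamp so that n_tail is never negative or larger than the file.
    n_tail = max(0, min(n_tail, len(records)))
    keep = records[: len(records) - n_tail]
    tail = records[len(records) - n_tail :]

    tail_path = bucket.with_name(bucket.name + tail_suffix)
    tail_path.write_text("".join(line + "\n" for line in tail))
    bucket.write_text("".join(line + "\n" for line in keep))
    return len(keep), len(tail)
```

For example, `split_off_tail("sedef_mm10/align/bucket_0010", n_tail=3)` would leave 3177 records in the bucket if the one-record-per-line assumption holds. You could then rerun the same `sedef align generate` command from the log on the trimmed bucket.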

The issue was resolved by removing chromosome Y before running SEDEF.
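Removing a chromosome before running SEDEF amounts to writing a copy of the reference FASTA without that record. This is a minimal sketch in plain Python, assuming the chromosome is named exactly `chrY` in the FASTA headers, as in UCSC-style mm10. The function name `drop_fasta_records` is hypothetical.

```python
def drop_fasta_records(in_path, out_path, drop=("chrY",)):
    """Copy a FASTA file, skipping records whose name is in `drop`.

    The record name is the first whitespace-separated token after '>'.
    Streams line by line, so the genome is never held in memory.
    Returns the number of records skipped.
    """
    drop = set(drop)
    skipped = 0
    keep = True  # whether the current record is being copied
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                header = line[1:]
                name = header.split(maxsplit=1)[0] if header.strip() else ""
                keep = name not in drop
                if not keep:
                    skipped += 1
            if keep:
                dst.write(line)
    return skipped
```

For example, `drop_fasta_records("mm10_canonical.fa", "mm10_noY.fa")` writes a chrY-free copy (the output name is illustrative), which can then be passed to SEDEF in place of the original FASTA.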