morispi / CONSENT

Scalable long read self-correction and assembly polishing with multiple sequence alignment
https://doi.org/10.1038/s41598-020-80757-5
GNU Affero General Public License v3.0

Segmentation fault during polishing step #20

Open hasindu2008 opened 4 years ago

hasindu2008 commented 4 years ago

Hi,

I have recently been attempting to polish a draft human genome assembly constructed from a PromethION sample. However, CONSENT crashes with a segmentation fault at the polishing step. Any suggestions for fixing this?

Command used:

CONSENT-polish  --contigs $DRAFT --reads $READS  --out $OUTPUT  --nproc 64 -m 50G

Stdout:

[Wed Sep 16 14:07:21 AEST 2020] Aligning the long reads to the contigs (minimap2)
[Wed Sep 16 16:28:59 AEST 2020] Sorting the overlaps
[Thu Sep 17 20:11:04 AEST 2020] Polishing the contigs

stderr:

[M::mm_idx_gen::84.924*1.86] collected minimizers
[M::mm_idx_gen::92.015*3.89] sorted minimizers
[M::main::92.015*3.89] loaded/built the index for 3855 target sequence(s)
[M::mm_mapopt_update::96.598*3.76] mid_occ = 667
[M::mm_idx_stat] kmer size: 15; skip: 5; is_hpc: 0; #seq: 3855
[M::mm_idx_stat::99.125*3.69] distinct minimizers: 165963083 (35.72% are singletons); average occurrences: 5.831; average spacing: 2.922
[M::worker_pipeline::174.006*15.81] mapped 87711 sequences
......
[M::worker_pipeline::8492.457*27.15] mapped 25672 sequences
[M::main] Version: 2.14-r883
[M::main] CMD: minimap2 --dual=yes -PD --no-long-join -w5 -g1000 -m30 -n1 -t64 -I50G assembly.fasta pass.fastq
[M::main] Real time: 8495.410 sec; CPU: 230553.598 sec; Peak RSS: 37.940 GB
CONSENT-polish: line 203: 39123 Segmentation fault      (core dumped) $LRSCf/bin/CONSENT-polishing -a $tmpdir/"$alignments" -s "$minSupport" -S "$maxSupport" -l "$windowSize" -k "$merSize" -c "$commonKMers" -A "$minAnchors" -f "$solid" -m "$windowOverlap" -j "$nproc" -r "$contigs" -R "$reads" -M "$maxMSA" -p "$LRSCf" >> "$out"
Command exited with non-zero status 139
morispi commented 4 years ago

Hi,

Do you know if CONSENT crashes right away when starting the polishing step?

Common causes of this error include passing FASTQ reads when CONSENT only supports FASTA (although, since you are polishing an assembly, I'm pretty sure your contigs are FASTA), or using a FASTA file that is not in "one sequence per line" format. Can you check whether your input files are in "one sequence per line" format?
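One way to run that check is a small script like the following. It is a minimal sketch, not part of CONSENT: it assumes an uncompressed FASTA file whose headers start with `>`, and the script name `check_fasta.py` is hypothetical. It reports `False` as soon as a record's sequence spans more than one line, or if the file does not start with a `>` header (for example a FASTQ file).

```python
from typing import Iterable


def is_one_sequence_per_line(lines: Iterable[str]) -> bool:
    """Return True if every FASTA record has its sequence on a single line.

    Assumes uncompressed FASTA whose headers start with '>'. Input that does
    not start with a '>' header (e.g. FASTQ, whose headers start with '@')
    is reported as False.
    """
    seq_lines_in_record = None  # None until the first header is seen
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            seq_lines_in_record = 0  # a new record starts
        elif seq_lines_in_record is None:
            return False  # sequence data before any '>' header
        else:
            seq_lines_in_record += 1
            if seq_lines_in_record > 1:
                return False  # this record's sequence is wrapped
    return True


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_fasta.py assembly.fasta pass.fasta
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, is_one_sequence_per_line(handle))
```

The function takes any iterable of lines, so it works on an open file handle or on a list of strings; it stops reading at the first wrapped record instead of scanning the whole file.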

Best, Pierre

hasindu2008 commented 4 years ago

As I ran it as a batch job, I could not determine exactly when the crash occurred.

My reads are in FASTQ and the contigs are in FASTA. Should the reads also be in FASTA? My contigs seem to be in multi-line FASTA, and maybe that is the problem.

morispi commented 4 years ago

Yes, both reads and contigs should be in FASTA, and both should also be in "one sequence per line" format. That is likely the source of the problem, then. Can you let me know how it goes if you try again after converting everything to "one sequence per line" FASTA?
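For the conversion itself, an existing tool such as `seqtk seq -A` may be the simplest option. As a self-contained alternative, here is a minimal Python sketch (not part of CONSENT). It assumes uncompressed input and FASTQ records in the common 4-line layout, discards quality strings, and the script name `linearize.py` is hypothetical. It reads FASTA with any line width, or FASTQ, and writes each record as one header line followed by one sequence line.

```python
import sys
from typing import Iterable, Iterator, Tuple


def read_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA (any line width) or FASTQ.

    Assumption: FASTQ uses the common 4-line layout
    (@header, sequence, '+', qualities); qualities are discarded.
    """
    it = (raw.rstrip("\r\n") for raw in lines)
    first = next((line for line in it if line), None)  # skip leading blanks
    if first is None:
        return  # empty input
    if first.startswith("@"):
        header = first
        while header:  # stops at end of input or at a trailing blank line
            try:
                seq = next(it)
                next(it)  # '+' separator line
                next(it)  # quality string, discarded
            except StopIteration:
                raise ValueError(f"truncated FASTQ record: {header}") from None
            yield header[1:], seq
            header = next(it, None)
    elif first.startswith(">"):
        header, chunks = first[1:], []
        for line in it:
            if line.startswith(">"):
                yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line.strip())  # join wrapped sequence lines
        yield header, "".join(chunks)
    else:
        raise ValueError("input is neither FASTA ('>') nor FASTQ ('@')")


def to_single_line_fasta(records: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Format each record as a header line followed by one sequence line."""
    for header, seq in records:
        yield f">{header}\n{seq}\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python linearize.py pass.fastq > pass.fasta
    with open(sys.argv[1]) as handle:
        sys.stdout.writelines(to_single_line_fasta(read_records(handle)))
```

Records are streamed one at a time, so even a full PromethION read set does not need to fit in memory; only the sequence of the current record is held at once.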

Best, Pierre

morispi commented 3 years ago

Hi,

I recently updated CONSENT, and it now accepts both FASTA and FASTQ as input; sequences are also no longer required to be in "one sequence per line" format. I believe this should fix your issue.

Leaving it open for now, but don't hesitate to update me if you encounter further errors.

Best, Pierre