mahulchak / quickmerge

A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
GNU General Public License v3.0

quickmerge: terminate called after throwing an instance of 'std::out_of_range' #58

Closed: phrh closed this issue 3 years ago

phrh commented 3 years ago

Hello, I am trying to merge two assemblies, but I get an error. Do you have an idea what it could be, or how I can understand the error?

quickmerge -d merge.canu.flye2.rq.delta -q canu/VdC07.all.fastq.unitigs.racon.fasta -r flye/VdC07.all.fastq.unitigs.flye.racon.fasta -hco 5.0 -c 1.5 -l 3000000 -ml 5000 -p m2.canu.flye

0 quickmerge 1 -d 2 merge.canu.flye2.rq.delta 3 -q 4 canu/VdC07.all.fastq.unitigs.racon.fasta 5 -r 6 flye/VdC07.all.fastq.unitigs.flye.racon.fasta 7 -hco 8 5.0 9 -c 10 1.5 11 -l 12 3000000 13 -ml 14 5000 15 -p 16 m2.canu.flye
contig_2 contig_2 1 tig00000015 1
contig_64 tig00000017 1 contig_64 1
contig_74 tig00000040 1 contig_74 -1 tig00000004 1
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr: __pos (which is 2616023) > this->size() (which is 0)
Aborted (core dumped)

mahulchak commented 3 years ago

Could you check whether the FASTA headers in your assembly FASTA files contain white space? The sequence names in the MUMmer delta file should match the names in the FASTA files. MUMmer removes all characters from a sequence name that follow the first white space.
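One way to run that check is sketched below. The function name headers_with_whitespace is made up for illustration and is not part of quickmerge or MUMmer; it uses only grep and prints every header line that contains a space or tab.

```shell
# Hypothetical helper (not part of quickmerge or MUMmer):
# print every FASTA header line that contains a space or tab.
headers_with_whitespace() {
    # First grep keeps header lines, second keeps those with white space.
    grep '^>' "$1" | grep '[[:space:]]'
}

# Usage: headers_with_whitespace my_assembly.fasta | head
```

If this prints nothing, the headers are plain names and the mismatch probably lies elsewhere.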

phrh commented 3 years ago

I am trying to merge two assemblies, one produced by Flye and the other by Canu. I don't see white space in the names themselves; they have the following format. The characters that follow the white space are the length and other information.

contig_XX tig000000YY

mahulchak commented 3 years ago

Can you give me examples of the FASTA headers? E.g. you can run grep '>' my_assembly.fasta | head -n 1 and paste the output here. Replace my_assembly.fasta with the name of your Flye or Canu assembly.

phrh commented 3 years ago

Yes

flye

contig_1 LN:i:4926155 RC:i:145983 XC:f:1.000000
contig_10 LN:i:513 RC:i:6 XC:f:0.500000
contig_101 LN:i:1519 RC:i:8 XC:f:0.750000
contig_104 LN:i:2047 RC:i:14 XC:f:0.800000
contig_113 LN:i:4131 RC:i:50 XC:f:1.000000

Canu

tig00000001 LN:i:60425 RC:i:854 XC:f:1.000000
tig00000002 LN:i:87954 RC:i:1213 XC:f:1.000000
tig00000003 LN:i:45052 RC:i:908 XC:f:0.989011
tig00000004 LN:i:3654163 RC:i:85689 XC:f:0.999863
tig00000005 LN:i:67237 RC:i:362 XC:f:1.000000

mahulchak commented 3 years ago

Ah, so there is white space in the FASTA headers. One solution would be to remove all text after the sequence names, e.g. in this line

tig00000001 LN:i:60425 RC:i:854 XC:f:1.000000

keep only tig00000001 and discard the rest. If that is not easy for you to do, you can replace the white space with the underscore (_) character, e.g. run this for both of your assemblies (there is a space between s/ and /):

sed 's/ /_/g' assembly.fasta > assembly_no_white.fasta
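The first option (keeping only the sequence name) can be sketched with awk. The function name strip_header_annotations and the output file name are made up for illustration; the sketch assumes the name directly follows the '>' with no space in between.

```shell
# Hypothetical helper: keep only the first whitespace-delimited token
# of each FASTA header; sequence lines are passed through unchanged.
strip_header_annotations() {
    awk '/^>/ { print $1; next } { print }' "$1"
}

# Usage: strip_header_annotations assembly.fasta > assembly_names_only.fasta
```

Unlike the underscore replacement, this leaves the names identical to what MUMmer records, at the cost of dropping the LN/RC/XC annotations from the file.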

Then run nucmer again and then run quickmerge.
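The re-run might look like the sketch below. The helper name print_rerun_commands and the output prefixes are made up. The nucmer -l 100 and delta-filter -r -q -l 10000 settings are assumptions: the .rq.delta file name in the original command suggests a delta-filter step, but the thread does not show it. The quickmerge options are copied from the command at the top of this issue. The function only prints the commands so they can be reviewed before running.

```shell
# Hypothetical helper: print (do not run) the commands for a re-run
# on the assemblies with cleaned headers.
print_rerun_commands() {
    ref=$1      # reference assembly (passed to quickmerge with -r)
    qry=$2      # query assembly (passed to quickmerge with -q)
    prefix=$3   # output prefix, chosen by the user
    # Assumed alignment and filtering settings; adjust to your own pipeline.
    echo "nucmer -l 100 -p $prefix $ref $qry"
    echo "delta-filter -r -q -l 10000 $prefix.delta > $prefix.rq.delta"
    # quickmerge options as used in the original command of this issue.
    echo "quickmerge -d $prefix.rq.delta -q $qry -r $ref -hco 5.0 -c 1.5 -l 3000000 -ml 5000 -p $prefix.merged"
}

# Usage: print_rerun_commands flye_no_white.fasta canu_no_white.fasta fixed
```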

phrh commented 3 years ago

It did work, thank you. I am sorry, I thought those parameters were not considered part of the name.