mahulchak / quickmerge

A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
GNU General Public License v3.0

'std::out_of_range' error #27

Closed zfuller5280 closed 4 years ago

zfuller5280 commented 6 years ago

Hi Mahul, I am trying to use quickmerge but am receiving the following error:

terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 18446744073709536653) > this->size() (which is 342465)

Looking at some of the other issues, I've seen this error come up a few other times. However, it looked like the culprit in those cases was fasta files with whitespace in the header names, or sequences not on one line. I do not believe this to be the issue in this case, as I first started with the merge_wrapper.py script. I ran the command as follows:

merge_wrapper.py ../scaff10x_rounds2/renamed.sspace_scaff10x.2.fasta ../canu_assembly/asm/AM.contigs.fasta

I can see that it correctly creates the files hybrid_oneline.fa and self_oneline.fa in my current working directory. If I look at the first few headers in each file:

cat hybrid_oneline.fa|grep ">"|head -n 5
>1
>2
>3
>4
>5
cat self_oneline.fa|grep ">"|head -n 5
>tig00000004_len=34946_reads=29_covStat=35.85_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
>tig00000005_len=26830_reads=11_covStat=22.77_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
>tig00000007_len=146883_reads=146_covStat=247.16_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
>tig00000009_len=142320_reads=139_covStat=238.60_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no
>tig00000013_len=39096_reads=25_covStat=60.84_gappedBases=no_class=contig_suggestRepeat=no_suggestCircular=no

Everything looks correct. I have also tried cutting a lot of the unnecessary text from the headers in self_oneline.fa, leaving the headers as ">tigXXXX" in a file called renamed_self.fa. If I then run the quickmerge command, following the order of arguments given on the wiki:

quickmerge -d out.rq.delta -q hybrid_oneline.fa -r renamed_self.fa -hco 5 -c 1.5 -l 200000 -ml 5000

I still get the same error. What else, if anything, besides wrongly formatted fasta files could be throwing this error? Thanks for any information/insight and I hope to get this working!
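
The two formatting culprits mentioned above (whitespace in header names, sequences split over several lines) can be checked mechanically before running quickmerge. The following is a minimal sketch, not part of quickmerge or merge_wrapper.py; the function name check_fasta and the problem labels are made up for illustration.

```python
def check_fasta(lines):
    """Return (line_number, problem) tuples for a FASTA file given as an iterable of lines."""
    problems = []
    seq_lines_in_record = 0
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: reset the per-record sequence line counter.
            seq_lines_in_record = 0
            if any(ch.isspace() for ch in line):
                problems.append((number, "whitespace in header"))
        elif line:
            seq_lines_in_record += 1
            # Report each multi-line record once, at its second sequence line.
            if seq_lines_in_record == 2:
                problems.append((number, "sequence spans multiple lines"))
    return problems


# Small in-memory example; a real file can be passed as open("hybrid_oneline.fa").
example = [">tig1 len=10\n", "ACGT\n", "ACGT\n", ">tig2\n", "ACGT\n"]
print(check_fasta(example))
```

An empty result only rules out these two problems; it does not show that the input is otherwise acceptable to quickmerge.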

mahulchak commented 6 years ago

Hi Zach, Will you be able to share your fasta files? I can try to reproduce your error. Mahul

zfuller5280 commented 6 years ago

Thanks for the response. I just sent you links to the fasta files to your email. Let me know if you need me to share them another way.

s-yazar commented 6 years ago

Hi Mahul, I am getting the same error. Have you guys resolved this issue? Thanks, Seyhan

mahulchak commented 6 years ago

Hi Seyhan, Have you confirmed that your out_of_range error is not caused by fasta name issues? I am working on resolving the bug Zach reported. There are a couple of other improvements that I am trying to incorporate, so it is taking time. Mahul

s-yazar commented 6 years ago

Hi Mahul, I checked the fasta files. There is no whitespace in the header names and sequences are on one line. Seyhan

s-yazar commented 6 years ago

Hi Mahul, Just wanted to let you know that I resolved this issue and ran quickmerge multiple times after reinstalling MUMmer. Cheers, Seyhan

mahulchak commented 6 years ago

Thanks for the update.

zhk2017 commented 5 years ago

The problem is integer overflow. If the merged genome is larger than 2147483647 bp (about 2.15 Gb), this error will be reported, because the range of the int type is -2147483648 to 2147483647. From BerryGenomics.
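
The arithmetic behind this claim can be sketched as follows. This is only an illustration of how fixed-width integers wrap; whether quickmerge actually stores positions in a 32-bit int is not confirmed anywhere in this thread. The helper names wrap_int32 and as_signed64 are made up for the example. One thing that can be said with certainty is that the __pos value in the first post equals 2^64 - 14963, which is what a negative offset of -14963 looks like after conversion to an unsigned 64-bit type.

```python
def wrap_int32(value):
    """Interpret an integer as a 32-bit two's-complement signed int."""
    return (value + 2**31) % 2**32 - 2**31


def as_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - 2**64 if value >= 2**63 else value


print(wrap_int32(2147483647))             # largest 32-bit int, unchanged
print(wrap_int32(2147483648))             # one past the limit wraps to -2147483648
print(as_signed64(18446744073709536653))  # __pos from the first post: -14963
```

By contrast, the later reports in this thread show small positive __pos values against a string of size 0, which points to an empty sequence rather than to a wrapped integer.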

mahulchak commented 5 years ago

Are you talking about a nucmer failure? MUMmer3 has that issue unless it is compiled with the 64-bit flag. I don't think quickmerge has that issue. Also, one option I have been experimenting with is

-l 10000

for delta-filter. It has been helpful: it filters out small alignments, which are unreliable for merging.

On Sun, Sep 9, 2018, 18:44 zhk2017 notifications@github.com wrote:

It's a BUG.

Neato-Nick commented 5 years ago

I had this error, and based on other advice in this thread, changing my tags resolved the issue. My original fasta files had headers with commas, pipes, and spaces, and were fairly long. I shortened all my headers and replaced every character that was not alphanumeric with an underscore. Then the fasta cleaning step of quickmerge worked. I'm not sure which characters in my headers were causing the fasta cleaning script to choke, so the best bet is to take an axe to them beforehand.
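
The header clean-up described here could be sketched like this. It is an illustration only, not the code used in this comment; the function name sanitize_header and the max_length default of 50 are arbitrary choices for the example.

```python
import re


def sanitize_header(header, max_length=50):
    """Reduce a FASTA header to alphanumerics and underscores, then truncate it."""
    name = header.lstrip(">").strip()
    # Collapse every run of other characters (commas, pipes, spaces, ...) into one underscore.
    name = re.sub(r"[^A-Za-z0-9_]+", "_", name).strip("_")
    return ">" + name[:max_length]


print(sanitize_header(">gi|123|ref, some contig 1"))
```

Truncating or collapsing characters can turn two different headers into the same name, so it is worth checking that the cleaned headers are still unique.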

esolares commented 5 years ago

Hi

Usually MUMmer chokes on pipes and percent signs in the fasta headers. It's possible it was choking at the MUMmer stage. If it was at the quickmerge stage, we would need to look at why the headers weren't being properly escaped, but so far quickmerge has not had any issues with headers. Were you running the python wrapper? If so, what output files had been created when it quit?

Thank you,

Edwin

Neato-Nick commented 5 years ago

Hi, I actually ran each step individually rather than running the wrapper so that I could pin the issue down. nucmer and delta-filter both ran fine, it was quickmerge that had the issue.

I used the --clean-only flag hidden within the merge_wrapper.py script to produce the FASTA files to put into the pipeline first. This is where I noticed my issue: when I had long tags with commas, my fasta was not being converted properly; the spaces were removed, but the sequences were still on multiple lines. The multiple lines didn't cause any issues in the MUMmer steps but, again, quickmerge errored out. The exact error message was:

terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 495327) > this->size() (which is 0)

It created all the output files, including param_summary, anchorsummary, etc. But the merged.fasta was empty.

Edit: here's my workflow

nucmer -l 100 -prefix <out> <ref.fasta> <query.fasta>
delta-filter -r -q -l 10000 <out.delta> > <out.rq.delta>
quickmerge -d <out.rq.delta> -q <query.fasta> -r <ref.fasta> -hco 5.0 -c 1.5 -l 1250000 -ml 10000

Changing any of the numbers in the parameters resulted in the same error; it was definitely (I should say most likely...) something about the FASTA headers causing the issue.
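
Since the failure here coincided with sequences left on multiple lines, one option is to join them independently of the wrapper before running the workflow. The following is a minimal sketch of such a conversion, assuming plain FASTA input; it is not the implementation used by merge_wrapper.py, and the function name to_oneline is made up.

```python
def to_oneline(lines):
    """Yield FASTA lines with each record's sequence joined onto a single line."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                # Emit the previous record before starting a new one.
                yield header
                yield "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield header
        yield "".join(chunks)


print(list(to_oneline([">a", "AC", "GT", "", ">b", "TT"])))
```

Any sequence lines that appear before the first header are dropped by this sketch, and headers are passed through unchanged.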

esolares commented 5 years ago

Interesting. Could you send over some example headers along with the line that follows each of them? Just 20-50 nt of sequence is enough. I would like to run some tests with them.

Thank you,

Edwin

Neato-Nick commented 5 years ago

Sure! I'll send a few sequences from both ref and query files tomorrow. Where should I send them to?

esolares commented 5 years ago

Sounds good. Would you be able to put them on Google drive and send us a link? If not I can try to figure something out for ftp upload.

Neato-Nick commented 5 years ago

Yes, what e-mail address(es) should I share that folder with?

esolares commented 5 years ago

solarese@uci.edu

Thank you

Edwin

jfass commented 5 years ago

I was seeing this same error:

ERROR: mummer and/or mgaps returned non-zero
terminate called after throwing an instance of 'std::out_of_range'
  what():  basic_string::substr: __pos (which is 7773) > this->size() (which is 0)

... in the CONSTRUCTIONTIME step, but also in the FINISHING step. I was merging three supernova assemblies, and I finally noticed that there were duplicate contig names on different sequences in the merged.fasta output of QuickMerge. This was happening, I think, because the two assemblies I merged first had overlapping sets of contig names. So, for example, in the merged output I'd get two different sequences that both had the header line ">4729". Then, merging that merged assembly with the third assembly was giving the above error. So it might be good if the merged output were guaranteed to have unique fasta headers (or if there were a check for unique headers on the combined input set).

I also saw some intermittent instances of the error ... but this could have been confined to cases when I ran in the same directory where a previous run had errored out, without deleting files. I'd have to test that a bit more to be sure.
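
The uniqueness check suggested above could look something like the following. This is a sketch of the idea, not something quickmerge provides; the function name duplicate_headers is made up, and the header lists are assumed to have been extracted from the input FASTA files already (for example, every line starting with ">").

```python
from collections import Counter


def duplicate_headers(*header_sets):
    """Return the sorted names that occur more than once across all given header lists."""
    counts = Counter(name for headers in header_sets for name in headers)
    return sorted(name for name, n in counts.items() if n > 1)


first = ["1", "2", "4729"]
second = ["4729", "5000"]
print(duplicate_headers(first, second))
```

A non-empty result means the inputs should be renamed, for instance by prefixing each assembly's contig names, before merging.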