Hi Chris,
Thanks very much for the bug reports. I think they should all be taken care of now.
make_merger.sh was outdated. It has now been updated to simply call 'make' in the 'merger' directory.
The MUMmer directory now contains the 'aux_bin' directory, so compilation should proceed smoothly.
merge_wrapper.py has also been updated. I removed two lines of debug code that printed the names of all input scaffolds, so there should be less unnecessary output to stdout. Apart from that, the quickmerge wrapper functions fine in my hands.
I was not able to replicate the error that you mentioned wherein the input fasta sequence is printed after the headers. Please try the new version, and send me your code if you still have this problem.
Did you mean to imply that the final output ("merged.fasta") was not created? If so, please send me your code so I can attempt to replicate it. The program seems to work fine on my end.
-Jim
Hi Jim,
thanks for the quick fix. Everything's working fine now. I was referring to the two debug lines that print the "oneline" version of the FASTAs, not the final output.
Chris
Update: Nucmer and delta-filter ran successfully and the files "aln_summary.tsv", "summaryOut.txt" and "anchor_summary.txt" were created. The file "merged.fasta", however, is empty and quickmerge crashed with: segfault at 0 ip 000000000040d203 sp 00007ffe50902880 error 4 in quickmerge[400000+24000]
Hi Chris, Thanks for reporting the problem. Could you please check the following things for me:
Also, could you please check if any of your fasta files has a sequence named ">ctg7180000002162"? (Typically pbcr/celera assembly fasta files have such sequence names.) Basically, you can run 'grep 2162 foo.fasta' and see if anything shows up. I also forgot to ask: does quickmerge print anything (like a chain of seq names) on stdout before it crashes?
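For anyone who prefers a script to grep, the same check can be sketched in Python. This is only an illustration: the function name find_headers and the file name foo.fasta are placeholders, and unlike the plain grep it looks only at header lines (those starting with ">").

```python
def find_headers(fasta_path, needle):
    """Return the FASTA header lines (without the leading '>') that contain `needle`."""
    hits = []
    with open(fasta_path) as handle:
        for line in handle:
            # Only header lines are inspected; sequence lines are skipped.
            if line.startswith(">") and needle in line:
                hits.append(line[1:].rstrip("\n"))
    return hits


if __name__ == "__main__":
    # "foo.fasta" is a placeholder file name, as in the grep command above.
    for header in find_headers("foo.fasta", "2162"):
        print(header)
```

An empty result means no header contains the search string.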
One file had line breaks in the sequence, the other had whitespace in the headers. I fixed both and ran nucmer, delta-filter and quickmerge again, but I still get a segfault. The contigs of both files are named ctg_X and contig_X, with X being the contig count. That's why "...2162..." appears several times in the headers. The respective contigs also appear in the summary files. I noticed that the last entry of "aln_summary.tsv", "summaryOut.txt" and "anchor_summary.txt" is always contig_999999. quickmerge seems to crash while writing anchor_summary.txt. When I count the unique number of contigs for the reference and the query, aln_summary.txt and summaryOut.txt contain more entries than anchor_summary.txt. "merged.fasta" is not being written. I don't see anything written to stdout.
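The two clean-up steps mentioned above (joining wrapped sequence lines and removing whitespace from headers) could be done along these lines. This is a minimal sketch, not the procedure actually used in this thread: the function name normalize_fasta is made up, and cutting each header at the first whitespace is an assumption that can produce duplicate names if the first tokens are not unique.

```python
def normalize_fasta(in_path, out_path):
    """Copy a FASTA file, writing each sequence on one line and
    cutting each header at its first whitespace character."""
    with open(in_path) as src, open(out_path, "w") as dst:
        seq_parts = []        # pieces of the sequence currently being read
        have_record = False   # becomes True once the first header is seen
        for raw in src:
            line = raw.strip()
            if not line:
                continue      # skip blank lines
            if line.startswith(">"):
                if have_record:
                    # Finish the previous record before starting a new one.
                    dst.write("".join(seq_parts) + "\n")
                seq_parts = []
                have_record = True
                fields = line[1:].split()
                name = fields[0] if fields else ""
                dst.write(">" + name + "\n")
            elif have_record:
                seq_parts.append(line)
            # Sequence text before the first header is ignored.
        if have_record:
            dst.write("".join(seq_parts) + "\n")
```

It writes to a new file rather than editing in place, so the original assemblies stay untouched for comparison.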
That's interesting. Would you be able to share your fasta files? I can try to find the source of the issue. Thank you. Mahul
On Sun, Oct 25, 2015, 10:33 AM Christian Dreischer notifications@github.com wrote:
— Reply to this email directly or view it on GitHub https://github.com/mahulchak/quickmerge/issues/1#issuecomment-150946762.
Unfortunately I can't share the fasta files. I ran quickmerge on a subset of the data and got something printed to stdout before the core dump:
ctg1049 ctg1049 1 ctg7580 -1
ctg106 ctg106 1 ctg7954 -1
ctg1067 ctg7102 1 ctg1067 -1
ctg1112 ctg468 1 ctg1112 1
ctg1114 ctg3334 1 ctg4 -1 ctg1114 1 ctg377 1 ctg13 1
ctg1116 ctg1669 1 ctg1116 -1
ctg1120 ctg81 1 ctg1120 -1 ctg2550 -1
ctg1123 ctg1506 1 ctg1123 1
ctg1126 ctg1126 1 ctg1984 1
ctg1132 ctg1132 1 ctg2047 -1 ctg5089 1
ctg1135 ctg513 1 ctg1135 -1
ctg1136 ctg1136 1 ctg3296 -1
ctg1137 ctg6991 1 ctg1137 1
ctg1138 ctg2751 1 ctg2236 1 ctg1138 1 ctg1820 -1
ctg1145 ctg1145 1 ctg663 1
ctg1147 ctg1147 1 ctg224 1
ctg1150 ctg6935 1 ctg1150 -1
These are the last couple of lines and the error message:
ctg97 ctg1653 1 ctg97 -1
ctg99 ctg4536 1 ctg99 -1
ctg992 ctg992 1 ctg7535 1
terminate called after throwing an instance of 'std::out_of_range'
what(): basic_string::substr
Chris
Hi Chris, without the fasta files, I can only suggest a couple of things.
CURRENT ERROR
REGARDING THE PREVIOUS ERROR:
i) How big is your genome and how much memory does your machine have? It is possible that your genome is too big for your memory (ideally you'll need memory > 2 * genome size). If you are on a Linux machine, you can use /usr/bin/time -v to find the peak memory usage of quickmerge.
ii) Would you be able to recompile quickmerge with the -g flag so that you can run gdb? Once you do that, you can run quickmerge on the original dataset in gdb, and gdb can then report where the crash occurs.
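The rule of thumb in i) can be turned into a quick estimate. The sketch below is not part of quickmerge, and both function names are made up. It assumes "genome size" means the number of bases in one assembly FASTA and that each base takes roughly one byte. The thread does not say whether both assemblies should be counted, so the result is only a rough lower bound to compare against the peak usage reported by /usr/bin/time -v.

```python
def total_bases(fasta_path):
    """Count sequence characters in a FASTA file, ignoring header lines."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def rough_memory_needed(fasta_path, factor=2):
    """Apply the 'memory > 2 * genome size' rule of thumb.

    Assumes about one byte per base; returns a byte count.
    """
    return factor * total_bases(fasta_path)
```

For example, an assembly with 3,000,000,000 bases would give an estimate of about 6 GB under these assumptions.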
Hi,
Some things I noticed while trying quickmerge:
make_merger.sh has the wrong compilation instructions. It should be "g++ -Wall -o quickmerge quickmerge.cpp qmergelib.cpp -I." instead of "g++ -Wall work_in_prog_temp.cpp exp_testlib.cpp -o merger".
MUMmer compilation might fail because the folder aux_bin isn't created.
Running the quickmerge wrapper just prints all the scaffolds and contigs to stdout. The headers are printed twice, then the sequence itself.
Chris