mahulchak / quickmerge

A simple and fast metassembler and assembly gap filler designed for long molecule based assemblies.
GNU General Public License v3.0

Possible reason for the following big scaffold got discarded #38

Open liu-xingliang opened 5 years ago

liu-xingliang commented 5 years ago

Hi @mahulchak ,

We used quickmerge to merge a set of PacBio + Dovetail + Bionano (pb+dt+bn) scaffolds with an ONT assembly.

I found that an 83,290,869 bp scaffold was discarded from the quickmerge results, and this scaffold is NOT in the anchor summary output either. I could not figure out why. I suspect a misassembly, but I cannot find proof. I would really appreciate it if you could take a look at the information attached below and give me some suggestions.

I have attached an Excel sheet with the NUCmer alignment (converted from .delta to .paf format with Li Heng's paftools.js for readability) of this scaffold against the quickmerge assembly.

Super-Scaffold_7308.xlsx

The columns in the file are:

header details
query_id Sequence id in our pb + dt + bn scaffolds
query_length pb + dt + bn scaffold length
query_start start of alignment on pb + dt + bn scaffold
query_end end of alignment on pb + dt + bn scaffold
relative_strandness relative strandness
quickmerge_id Sequence id in quickmerge result
quickmerge_length quickmerge sequence length
quickmerge_start start of alignment on quickmerge sequence
quickmerge_end end of alignment on quickmerge sequence
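A table with these columns can be summarised per target sequence to see where the query scaffold ends up. The following is a minimal sketch, assuming the sheet has been exported to a headerless tab-separated file with the nine columns in the order listed above and that coordinates are 0-based and half-open, as in PAF. The function names and the example rows are made up for illustration and are not taken from the attached sheet.

```python
import csv
import io
from collections import defaultdict

# Column order as listed in the post; the TSV export itself is an assumption.
COLUMNS = [
    "query_id", "query_length", "query_start", "query_end",
    "relative_strandness", "quickmerge_id", "quickmerge_length",
    "quickmerge_start", "quickmerge_end",
]


def covered_bases(intervals):
    """Count bases covered by a set of half-open (start, end) intervals."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Close the previous merged interval and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def query_coverage_by_target(tsv_text, query_id):
    """Map each quickmerge_id to the number of query bases aligned to it."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), fieldnames=COLUMNS, delimiter="\t"
    )
    per_target = defaultdict(list)
    for row in reader:
        if row["query_id"] == query_id:
            per_target[row["quickmerge_id"]].append(
                (int(row["query_start"]), int(row["query_end"]))
            )
    return {target: covered_bases(iv) for target, iv in per_target.items()}


# Made-up rows for illustration only (hypothetical targets and coordinates).
EXAMPLE_ROWS = [
    ["Super-Scaffold_7308", "83290869", "0", "1000", "+", "target_A", "5000", "0", "1000"],
    ["Super-Scaffold_7308", "83290869", "500", "1500", "+", "target_A", "5000", "500", "1500"],
    ["Super-Scaffold_7308", "83290869", "3000", "4000", "-", "target_B", "9000", "100", "1100"],
    ["other_scaffold", "100", "0", "50", "+", "target_A", "5000", "0", "50"],
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in EXAMPLE_ROWS)

coverage = query_coverage_by_target(EXAMPLE_TSV, "Super-Scaffold_7308")
print(coverage)  # {'target_A': 1500, 'target_B': 1000}
```

Comparing the per-target totals with query_length gives a rough idea of how much of the scaffold is represented in the merged assembly and in which sequences.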

For quickmerge_id, there are two possibilities:

  • Super-Scaffold_XXX/ScIPGVn_XXX_obj_pilon, pb+dt+bn scaffold IDs (used as query)
  • utgXXX_pilon, ONT assembly IDs (used as reference)

Thank you very much!

mahulchak commented 5 years ago

Hi Liu,

Could you please share the following files from your quickmerge run?

  • param_summary
  • anchor_summary

In addition, please share the output that quickmerge prints to stdout. Once you share them, I will take a look. I am overburdened at the moment, so my response might be delayed. If you are not comfortable sharing all those files here, you can email them to me directly.

Best,
Mahul

liu-xingliang commented 5 years ago

Hi @mahulchak ,

Thank you very much for your prompt reply. Here are the files you asked for:

param_summary_bn_fined_ont_quickmerge.txt
anchor_summary_bn_fined_ont_quickmerge.txt

Here is the output to stdout:

1: PREPARING DATA
2,3: RUNNING mummer AND CREATING CLUSTERS
# reading input file "bn_fined_ont_quickmerge.ntref" of length 1868777346
# construct suffix tree for sequence of length 1868777346
# (maximum reference length is 2305843009213693948)
# (maximum query length is 18446744073709551615)
# process 18687773 characters per dot
#....................................................................................................
# CONSTRUCTIONTIME /biomedja01/disk1/software/liuxl18/quickmerge/quickmerge/MUMmer3.23/mummer bn_fined_ont_quickmerge.ntref 2008.77
# reading input file "/biomedja01/disk1/liuxl18/bionano/runBNG/child_hybrid_B2N2/bn_scaffolded.fined.fasta" of length 2845484543
# matching query-file "/biomedja01/disk1/liuxl18/bionano/runBNG/child_hybrid_B2N2/bn_scaffolded.fined.fasta"
# against subject-file "bn_fined_ont_quickmerge.ntref"
# COMPLETETIME /biomedja01/disk1/software/liuxl18/quickmerge/quickmerge/MUMmer3.23/mummer bn_fined_ont_quickmerge.ntref 7916.14
# SPACE /biomedja01/disk1/software/liuxl18/quickmerge/quickmerge/MUMmer3.23/mummer bn_fined_ont_quickmerge.ntref 4569.85
4: FINISHING DATA
0       /biomedja01/disk1/software/liuxl18/quickmerge/quickmerge/merger/quickmerge
1       -d
2       bn_fined_ont_quickmerge.rq.delta
3       -q
4       /biomedja01/disk1/liuxl18/bionano/runBNG/child_hybrid_B2N2/bn_scaffolded.fined.fasta
5       -r
6       /biomedja01/disk1/liuxl18/child_ONT_polishing/regular_and_long_reads/ge1000_asm/pilon/relax1.ge1000.miniasm.pilon.fasta
7       -hco
8       5.0
9       -c
10      1.5
11      -l
12      150000
13      -ml
14      5000
15      -p
16      bn_fined_ont_quickmerge
utg000301c_pilon         Super-Scaffold_351     1       utg000301c_pilon        1        Super-Scaffold_1467  1
utg000339l_pilon         Super-Scaffold_1106    1       utg000339l_pilon        1        Super-Scaffold_1931  1       utg004332l_pilon        -1
utg000495l_pilon         ScIPGVn_109;HRSCAF=1004_pilon_obj      1       utg000495l_pilon        -1
utg000672l_pilon        utg000672l_pilon        1        Super-Scaffold_2225    -1
utg001005l_pilon         ScIPGVn_212;HRSCAF=1200_pilon_obj      1       utg001005l_pilon        1
utg001086l_pilon        utg001086l_pilon        1        ScIPGVn_134;HRSCAF=1058_pilon_obj      -1
utg001190l_pilon         ScIPGVn_62;HRSCAF=889_pilon_obj        1       utg001190l_pilon        1
utg002094l_pilon         Super-Scaffold_6395    1       utg002094l_pilon        1        Super-Scaffold_7511  -1      utg005135l_pilon        -1
utg002470l_pilon        utg001568l_pilon        1        Super-Scaffold_13090   1       utg002470l_pilon      -1       Super-Scaffold_13090   1
utg002800l_pilon        utg002800l_pilon        1        Super-Scaffold_10859   -1      utg001920l_pilon      -1       Super-Scaffold_9743    1
utg002901l_pilon         ScIPGVn_27;HRSCAF=640_pilon_subseq_37366920:80034689   1       utg002901l_pilon      -1       ScIPGVn_27;HRSCAF=640_pilon_subseq_37366920:80034689   1
utg003780l_pilon         Super-Scaffold_4810    1       utg003780l_pilon        1
utg006949l_pilon         ScIPGVn_414;HRSCAF=1463_pilon_obj      1       utg006949l_pilon        1     Super-Scaffold_13391    -1
utg009835l_pilon        utg009835l_pilon        1        Super-Scaffold_3960    1       utg001987l_pilon      1        ScIPGVn_501;HRSCAF=1568_pilon_obj      1
utg011555l_pilon        utg011555l_pilon        1        ScIPGVn_708;HRSCAF=1805_pilon_obj      -1   utg002162l_pilon -1       Super-Scaffold_7308    -1
utg011679l_pilon        utg011679l_pilon        1        ScIPGVn_74;HRSCAF=918_pilon_obj        1
utg015010l_pilon        utg015010l_pilon        1        ScIPGVn_938;HRSCAF=2061_pilon_obj      -1
utg015418l_pilon        utg015418l_pilon        1        ScIPGVn_90;HRSCAF=960_pilon_obj        1
utg016626l_pilon        utg016626l_pilon        1        ScIPGVn_88;HRSCAF=957_pilon_obj        -1
Merged Contig and length: utg000301c_pilon      6087239
Merged Contig and length: utg000339l_pilon      897464
Merged Contig and length: utg000495l_pilon      370079
Merged Contig and length: utg000672l_pilon      608216
Merged Contig and length: utg001005l_pilon      312882
Merged Contig and length: utg001086l_pilon      683462
Merged Contig and length: utg001190l_pilon      234640
Merged Contig and length: utg002094l_pilon      11482584
Merged Contig and length: utg002470l_pilon      12110621
Merged Contig and length: utg002800l_pilon      1886528
Merged Contig and length: utg002901l_pilon      42667770
Merged Contig and length: utg003780l_pilon      211401
Merged Contig and length: utg006949l_pilon      1925310
Merged Contig and length: utg009835l_pilon      6889416
Merged Contig and length: utg011555l_pilon      53949524
Merged Contig and length: utg011679l_pilon      589163
Merged Contig and length: utg015010l_pilon      326358
Merged Contig and length: utg015418l_pilon      270610
Merged Contig and length: utg016626l_pilon      514028
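The stdout above can be searched programmatically for a given sequence ID. The sketch below assumes, based only on the shape of the lines shown and not on quickmerge documentation, that each line after the argument dump consists of a merged contig name followed by (sequence ID, orientation) pairs, and that the "Merged Contig and length:" lines give the final length of each merged contig. The function names are hypothetical. The sample text is copied from the log above, where Super-Scaffold_7308 is listed on the utg011555l_pilon line.

```python
import re

ORIENTATIONS = {"1", "-1"}


def parse_chains(log_text):
    """Return {merged_name: [(seq_id, orientation), ...]} from the log text."""
    chains = {}
    for line in log_text.splitlines():
        tokens = line.split()
        # Assumed layout: one name, then (sequence ID, orientation) pairs,
        # so a chain line has an odd number of tokens (at least three).
        if len(tokens) < 3 or len(tokens) % 2 == 0:
            continue
        if not all(tok in ORIENTATIONS for tok in tokens[2::2]):
            continue
        orientations = [int(tok) for tok in tokens[2::2]]
        chains[tokens[0]] = list(zip(tokens[1::2], orientations))
    return chains


def parse_merged_lengths(log_text):
    """Return {merged_name: length} from 'Merged Contig and length:' lines."""
    lengths = {}
    for line in log_text.splitlines():
        match = re.match(r"Merged Contig and length:\s+(\S+)\s+(\d+)", line)
        if match:
            lengths[match.group(1)] = int(match.group(2))
    return lengths


def find_sequence(log_text, seq_id):
    """List (merged_name, orientation, merged_length) entries for seq_id."""
    lengths = parse_merged_lengths(log_text)
    hits = []
    for name, pairs in parse_chains(log_text).items():
        for member_id, orientation in pairs:
            if member_id == seq_id:
                hits.append((name, orientation, lengths.get(name)))
    return hits


# Lines copied from the stdout posted above.
SAMPLE_LOG = """\
4: FINISHING DATA
15      -p
16      bn_fined_ont_quickmerge
utg011555l_pilon  utg011555l_pilon  1  ScIPGVn_708;HRSCAF=1805_pilon_obj  -1  utg002162l_pilon  -1  Super-Scaffold_7308  -1
utg011679l_pilon  utg011679l_pilon  1  ScIPGVn_74;HRSCAF=918_pilon_obj  1
Merged Contig and length: utg011555l_pilon      53949524
Merged Contig and length: utg011679l_pilon      589163
"""

print(find_sequence(SAMPLE_LOG, "Super-Scaffold_7308"))
# [('utg011555l_pilon', -1, 53949524)]
```

If the assumed layout is right, a hit here would mean the scaffold was placed in a merged contig rather than dropped outright, and the reported merged length can then be compared with the scaffold's own length.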

Thank you very much for your help! Just take your time.

bless~ Xingliang

AlexWanghaoming commented 4 years ago

I ran into the same problem when I tried to merge assemblies from Falcon and MECAT2.

mahulchak commented 4 years ago

Could you explain a little more? Will you be able to share your files so that I can replicate your issue?
