malonge / RagTag

Tools for fast and flexible genome assembly scaffolding and improvement
MIT License

Some contigs are not scaffolded #148

Open rknx opened 1 year ago

I have a plasmid that I am trying to stitch together. Based on molecular work and homology, I'm fairly certain that two of the contigs below (NODE_001 and NODE_051) are part of the same plasmid. However, in the final results, NODE_051 is not scaffolded with NODE_001.

Command

ragtag.py scaffold -C --debug -t 4 -d 250000 --aligner nucmer -o ragtag/qry ref.fa qry.fa

Please find relevant parts of debug outputs that may be of use:

ragtag.scaffold.debug.query.info.txt

NODE_001    CP018729.1  1.0 1.0 1.0
NODE_051    CP018729.1  1.0 1.0 1.0
NODE_130    CP018729.1  1.0 1.0 1.0
NODE_141    CP018729.1  1.0 1.0 1.0
NODE_165    CP018729.1  1.0 1.0 1.0

ragtag.scaffold.debug.unfiltered.paf

NODE_001    165349  16993   165349  +   CP018729.1  211336  0   148354  148339  148356  0
NODE_001    165349  80  16993   +   CP018729.1  211336  194424  211336  16912   16913   0
NODE_051    40607   37844   39333   -   CP018729.1  211336  152188  153677  1446    1489    0
NODE_051    40607   72  32458   -   CP018729.1  211336  162034  194428  32308   32395   0
NODE_130    5747    0   3339    +   CP018729.1  211336  156050  159389  3339    3339    0
NODE_141    3861    0   3861    -   CP018729.1  211336  148692  152553  3861    3861    0
NODE_165    524    0    524    -    CP018729.1    211336    155185    155709    524    524    0

ragtag.scaffold.debug.filtered.paf

NODE_001    165349  16993   165349  +   CP018729.1  211336  0   148354  148339  148356  0
NODE_001    165349  80  16993   +   CP018729.1  211336  194424  211336  16912   16913   0
NODE_051    40607   37844   39333   -   CP018729.1  211336  152188  153677  1446    1489    0
NODE_051    40607   72  32458   -   CP018729.1  211336  162034  194428  32308   32395   0
NODE_130    5747    0   3339    +   CP018729.1  211336  156050  159389  3339    3339    0
NODE_141    3861    0   3861    -   CP018729.1  211336  148692  152553  3861    3861    0
NODE_165    524    0    524    -    CP018729.1    211336    155185    155709    524    524    0

ragtag.scaffold.asm.paf

NODE_001    165349  16993   165349  +   CP018729.1  211336  0   148354  148339  148356  0   NM:i:17 cg:Z:61867M1I59022M1I27465M
NODE_001    165349  80  16993   +   CP018729.1  211336  194424  211336  16912   16913   0   NM:i:1  cg:Z:15091M1I1821M
NODE_051    40607   37844   39333   -   CP018729.1  211336  152188  153677  1446    1489    0   NM:i:43 cg:Z:1489M
NODE_051    40607   72  32458   -   CP018729.1  211336  162034  194428  32308   32395   0   NM:i:87 cg:Z:27321M2D1M3D16M1D87M3D3189M1I1771M
NODE_130    5747    0   3339    +   CP018729.1  211336  156050  159389  3339    3339    0   NM:i:0  cg:Z:3339M
NODE_141    3861    0   3861    -   CP018729.1  211336  148692  152553  3861    3861    0   NM:i:0  cg:Z:3861M
NODE_165    524    0    524    -    CP018729.1    211336    155185    155709    524    524    0    NM:i:0    cg:Z:524M
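For anyone reading the PAF records above, here is a minimal Python sketch that totals how much of each query contig is covered by its alignments. It assumes the standard PAF column order (query name, query length, query start, query end as the first four fields) and splits on whitespace. The function name `query_coverage` is my own and is not part of RagTag, and this is not necessarily how RagTag scores or filters contigs internally.

```python
from collections import defaultdict

# NODE_001 and NODE_051 records copied from the PAF output in this issue.
PAF_TEXT = """\
NODE_001 165349 16993 165349 + CP018729.1 211336 0 148354 148339 148356 0
NODE_001 165349 80 16993 + CP018729.1 211336 194424 211336 16912 16913 0
NODE_051 40607 37844 39333 - CP018729.1 211336 152188 153677 1446 1489 0
NODE_051 40607 72 32458 - CP018729.1 211336 162034 194428 32308 32395 0
"""


def query_coverage(paf_text):
    """Return {query_name: fraction of query bases covered by at least one alignment}."""
    lengths = {}
    intervals = defaultdict(list)
    for line in paf_text.splitlines():
        fields = line.split()
        if len(fields) < 12:
            continue  # skip blank or truncated lines
        name = fields[0]
        lengths[name] = int(fields[1])
        intervals[name].append((int(fields[2]), int(fields[3])))

    coverage = {}
    for name, spans in intervals.items():
        covered = 0
        cur_start, cur_end = None, None
        # Merge overlapping or adjacent query intervals so bases are not counted twice.
        for start, end in sorted(spans):
            if cur_end is None or start > cur_end:
                if cur_end is not None:
                    covered += cur_end - cur_start
                cur_start, cur_end = start, end
            else:
                cur_end = max(cur_end, end)
        covered += cur_end - cur_start
        coverage[name] = covered / lengths[name]
    return coverage


if __name__ == "__main__":
    for name, frac in query_coverage(PAF_TEXT).items():
        print(f"{name}\t{frac:.3f}")
```

On these records, roughly 83% of NODE_051 (33,875 of 40,607 bp) and nearly all of NODE_001 are covered by alignments to CP018729.1.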

But in the end, it appears that NODE_051 is not scaffolded together with NODE_001 in CP018729.1.

ragtag.scaffold.agp

CP018729.1_RagTag   1   165349  1   W   NODE_001    1   165349  +
Chr0_RagTag 50040   90646   43  W   NODE_051    1   40607   +
Chr0_RagTag 90647   90746   44  U   100 contig  no  na
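To check where each contig ended up, here is a minimal Python sketch that groups AGP component lines by scaffold object. It assumes the usual AGP layout (object name in column 1, component type in column 5, component ID in column 6, orientation in column 9) and treats `N` and `U` lines as gaps. The helper name `components_by_object` is hypothetical and not part of RagTag. As far as I understand, `Chr0_RagTag` is the object that `-C` builds from unplaced contigs.

```python
from collections import defaultdict

# The three AGP lines quoted in this issue.
AGP_TEXT = """\
CP018729.1_RagTag 1 165349 1 W NODE_001 1 165349 +
Chr0_RagTag 50040 90646 43 W NODE_051 1 40607 +
Chr0_RagTag 90647 90746 44 U 100 contig no na
"""


def components_by_object(agp_text):
    """Return {object_name: [(component_id, orientation), ...]}, ignoring gap lines."""
    placed = defaultdict(list)
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and AGP header comments
        fields = line.split()
        if len(fields) < 9:
            continue  # skip malformed lines
        if fields[4] in ("N", "U"):
            continue  # gap lines carry no component ID
        placed[fields[0]].append((fields[5], fields[8]))
    return dict(placed)


if __name__ == "__main__":
    for obj, comps in components_by_object(AGP_TEXT).items():
        print(obj, comps)
```

For the lines shown here, NODE_001 is the only component listed under `CP018729.1_RagTag`, while NODE_051 falls under `Chr0_RagTag`.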

ragtag.scaffold.stats

placed_sequences    placed_bp    unplaced_sequences    unplaced_bp    gap_bp    gap_sequences
145                 5285728      23                    108456         16400     164

Why is NODE_051 not part of the CP018729.1 scaffold, and which parameters should I change to include it? Please let me know if you'd like the input files. Thanks.