Open SMorrison42 opened 1 year ago
Here's a specific example: segment lines from an assembly of Illumina NovaSeq reads from monkeypox virus. Note that segment 12 has no sequence (length = 0 bp).
cat L*A13.assembly.gfa | grep ^S | cut -f1,2,4,5 | column -t
S 1 LN:i:151789 dp:f:1.0
S 2 LN:i:14767 dp:f:0.9675450839707439
S 3 LN:i:11567 dp:f:0.959489622964646
S 4 LN:i:5777 dp:f:0.9215721186123447
S 5 LN:i:4676 dp:f:1.9657676335501226
S 6 LN:i:1624 dp:f:2.023590772062553
S 7 LN:i:268 dp:f:2.1558008320217863
S 8 LN:i:24 dp:f:1.6570576416128175
S 9 LN:i:16 dp:f:20.642039801793622
S 10 LN:i:9 dp:f:27.790311595329598
S 11 LN:i:2 dp:f:0.6247140887830516
S 12 LN:i:0 dp:f:0.7382984685617883
Yet the graph includes linkages between 12 and other segments:
cat L*A13.assembly.gfa | grep ^L | grep 12
L 12 + 8 + 0M
L 12 + 11 + 0M
L 11 + 12 + 0M
L 12 - 8 + 0M
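One caveat on the `grep 12` step: it matches any line containing "12", so in a larger graph it would also pick up a segment named 112 or 120. Below is a minimal Python sketch that matches on the segment-name fields of the `L` lines instead. It assumes tab-separated GFA1 records (`S name sequence tags...`, `L from orient to orient overlap`). The function names and the `SAMPLE` data are made up for illustration: the four links to segment 12 are copied from above, while segment 112 and the sequences are hypothetical.

```python
def segment_length(fields):
    """Length of an S record: the LN:i: tag if present, else the sequence length.

    Returns None when the sequence is '*' (not stored) and there is no LN tag.
    """
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):
            return int(tag[len("LN:i:"):])
    seq = fields[2] if len(fields) > 2 else ""
    return None if seq == "*" else len(seq)


def zero_length_segments(lines):
    """Collect the names of all S records whose length is exactly 0."""
    names = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2 and segment_length(fields) == 0:
            names.add(fields[1])
    return names


def links_touching(lines, names):
    """Return L records whose 'from' or 'to' segment name is in `names`."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "L" and len(fields) >= 5:
            if fields[1] in names or fields[3] in names:
                hits.append(line.rstrip("\n"))
    return hits


# Illustrative records only; segment 112 is hypothetical and is there to show
# that an exact field match does not confuse it with segment 12.
SAMPLE = [
    "S\t8\t" + "A" * 24 + "\tLN:i:24",
    "S\t11\tAC\tLN:i:2",
    "S\t12\t\tLN:i:0",
    "S\t112\t" + "A" * 12 + "\tLN:i:12",
    "L\t12\t+\t8\t+\t0M",
    "L\t12\t+\t11\t+\t0M",
    "L\t11\t+\t12\t+\t0M",
    "L\t12\t-\t8\t+\t0M",
    "L\t112\t+\t8\t+\t0M",
]

empty = zero_length_segments(SAMPLE)
for link in links_touching(SAMPLE, empty):
    print(link)
```

On the sample data this reports only the links that involve segment 12 and skips the one from the hypothetical segment 112.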
Hi, I'm using Unicycler in my pipeline, and I parse the .gfa file it produces with the gfa Python package to help assess the final assembly. In a few of my .gfa files, a contig is reported with length 0 and no sequence. Is there a parameter I can use to exclude zero-length contigs from the GFA output?
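This doesn't answer whether Unicycler has such a parameter, but as a possible workaround the zero-length records could be stripped after the fact, before the file reaches the parser. The sketch below assumes tab-separated GFA1 lines and only handles `S` and `L` records; everything else (headers, and any `P` lines, which could still name a removed segment) passes through unchanged. `drop_zero_length` is a hypothetical helper, not part of Unicycler or the gfa package.

```python
def _length(fields):
    """Length of an S record from its LN:i: tag, falling back to the sequence."""
    for tag in fields[3:]:
        if tag.startswith("LN:i:"):
            return int(tag[len("LN:i:"):])
    seq = fields[2]
    return None if seq == "*" else len(seq)


def drop_zero_length(lines):
    """Return GFA lines without zero-length segments or links that touch them."""
    records = [line.rstrip("\n") for line in lines]

    # First pass: find the names of all zero-length segments.
    empty = set()
    for rec in records:
        f = rec.split("\t")
        if f[0] == "S" and len(f) >= 3 and _length(f) == 0:
            empty.add(f[1])

    # Second pass: keep everything except those segments and their links.
    kept = []
    for rec in records:
        f = rec.split("\t")
        if f[0] == "S" and len(f) >= 2 and f[1] in empty:
            continue
        if f[0] == "L" and len(f) >= 5 and (f[1] in empty or f[3] in empty):
            continue
        kept.append(rec)
    return kept
```

To use it on a file, pass the open file handle to `drop_zero_length` and write the returned lines out joined with newlines. Be aware that this also removes connectivity: in the example above, segment 11 reaches segment 8 only through segment 12 (`L 11 + 12 +` followed by `L 12 + 8 +`), so after filtering that path is gone from the graph. Whether that matters depends on what the downstream assessment needs.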