proteinosome closed this issue 3 years ago
Circular contigs have names that end with 'c' in the GFA files (the second column). Just to clarify: there are two suffixes, 'c' (circular) and 'l' (linear), and exactly one of them will be present on each contig name.
On Linux, maybe:

cut -f2 asm.p_ctg.noseq.gfa | uniq | awk '$1 ~ /.c$/ {print $1}' > circ.list

gets the list of circular contig names, and

awk '$1=="S" && ($2 ~ /.c$/) {printf ">%s\n%s\n", $2, $3}' asm.p_ctg.gfa | gzip -1 > circ.fa.gz

gets a FASTA (with long lines, though) of the circular contigs.
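If the long sequence lines are a problem for downstream tools, the same awk filter can wrap them. This is only a rough sketch, not something shipped with hifiasm: the function name `circ_fasta` and the default width of 80 are made up for illustration, and it assumes tab-separated S lines with the contig name in column 2 and the sequence in column 3, as in the commands above.

```shell
# Hypothetical helper (not part of hifiasm): print circular contigs from a
# GFA file as FASTA, wrapping each sequence at a fixed width.
#   $1 = GFA file that includes sequences (e.g. asm.p_ctg.gfa)
#   $2 = line width (optional; 80 is an arbitrary default chosen here)
circ_fasta() {
    awk -F'\t' -v w="${2:-80}" '
        # Keep segment lines whose name ends with the circular suffix "c".
        $1 == "S" && $2 ~ /c$/ {
            print ">" $2
            # Emit the sequence in chunks of w characters.
            for (i = 1; i <= length($3); i += w)
                print substr($3, i, w)
        }
    ' "$1"
}
```

After defining the function, something like `circ_fasta asm.p_ctg.gfa 60 | gzip -1 > circ.fa.gz` should give a wrapped, compressed FASTA. The wrapping is done inside awk rather than with `fold`, so header lines are never split.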
Gotcha, thank you so much for the convenient command!
Hi, thanks for the great tools! I am currently testing them on a few datasets and was wondering if there's a straightforward way to extract the contigs that are circular. In Canu/Flye there's either a header or a separate file indicating which contigs are circular, for example.
Thank you!