xfengnefx / hifiasm-meta

hifiasm_meta - de novo metagenome assembler, based on hifiasm, a haplotype-resolved de novo assembler for PacBio HiFi reads.
MIT License

Extracting circular contigs #4

Closed: proteinosome closed this issue 3 years ago

proteinosome commented 3 years ago

Hi, thanks for the great tools! I am currently testing them on a few datasets and was wondering whether there is a straightforward way to extract the circular contigs. Canu and Flye, for example, indicate which contigs are circular either in the sequence header or in a separate file.

Thank you!

xfengnefx commented 3 years ago

Circular contigs have names that end with 'c' in the gfa files (the name is the second column). Just to clarify: there are two suffixes, 'c' (circular) and 'l' (linear), and every contig name carries exactly one of them.

On Linux, something like the following should work. This gets the list of circular contig names:

    cut -f2 asm.p_ctg.noseq.gfa | uniq | awk '$1 ~ /.c$/ {print $1}' > circ.list

And this gets a fasta of the circular contigs (each sequence is written as one long line, though):

    cat asm.p_ctg.gfa | awk '$1=="S" && ($2 ~ /.c$/) {printf ">%s\n%s\n", $2, $3}' | gzip -1 > circ.fa.gz
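If the long sequence lines are a problem for downstream tools, the same filter can wrap the output. The following is only a sketch under assumptions: the file `demo.p_ctg.gfa`, the contig names in it, and the 60-column width are made up for illustration, and real hifiasm-meta GFA files carry more lines and fields than this two-line example.

```shell
# Hypothetical two-contig GFA, for illustration only (not real assembler output).
# Columns are tab-separated: record type, contig name, sequence.
printf 'S\ts1.ctg000001c\tACGTACGTACGT\nS\ts2.ctg000002l\tTTTTGGGG\n' > demo.p_ctg.gfa

# Names of circular contigs: S lines whose name (column 2) ends in "c".
awk -F'\t' '$1 == "S" && $2 ~ /c$/ {print $2}' demo.p_ctg.gfa > circ.list

# FASTA of the circular contigs, with sequences wrapped at w columns.
# w=60 is an arbitrary choice; any positive width works.
awk -F'\t' -v w=60 '$1 == "S" && $2 ~ /c$/ {
    print ">" $2
    for (i = 1; i <= length($3); i += w) print substr($3, i, w)
}' demo.p_ctg.gfa > circ.fa
```

Replacing `demo.p_ctg.gfa` with `asm.p_ctg.gfa` and appending `| gzip -1 > circ.fa.gz` in place of the final redirect would give a compressed, wrapped equivalent of the one-liner above.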

proteinosome commented 3 years ago

Gotcha, thank you so much for the convenient command!