ruanjue / smartdenovo

Ultra-fast de novo assembler using long noisy reads
GNU General Public License v3.0

cns output to fasta format #9

Closed (bioteksampath closed 6 years ago)

bioteksampath commented 6 years ago

Hi, I got the .cns output from a smartdenovo assembly of nanopore reads. I want to convert it to FASTA format for QUAST assessment. How do I convert the .cns file to FASTA format?

Furthermore, it would be a great help if you could let me know which program is good for assessing contiguity and sequence identity (the error rate of the nanopore assembly) against a reference genome.

Thanks, sam

ruanjue commented 6 years ago

The cns file is already in fasta format.

I tend to use MUMmer to assess a genome assembly against reference sequences; it seems to work well at a 1% error rate. For higher error rates of 10%~20%, Heng Li has a program in the minimap package to plot the alignments. minimap2 can map an assembly against a reference genome very fast; a similar program, kbm in my wtdbg-1.2.8 package, can map too.
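For the sequence-identity part of the question, a rough estimate can be pulled from a minimap2 alignment of the assembly against the reference. minimap2 writes PAF by default, where column 10 is the number of matching bases and column 11 is the alignment block length. The sketch below sums those two columns over all records; the function name `paf_identity` and the example records are made up for illustration, and the result is only a coarse, alignment-weighted figure, not a substitute for a proper QUAST or MUMmer report.

```python
def paf_identity(paf_lines):
    """Return total matches / total alignment block length over PAF records."""
    matches = 0
    block_len = 0
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        matches += int(fields[9])     # column 10: number of matching bases
        block_len += int(fields[10])  # column 11: alignment block length (gaps included)
    return matches / block_len if block_len else 0.0


# Two made-up PAF records (hypothetical names and numbers), for illustration only.
example = [
    "utg1\t1000\t0\t1000\t+\tchr1\t5000\t100\t1100\t950\t1000\t60",
    "utg2\t2000\t0\t2000\t-\tchr1\t5000\t2000\t4000\t1900\t2000\t60",
]
print(f"approximate identity: {paf_identity(example):.3f}")
```

On a real run, the list would be replaced by an open file handle on the PAF output (for example one produced by `minimap2 ref.fa asm.fa > aln.paf`, where both file names are placeholders).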

bioteksampath commented 6 years ago

Thanks for your response... But the .cns file contains non-ATGC content. You can see an example below. There is some information (e.g. [Sun Oct 22 18:29:02 2017]"utg2 length=114478 nodes=133" init) before each sequence.

[Sun Oct 22 18:29:02 2017]"utg2 length=114478 nodes=133" init
[Sun Oct 22 18:29:02 2017]"utg2 length=114478 nodes=133" generate backbone length=114571
[Sun Oct 22 18:29:05 2017]"utg2 length=114478 nodes=133" align aln_score=1945263
[Sun Oct 22 18:29:05 2017]"utg2 length=114478 nodes=133" merge nodes
[Sun Oct 22 18:29:05 2017]"utg2 length=114478 nodes=133" iter1 length=116486 aln_score=1945263 cns_score=510950.343750
[Sun Oct 22 18:29:06 2017]"utg2 length=114478 nodes=133" align aln_score=2104706
[Sun Oct 22 18:29:06 2017]"utg2 length=114478 nodes=133" merge nodes
[Sun Oct 22 18:29:06 2017]"utg2 length=114478 nodes=133" iter2 length=116439 aln_score=2104706 cns_score=505014.687500
[Sun Oct 22 18:29:08 2017]"utg2 length=114478 nodes=133" align aln_score=2107670
[Sun Oct 22 18:29:08 2017]"utg2 length=114478 nodes=133" merge nodes
[Sun Oct 22 18:29:08 2017]"utg2 length=114478 nodes=133" iter3 length=116449 aln_score=2107670 cns_score=505343.093750
[Sun Oct 22 18:29:11 2017]"utg2 length=114478 nodes=133" align aln_score=2108690
[Sun Oct 22 18:29:11 2017]"utg2 length=114478 nodes=133" merge nodes
[Sun Oct 22 18:29:11 2017]"utg2 length=114478 nodes=133" iter4 length=116502 aln_score=2108690 cns_score=505370.625000
[Sun Oct 22 18:29:13 2017]"utg2 length=114478 nodes=133" align aln_score=2109056
[Sun Oct 22 18:29:13 2017]"utg2 length=114478 nodes=133" merge nodes
[Sun Oct 22 18:29:13 2017]"utg2 length=114478 nodes=133" iter5 length=116477 aln_score=2109056 cns_score=505466.750000
[Sun Oct 22 18:29:15 2017]"utg2 length=114478 nodes=133" align aln_score=2109109
[Sun Oct 22 18:29:15 2017]"utg2 length=114478 nodes=133" merge nodes
[Sun Oct 22 18:29:15 2017]"utg2 length=114478 nodes=133" iter6 length=116513 aln_score=2109109 cns_score=505419.250000

utg2 length=114478 nodes=133
CTATTTTCGAAAATTTTTATAAAGGTTTAATTCTTTGTAGGACAGGAACAACAGCGTCTTGAAAAACTGCCCGTAGAATTTGAGAACTTAGGACCTAGGT
GGATAGTCTAGCAAGCTAAGTCTTAGGCAAAATTTCCAGTCTTGTTTCGCCGTTCCTTGATTGTCCAGGCAGAACATGCAAACCGATCTAGTCTCGAAAA
GCCGATGATGCATACGCGTTAAAGCATACTCATCCGGCTTCCTAAAAGAAAGAAAAAGTACACAAGCCTTGAAAGAATGGCTAACCCGTTTTCTTTAAGT

I tried deleting the lines starting with "[S" using sed, but I'm still getting the same non-ATGC sequence error in my QUAST analysis. Thanks, sam
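When a filter like this does not clear the error, it can help to list exactly which lines still contain unexpected characters. One possible cause (a guess, not confirmed anywhere in this thread) is log text landing in the middle of a sequence line, which a start-of-line pattern would miss. The sketch below is a minimal checker under these assumptions: headers start with ">", sequence lines may contain only A, C, G, T or N in either case, and the helper name `find_bad_lines` and the sample data are hypothetical.

```python
import io

ALLOWED = set("ACGTNacgtn")


def find_bad_lines(handle):
    """Yield (line_number, text) for non-header lines with unexpected characters."""
    for number, line in enumerate(handle, start=1):
        text = line.rstrip("\r\n")
        if not text or text.startswith(">"):
            continue  # blank lines and FASTA headers are fine
        if not set(text) <= ALLOWED:
            yield number, text


# Tiny in-memory example; the contents are made up for illustration.
sample = io.StringIO(
    ">utg_demo length=8\n"
    "ACGTACGT\n"
    '[Sun Oct 22 18:29:02 2017]"utg_demo" init\n'
    "ACGT[Sun Oct 22 18:29:02 2017]\n"
)
for number, text in find_bad_lines(sample):
    print(f"line {number}: {text[:60]}")
```

For a real file, `sample` would be replaced by `open(path)` on the .cns file; the reported line numbers show whether the leftovers are whole log lines or fragments mixed into sequence lines.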

ruanjue commented 6 years ago

wtcns prints progress to STDERR and writes contigs to the file given by -o, or to STDOUT. Did you use 2>&1 in your shell script? That would merge the progress messages into the contig output.
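The effect of merging the two streams can be reproduced without wtcns. The sketch below uses a stand-in child process (a hypothetical one-line Python program, not wtcns itself) that writes a FASTA record to STDOUT and a progress message to STDERR, then captures its output once with the streams kept apart and once with STDERR folded into STDOUT, which is what 2>&1 does in a shell.

```python
import subprocess
import sys

# Stand-in child process (hypothetical, not wtcns): FASTA to stdout, progress to stderr.
CHILD = (
    "import sys\n"
    "sys.stderr.write('[progress] init\\n')\n"
    "sys.stdout.write('>utg_demo\\nACGTACGT\\n')\n"
)


def run(merge_streams):
    """Run the child and return the text that would end up in the output file."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdout=subprocess.PIPE,
        # subprocess.STDOUT mimics the shell's 2>&1; PIPE keeps stderr separate.
        stderr=subprocess.STDOUT if merge_streams else subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout


separate = run(merge_streams=False)
merged = run(merge_streams=True)
print("separate:", repr(separate))
print("merged contains progress text:", "[progress]" in merged)
```

With the streams kept apart, the captured output is plain FASTA; with them merged, the progress text ends up in the same place as the contigs. Rerunning the assembly step without 2>&1 is therefore likely to be a cleaner fix than filtering the mixed file afterwards.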