voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0
585 stars, 134 forks

Will circular DNA be reported? #353

Open ZeweiSong opened 1 year ago

ZeweiSong commented 1 year ago

Hi, we found that some contigs in our assembly have identical first and last 141 bp, 141 being the largest k-mer size we used. Could contigs like these represent real circular structures, or are they simply artifacts?
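Checking for this pattern across a whole assembly is straightforward. Below is a minimal sketch in Python, assuming the contigs are available as plain FASTA text. The helper names (`parse_fasta`, `has_terminal_repeat`, `find_terminal_repeats`) are hypothetical and not part of MEGAHIT. A match is consistent with a contig spelled by a path that returns to its starting k-mer in the de Bruijn graph, which is what a circular molecule would be expected to produce, but on its own it does not rule out an artifact.

```python
from typing import Iterator, List, Optional, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header, e.g. "k141_1642".
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def has_terminal_repeat(seq: str, k: int = 141) -> bool:
    """True if the contig is longer than k and its first k bases equal its last k bases."""
    return len(seq) > k and seq[:k] == seq[-k:]


def find_terminal_repeats(fasta_text: str, k: int = 141) -> List[str]:
    """Return the names of contigs whose first and last k bases are identical."""
    return [name for name, seq in parse_fasta(fasta_text) if has_terminal_repeat(seq, k)]


# Toy example with k = 3: contig "a" starts and ends with ACG, contig "b" does not.
print(find_terminal_repeats(">a\nACGTTTACG\n>b\nACGTTTT\n", k=3))  # ['a']
```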

For example, here is one of those contigs; its first and last 141 bp are identical:

ctg@k141_1642 AGCCGGCCCTGGGTCCGGTTGATCAGCACGGTGGTGTTCTGAAGCGACAGGAGGCTGTCGCGCACGGCAGCCGAAAGGGTGATCTCAGCCATTGCTGTACCTCGTTTCTAGGGTGCCCTCAACCGGGGCAATCGCTGCACGTGGAGGGATTCTCCCTCCGCGGCGGACGGTCATGAAGGCAGAGGTAAGCGGTGGGCAGCCGGTGCGTAAATTATCCACACGCATAAGGCCGCCGCGGCGCCCCCGCCGGCCAAGGAAAGGAGGGCGCCGGGCCCCCCTTTCCCGCCTAGGGCGATACCAGCCTAGCGGAAGAGCGCCAGGATCCCCTGCTCCGCCTGGCCGGCGAACGACAGGGCCTGGACGCCGAGCTGCTGGCGGGTCTGCAGGGCCAGCAGGTTGGCGCCTTCCTCGTTGAGGTCGGCCAGGGTCAGCTTGTCGGCGCCCGTCTGCAGGGTGTTCACGTAGCTCTCGGTGAACTCGAGACGGGTCTGCAGCAGCGCCACGTTGGAACCGAGGGTCTTGGTCTTGGTGCGCAAGGTGTCCAGGGCCGATTGAAGGTCGGTCACCAGGCCGTTGATTTCGGTGGTGAGCGTGGCGTCGATGTAGTACTGCCCGACAACGTCGCCAACGGTGATGCTGATGCCGGTGGTGATGCCCCCGAATTCCGCCTCACCGTCGCTGATTTTGCTGGCGACCTCGATGGTGAAGCTGAACGTTGTGCCCACGGCAATGACCTGGGTGAGGGCGAACGTAGTCGCCGTGGCCCCGCCGCCCACCGTGCCCACGCTGACGGTCACGGTATGGGTGCCGTAGAGGAACGTGAAATCGCCCGCGGCGCTGAAGGTGAAGGCGGTCGCGCCGGCATAGGTGATAGTGAGCGTCGAGCCGGTGGAGATGCTCACGCCGGTGGCGAAGTCATAGGACACGTTGAATGCTACATCACCGGTCAGCGCCACCGCCGTCAACGCGGTGCGGATGGCAAGGCCGGCGCCGCCTTCGGTCACGTCCACCGAATTGACGACCAGGGTGGACGAGGTGTCGGTGCTGAACTCGACCGTCAGCTTCTCGCCGGTGCCGTTGATCAGGTTCAGGCCCTGAAAGTCGGTGTCCTTGGCCAGGTTGGCGATCTGGAGGCGCAGGTCGTTGAACTGGCTGACCAGCTCGGCGATGGTGCTCGACGTCGCCGACTTGGCCGACACCGCGAGGCCCTTCATCTGGCGGACGATGGCGTCGATCGCCTCGGTGCCGTCCAGGGCCGCCGTCAGGCTGCTGATGCCCTGGTCGATGCCCGCCTTCTTGTCATCGAAGTCCGACGCCCGGTCGGCAAGGGTCTTCGCCTCGAAGAACTTGATCGGGTCGTCGATGGCGCTGGCCACCGACAGCCCGGTGGCCAGCCGGCCCTGGGTCCGGTTGATCAGCACGGTGGTGTTCTGAAGCGACAGGAGGCTGTCGCGCACGGCAGCCGAAAGGGTGATCTCAGCCATTGCTGTACCTCGTTTCTAGGGTGCCCTCAACCGGGGCAATCGCTGCACG
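If the overlap length is not known in advance, the longest proper prefix that is also a suffix can be found in linear time with the Knuth-Morris-Pratt prefix function. This is another hypothetical sketch, assuming the contig sequence is available as a Python string: `longest_terminal_overlap` reports the overlap length, and `trim_circular` drops one copy of it, which, if the contig really is circular, leaves one copy of the circle.

```python
def longest_terminal_overlap(seq: str) -> int:
    """Length of the longest proper prefix of seq that is also a suffix (KMP prefix function)."""
    pi = [0] * len(seq)
    for i in range(1, len(seq)):
        j = pi[i - 1]
        # Fall back through shorter borders until the next character matches.
        while j > 0 and seq[i] != seq[j]:
            j = pi[j - 1]
        if seq[i] == seq[j]:
            j += 1
        pi[i] = j
    return pi[-1] if seq else 0


def trim_circular(seq: str, overlap: int) -> str:
    """Drop one copy of a terminal overlap, giving one period of the putative circle."""
    if overlap <= 0:
        return seq
    return seq[:-overlap]


print(longest_terminal_overlap("ACGTTTACG"))  # 3
print(trim_circular("ACGTTTACG", 3))  # ACGTTT
```

For the contig above, the reported overlap should be at least 141, since its first and last 141 bp are identical.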

Thanks!

alienzj commented 1 year ago

[Image not preserved]

alienzj commented 1 year ago

[Image not preserved]

alienzj commented 1 year ago

It looks like this 141 bp sequence cannot be found in the SRA microbe and fungi datasets.