rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

Small contigs with less than 1-fold coverage #319

Open issue, opened by DorothyTamYiLing 1 year ago

DorothyTamYiLing commented 1 year ago

Hi Ryan,

Thanks for writing Unicycler; it is a very useful tool.

I ran a hybrid assembly in conservative mode (normal mode gave the same result) and obtained three contigs:

1 length=4128945 depth=1.00x circular=true
2 length=1905 depth=0.43x
3 length=1595 depth=0.45x

The expected genome size is 4.1 Mb.
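To compare these headers with the expected genome size, a short Python sketch can parse them and relate the largest contig to 4.1 Mb. The header strings are copied from the issue; `parse_header` and `EXPECTED_GENOME_SIZE` are hypothetical names used only for illustration, and "4.1M" is assumed to mean 4.1 Mb.

```python
# Contig header lines as reported in the issue ("key=value" fields after the name).
HEADERS = [
    "1 length=4128945 depth=1.00x circular=true",
    "2 length=1905 depth=0.43x",
    "3 length=1595 depth=0.45x",
]

EXPECTED_GENOME_SIZE = 4_100_000  # assumption: "4.1M" in the issue means 4.1 Mb


def parse_header(line):
    """Split a header like '1 length=... depth=...x circular=true' into a dict."""
    name, *fields = line.lstrip(">").split()
    info = {"name": name, "circular": False}  # no circular field -> treat as linear
    for field in fields:
        key, value = field.split("=", 1)
        if key == "length":
            info["length"] = int(value)
        elif key == "depth":
            info["depth"] = float(value.rstrip("x"))  # "0.43x" -> 0.43
        elif key == "circular":
            info["circular"] = value == "true"
    return info


contigs = [parse_header(h) for h in HEADERS]
total = sum(c["length"] for c in contigs)
largest = max(contigs, key=lambda c: c["length"])

print(f"total assembled length: {total} bp")
print(f"largest contig / expected size: {largest['length'] / EXPECTED_GENOME_SIZE:.3f}")
```

On these numbers the circular contig 1 alone is within 1% of the expected size, so the two small contigs add very little length.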

I BLASTed the two small contigs against the big one and both were present in it, although contig 3 showed a few mismatches:

qseqid  sseqid  pident  length  mismatch  gapopen  qstart  qend  sstart   send     evalue  bitscore
2       1       100.00  1905    0         0        1       1905  1993753  1991849  0.0     3518
3       1       99.31   1590    11        0        6       1595  2457094  2455505  0.0     2876
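The columns in the BLAST hits above match the BLAST+ default tabular (`-outfmt 6`) order. The sketch below parses the two rows and reports how much of each small contig is aligned and on which strand of contig 1. The query lengths are taken from the contig headers, and `summarise_hit` is a hypothetical helper, not part of BLAST.

```python
# The two hits from the issue, in BLAST+ default tabular (-outfmt 6) column order.
COLUMNS = ("qseqid sseqid pident length mismatch gapopen "
           "qstart qend sstart send evalue bitscore").split()

HITS = """\
2\t1\t100.00\t1905\t0\t0\t1\t1905\t1993753\t1991849\t0.0\t3518
3\t1\t99.31\t1590\t11\t0\t6\t1595\t2457094\t2455505\t0.0\t2876
"""

# Query (small contig) lengths, taken from the contig headers.
QUERY_LENGTHS = {"2": 1905, "3": 1595}


def summarise_hit(line):
    """Turn one tab-separated hit into identity, query coverage and strand."""
    row = dict(zip(COLUMNS, line.split("\t")))
    qstart, qend = int(row["qstart"]), int(row["qend"])
    sstart, send = int(row["sstart"]), int(row["send"])
    query_len = QUERY_LENGTHS[row["qseqid"]]
    return {
        "query": row["qseqid"],
        "pident": float(row["pident"]),
        # Fraction of the small contig covered by the alignment.
        "query_cover": (qend - qstart + 1) / query_len,
        # In BLAST tabular output, sstart > send means a minus-strand hit.
        "strand": "-" if sstart > send else "+",
    }


summaries = [summarise_hit(line) for line in HITS.splitlines()]
for s in summaries:
    print(f"contig {s['query']}: {s['pident']:.2f}% identity, "
          f"{s['query_cover']:.1%} of query aligned, strand {s['strand']}")
```

On these rows contig 2 aligns end to end and contig 3 over 1,590 of its 1,595 bp, both in reverse orientation relative to contig 1.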

I would like to know whether contig 1 is a good-quality assembly that I can use, and why the two small contigs have less than 1-fold coverage.

Thanks, Dorothy

lingvv commented 7 months ago

Hello, Dorothy!

I have the same question and concern. I've noticed that the smaller the contig, the higher its depth. I'm also unsure whether a depth of 1.00x indicates a satisfactory assembly. I'd be grateful if someone familiar with this could explain.

Here is the information on the contigs from my hybrid assembly:

>1 length=2520275 depth=1.00x circular=true
>2 length=138051 depth=1.24x circular=true
>3 length=15903 depth=1.55x circular=true
>4 length=5174 depth=7.63x circular=true
>5 length=3191 depth=10.60x circular=true
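The observation that depth rises as contigs get shorter can be checked directly on these five headers. A minimal Python sketch, with the values copied from the list above (the `CONTIGS` table is only an illustrative structure):

```python
# (name, length in bp, depth) for the five contigs listed above.
CONTIGS = [
    ("1", 2520275, 1.00),
    ("2", 138051, 1.24),
    ("3", 15903, 1.55),
    ("4", 5174, 7.63),
    ("5", 3191, 10.60),
]

# Sort by length, longest first, then check that depth strictly rises at every step.
by_length = sorted(CONTIGS, key=lambda c: c[1], reverse=True)
depths = [depth for _, _, depth in by_length]
depth_rises = all(a < b for a, b in zip(depths, depths[1:]))
print("depth increases as contig length decreases:", depth_rises)
```

This only confirms the ordering for these five contigs; it does not by itself explain why it happens.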

Thank you!

BiosciCS commented 3 months ago

Maybe the short contigs are plasmids present in multiple copies per cell, since they are all circular.
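If the reported depths are relative values with the chromosome as reference (an assumption here, suggested by contig 1 showing exactly 1.00x in both assemblies), dividing a circular contig's depth by the chromosome's depth gives a rough estimate of its copies per chromosome. The sketch below applies that arithmetic to lingvv's contigs; `estimate_copies` is a hypothetical helper, and the numbers ignore sequencing and assembly biases.

```python
# (name, length in bp, depth, circular) for lingvv's contigs listed earlier in the thread.
CONTIGS = [
    ("1", 2520275, 1.00, True),
    ("2", 138051, 1.24, True),
    ("3", 15903, 1.55, True),
    ("4", 5174, 7.63, True),
    ("5", 3191, 10.60, True),
]


def estimate_copies(contigs):
    """Assumption: the longest contig is the chromosome and serves as the depth reference."""
    chromosome = max(contigs, key=lambda c: c[1])
    ref_depth = chromosome[2]
    estimates = {}
    for name, length, depth, circular in contigs:
        if name == chromosome[0] or not circular:
            continue  # only circular, non-chromosomal contigs are treated as candidate plasmids
        estimates[name] = depth / ref_depth
    return estimates


copies = estimate_copies(CONTIGS)
for name, n in copies.items():
    print(f"contig {name}: ~{n:.1f} copies per chromosome")
```

Dividing by the chromosome's depth, rather than assuming it is exactly 1.00, keeps the same arithmetic usable if the depths turn out to be absolute values.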