rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

Lost a plasmid in Illumina-only assembly #138

Open swuyts opened 6 years ago

swuyts commented 6 years ago

Hi there,

I've got a few bacterial assemblies in which I identified some plasmid sequences. I therefore wanted to try out Unicycler to see whether it could improve my Illumina-only assembly and possibly close some of the plasmids.

However, after analysis, I could no longer detect my genes of interest in the Unicycler assembly. I've figured out that Unicycler completely removed them after step 003_bridges_applied.gfa, as I can't see them anymore in the assembly graph of 004_final_clean.gfa.

Did you ever run into something like this? If so, any suggestions on how to handle it?

Kind regards, Sander

Update:

I tried running Unicycler with

                --min_component_size 0 
                --min_dead_end_size 0

But no success.

aldertzomer commented 6 years ago

Similar issue, but with PacBio/Illumina data. One or two ~10 kb plasmids with a high copy number are lost. Both plasmids are still present in 002_overlaps_removed.gfa, the plasmid with the highest copy number is missing from 003_long_read_assembly.gfa, and both are present in 004_bridges_applied.gfa. After that, they are lost. Both plasmids share some sequence (1-2 kb) with the chromosome. Is there some redundant-contig filtering going on that removes them?
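For anyone trying to pin down the stage at which sequence disappears, here is a minimal Python sketch that summarises each intermediate graph in a Unicycler output directory. It is not part of Unicycler; the function names (`parse_segments`, `summarise_stage`, `summarise_run`) are made up for illustration, and it assumes the GFA files store each segment's sequence inline on its `S` line.

```python
from pathlib import Path


def parse_segments(gfa_text):
    """Return {segment_name: sequence} from the S lines of GFA text."""
    segments = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # A GFA segment line looks like: S <name> <sequence> [optional tags...]
        if len(fields) >= 3 and fields[0] == "S":
            segments[fields[1]] = fields[2]
    return segments


def summarise_stage(gfa_text):
    """Count the segments and total bases in one graph, plus the longest few."""
    lengths = sorted((len(seq) for seq in parse_segments(gfa_text).values()),
                     reverse=True)
    return {
        "segments": len(lengths),
        "total_bp": sum(lengths),
        "longest": lengths[:5],
    }


def summarise_run(out_dir):
    """Print a summary for every .gfa file in out_dir, in file-name order."""
    for path in sorted(Path(out_dir).glob("*.gfa")):
        print(path.name, summarise_stage(path.read_text()))
```

Calling `summarise_run("unicycler_out")` (a hypothetical directory name) prints one line per graph. A drop of roughly 10 kb in `total_bp` between two consecutive stages would point to the step that discards the plasmid. Segment names are not compared across stages, since they may be renumbered.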

aldertzomer commented 6 years ago

I managed to get my plasmids (marked as circular as well) with the following changes:

It also results in three additional contigs of 113 bp, 266 bp and 3019 bp, all attached to the chromosome in the graph. The 3 kb fragment codes for ribosomal RNA, which could be expected.

I'm not sure which change did the trick, but please try.

valery-shap commented 3 years ago

Hello, I have the same issue with a hybrid Illumina/Nanopore assembly. The gene (which is located on a plasmid) is present in 001_best_spades_graph.gfa and in 002_overlaps_removed.gfa, but it is lost in 003_long_read_assembly.gfa. I've tried --depth_filter 0 and it doesn't help. If I use a long-read-only assembly, the gene is present in the final assembly.gfa.

Unicycler v0.4.9

Valery
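A quick way to check which graph still contains a gene is to search each GFA file for a short marker sequence taken from that gene. The sketch below is only an illustration, not something Unicycler provides: `find_marker` and `reverse_complement` are hypothetical helper names, and it assumes an exact, unambiguous marker (A/C/G/T only) that fits within a single segment. A gene split across several segments would need a short marker from inside one of them, or a proper aligner such as BLAST.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_marker(gfa_text, marker):
    """Return [(segment_name, strand)] for segments containing the marker.

    Strand is "+" if the marker itself is found and "-" if only its
    reverse complement is found. Matching is exact and case-insensitive.
    """
    marker = marker.upper()
    marker_rc = reverse_complement(marker)
    hits = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # Only segment lines (S <name> <sequence> ...) carry sequence.
        if len(fields) < 3 or fields[0] != "S":
            continue
        name, seq = fields[1], fields[2].upper()
        if marker in seq:
            hits.append((name, "+"))
        elif marker_rc in seq:
            hits.append((name, "-"))
    return hits
```

Running `find_marker(open("002_overlaps_removed.gfa").read(), marker)` and then the same call on 003_long_read_assembly.gfa should show whether the marker really vanishes between those two steps. An empty list means no segment contains it on either strand.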