rrwick / Unicycler

hybrid assembly pipeline for bacterial genomes
GNU General Public License v3.0

Unicycler for reference-guided assembly? #210

Open cizydorczyk opened 4 years ago

cizydorczyk commented 4 years ago

I am interested in assembling some draft bacterial genomes using a complete genome as a reference to guide the assembly process. I expect some minor differences between my draft genomes and the reference, but most of these should be at the SNP level.

Would Unicycler be an appropriate tool for this? I was thinking I could run it in hybrid assembly mode, supplying my short Illumina reads and the reference genome as a "long read". My concern with this approach, however, is that Unicycler might use the reference to "correct" bases in the short reads or in the final assembly, when I have no reason to expect the reference base to be the "correct" one at any given locus in my draft assembly. In other words, I do not want the reference to determine which base occurs at a specific site; I only want it to serve as a structural guide for the assembly.

Any response is greatly appreciated.

Thank you, Conrad

stevenjdunn commented 4 years ago

Unicycler isn't built for this use case.

If you're resequencing a reference genome, you could use Snippy to map the short reads against your reference and take the consensus.fa or consensus.subs.fa output. These files are your reference with all variants written in, or with only SNPs written in, respectively.
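As a rough illustration of the Snippy route, the Python sketch below only assembles the command line and does not run Snippy itself. The flags (`--cpus`, `--outdir`, `--ref`, `--R1`, `--R2`) are Snippy's own options. The `snippy_command` helper and all file names are hypothetical placeholders. With Snippy's default output prefix, the files mentioned above should appear in the output directory as `snps.consensus.fa`, `snps.consensus.subs.fa` and `snps.tab`.

```python
import shlex


def snippy_command(ref: str, r1: str, r2: str, outdir: str, cpus: int = 4) -> list[str]:
    """Build (but do not run) a Snippy command for paired-end Illumina reads.

    Hypothetical helper: `ref` may be a FASTA file, or a GenBank file if you
    want the variants annotated. All paths used below are placeholders.
    """
    return [
        "snippy",
        "--cpus", str(cpus),
        "--outdir", outdir,
        "--ref", ref,
        "--R1", r1,
        "--R2", r2,
    ]


cmd = snippy_command(
    "reference.gbk", "sample_R1.fastq.gz", "sample_R2.fastq.gz", "sample_snippy"
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(cmd))

# To actually run it (requires Snippy on your PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Returning an argument list instead of a single string lets you pass it straight to `subprocess.run` without shell quoting issues.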

This would maintain the same structure as your reference. You'll also get a neat table of all variants (snps.tab), and if you pass Snippy a GenBank file as the reference, you'll get contextual information about where the variants occur (e.g. CDS, amino acid residue, effect).
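Here is a minimal sketch for summarising snps.tab in Python. It assumes the file is a tab-separated table whose header row includes at least CHROM, POS, TYPE, REF and ALT, so check the header of your own file. The `read_snps_tab` helper and the two example rows are made up for illustration and are not real Snippy output. Annotation columns such as EFFECT are only filled in when a GenBank reference was used.

```python
import csv
import io
from collections import Counter


def read_snps_tab(handle):
    """Yield one dict per variant row from a Snippy-style snps.tab file.

    Hypothetical helper: columns are looked up by header name, so any extra
    annotation columns in the real file are simply carried along.
    """
    for row in csv.DictReader(handle, delimiter="\t"):
        row["POS"] = int(row["POS"])  # positions are 1-based reference coordinates
        yield row


# Made-up two-row excerpt, for illustration only (not real data).
example = (
    "CHROM\tPOS\tTYPE\tREF\tALT\tEFFECT\n"
    "chr1\t1523\tsnp\tA\tG\tmissense_variant\n"
    "chr1\t8871\tdel\tTT\tT\t\n"
)

variants = list(read_snps_tab(io.StringIO(example)))
# Count variants by type (e.g. snp, mnp, complex, ins, del).
type_counts = Counter(v["TYPE"] for v in variants)
print(type_counts)

# For a real run, open the file instead:
# with open("sample_snippy/snps.tab", newline="") as fh:
#     variants = list(read_snps_tab(fh))
```

Reading columns by name rather than by position keeps the sketch working whether or not the annotation columns are populated.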