rotary-genomics / rotary

Assembly/annotation workflow for Nanopore-based microbial genome data containing circular DNA elements
BSD 3-Clause "New" or "Revised" License

Metagenomics Mode (MetaFlye and Binning) #160

Open LeeBergstrand opened 5 months ago

LeeBergstrand commented 5 months ago

We have previously discussed adding metagenomics compatibility by running Flye in meta mode and doing genome binning.

jmtsuji commented 5 months ago

@LeeBergstrand For now, I'd suggest leaving genome binning out of the MVP. I think the pipeline should already be compatible with metagenomes up until the end of the circularization step (i.e., before annotation), although I'd need to double-check this to make sure. This level of metagenome compatibility might be enough for the MVP -- users can use rotary to assemble metagenomes with properly closed circular contigs, and then they can handle genome binning themselves. Once the MVP is out, we could consider a meta-mode for rotary as an extension. How does this sound?

jmtsuji commented 5 months ago

P.S. The current config file already has a way to turn meta mode on or off for Flye, so that aspect is already addressed. Meta mode is sometimes helpful for genome assemblies (e.g., if you're not sure if the culture is pure... I wonder if it might also help with assembling differentially abundant plasmids).
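For illustration, a config toggle like this could map onto the Flye command line roughly as follows. This is a minimal sketch, not rotary's actual code: the function `build_flye_command` and its parameter names are hypothetical, and only the Flye options themselves (`--nano-hq`, `--nano-raw`, `--out-dir`, `--threads`, `--meta`) are taken from Flye's command-line interface.

```python
from typing import List


def build_flye_command(
    reads: str,
    out_dir: str,
    threads: int = 4,
    meta_mode: bool = False,       # hypothetical name for the config toggle
    read_type: str = "nano-hq",    # Flye read-type flag without the leading dashes
) -> List[str]:
    """Build a Flye command as an argument list, optionally in meta mode."""
    command = [
        "flye",
        f"--{read_type}", reads,
        "--out-dir", out_dir,
        "--threads", str(threads),
    ]
    # Flye's --meta flag targets metagenomes and uneven coverage, which is
    # why it can also help when a culture may not be pure.
    if meta_mode:
        command.append("--meta")
    return command
```

The returned list can be passed directly to `subprocess.run(command, check=True)`; keeping the toggle as a plain boolean means the same code path serves both isolate and metagenome assemblies.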

LeeBergstrand commented 5 months ago

@jmtsuji This sounds good to me. It's a low priority for me at this time.

LeeBergstrand commented 5 months ago

Here are some things to think about down the road:

jmtsuji commented 5 months ago

@LeeBergstrand Good points. My guess is that existing genome binners (e.g., MetaBAT2) should work fine with Illumina, Nanopore, or hybrid data. MetaBAT2 just uses coverage info of the contigs (obtained from BAM files) and the contig sequences themselves to guide genome binning, in my understanding. So long as read mapping is accurate and the contigs are error-free, I think genome binning from a mix of different read types should be OK. It would be worthwhile to check this carefully later on, though.
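As a rough sketch of the binning workflow described above (map each read set to the contigs, summarize per-contig coverage from the BAM files, then bin), the commands could be assembled like this. The helper `build_binning_commands`, its parameters, and the file names are hypothetical; the tools and flags used (`minimap2 -ax map-ont` / `-ax sr`, `samtools sort -o`, `jgi_summarize_bam_contig_depths --outputDepth`, `metabat2 -i/-a/-o`) are the standard ones, but the approach would still need to be validated on mixed read types.

```python
import shlex
from typing import List, Sequence, Tuple

# (sample name, minimap2 preset, read files); all values are illustrative.
ReadSet = Tuple[str, str, Sequence[str]]


def build_binning_commands(
    contigs: str,
    read_sets: Sequence[ReadSet],
    depth_file: str = "depth.txt",
    bin_prefix: str = "bins/bin",
) -> List[str]:
    """Return shell commands for coverage-based binning with MetaBAT2."""
    q = shlex.quote
    commands: List[str] = []
    bam_files: List[str] = []

    # 1. Map every read set (Nanopore and/or Illumina) to the same contigs.
    for sample, preset, reads in read_sets:
        bam = f"{sample}.sorted.bam"
        read_args = " ".join(q(r) for r in reads)
        commands.append(
            f"minimap2 -ax {preset} {q(contigs)} {read_args} "
            f"| samtools sort -o {q(bam)} -"
        )
        bam_files.append(bam)

    # 2. Summarize per-contig coverage across all BAM files.
    bam_args = " ".join(q(b) for b in bam_files)
    commands.append(
        f"jgi_summarize_bam_contig_depths --outputDepth {q(depth_file)} {bam_args}"
    )

    # 3. Bin contigs using their sequences plus the coverage table.
    commands.append(
        f"metabat2 -i {q(contigs)} -a {q(depth_file)} -o {q(bin_prefix)}"
    )
    return commands
```

For a hybrid dataset, `read_sets` might hold one `map-ont` entry for the Nanopore reads and one `sr` entry with the paired Illumina files, so that each read type contributes its own coverage column to the depth table.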

LeeBergstrand commented 1 week ago

@jmtsuji This is becoming more and more of an issue for me. We are finding that a growing number of the genomes we process are actually co-cultures, even though they were originally thought to be single strains.