Closed: BigNianNGS closed this issue 4 years ago
Hi,
Q1: These datasets have very different coverages. Some bacteria in ZymoLog had coverage below 1x, which makes them impossible to assemble.
Q2: This is not surprising at all for real metagenomes. Different technologies might be picking up different parts of the metagenome. In addition, Illumina assemblies typically output contigs with very low read coverage (like 1x or 2x), which metaFlye does not. It makes sense to set reasonable length and coverage cutoffs for contigs before making the comparison.
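As a rough illustration of such a cutoff, here is a minimal Python sketch that filters contigs by length and coverage before summing the assembly size. It assumes the tab-separated layout of Flye's `assembly_info.txt` (sequence name, length, and coverage in the first three columns). The example rows and the default thresholds (`min_len=1000`, `min_cov=3.0`) are made up for illustration and are not recommended values.

```python
import csv
import io


def read_flye_info(text):
    """Parse assembly_info.txt content into (name, length, coverage) tuples."""
    contigs = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        # Skip blank lines and the "#seq_name ..." header line.
        if not row or row[0].startswith("#"):
            continue
        contigs.append((row[0], int(row[1]), float(row[2])))
    return contigs


def filter_contigs(contigs, min_len=1000, min_cov=3.0):
    """Keep contigs that pass both the length and the coverage cutoff."""
    return [c for c in contigs if c[1] >= min_len and c[2] >= min_cov]


def total_length(contigs):
    """Sum the lengths of the given contigs."""
    return sum(length for _, length, _ in contigs)


# Illustrative data only (hypothetical contigs, not real results).
EXAMPLE_INFO = (
    "#seq_name\tlength\tcov.\tcirc.\trepeat\tmult.\talt_group\tgraph_path\n"
    "contig_1\t250000\t35\tN\tN\t1\t*\t1\n"
    "contig_2\t800\t2\tN\tN\t1\t*\t2\n"
    "contig_3\t12000\t1\tN\tN\t1\t*\t3\n"
)

all_contigs = read_flye_info(EXAMPLE_INFO)
kept = filter_contigs(all_contigs)
print(total_length(all_contigs), total_length(kept))
```

The same `filter_contigs` function can be applied to (name, length, coverage) tuples built from any other assembler's output, so both assemblies are compared under identical cutoffs.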
Thank you for the advice. Can I use 'consensus.fasta' instead of the final 'assembly.fasta' for downstream analysis? The total size of 'assembly.fasta' turned out to be much smaller than that of 'consensus.fasta', for example 2.3G vs 3.0G. I have also checked that most of the lost regions were not repeats, which is strange.
Another question: can I take the assemblies from NGS and the '10-consensus/consensus.fasta' from ONT (polished with racon and medaka, and even pilon) and combine them using the '--subassemblies' parameter? I think this would take advantage of both technologies and give better results.
@BigNianNGS no, the intermediate sequences should not be used for the analysis, as they might contain duplications and misassemblies.
Hello,
Thank you for developing the metaFlye assembly module. I have two questions about it.
Q1: Why is the difference in genome size between Zymo Log and Zymo Even so large?
I have tested metaFlye on a very complex gut contents sample, which may be more like Zymo Log. The total assembly size generated from the ONT reads was 2.5 Gbp using a 2 kb min-overlap, while the assembly from the NGS reads was 5 Gbp using megahit (--presets meta-large). Both the ONT and NGS libraries were sequenced from the same DNA, and each generated 60 Gbp of raw data.
Q2: Why is the difference in assembly size between ONT and NGS so large for a complex metagenomic sample? Can you give me some advice?
Thank you in advance !