Expected Behavior
A penguin assembly with few or no SNPs relative to the reads used to assemble it.
Current Behavior
A high --max-iterations value results in SNPs that are not supported by the reads used during assembly.
Steps to Reproduce (for bugs)
Observable on most samples we tested with --max-iterations 15.
It also happens on some of the contigs from the benchmark rhinovirus data here: https://github.com/AnnSeidel/penguin-analysis/tree/main/benchmarking/rhinovirus-3-mixture (screenshots attached).
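To make "not supported by the reads" concrete, here is a minimal sketch of the kind of check we mean, assuming the reads have already been aligned back to the assembly and reduced to per-position base pileups. The function name, the `pileup` dictionary layout, and the `min_depth` / `min_fraction` thresholds are all hypothetical choices for illustration; none of this is part of penguin.

```python
from collections import Counter

def unsupported_positions(contig, pileup, min_depth=10, min_fraction=0.2):
    """Flag contig positions whose base is backed by too few aligned reads.

    contig: assembled sequence (uppercase string).
    pileup: hypothetical dict mapping 0-based position -> string of read
            bases observed at that position after aligning reads back.
    Returns a list of (position, contig_base, top_read_base, support).
    """
    flagged = []
    for pos, base in enumerate(contig):
        counts = Counter(pileup.get(pos, ""))
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to judge this position
        support = counts[base] / depth  # fraction of reads agreeing with the contig
        if support < min_fraction:
            top_base = counts.most_common(1)[0][0]
            flagged.append((pos, base, top_base, support))
    return flagged

# Toy example: position 1 is 'C' in the contig, but almost all reads say 'T'.
contig = "ACGT"
pileup = {0: "A" * 12, 1: "T" * 11 + "C", 2: "G" * 5, 3: "T" * 10}
for pos, base, top, support in unsupported_positions(contig, pileup):
    print(f"pos {pos}: contig={base} reads={top} support={support:.2f}")
```

In the toy data, position 2 is skipped because its depth is below `min_depth`, and only position 1 is reported. Comparing the number of flagged positions between the default run and the --max-iterations 15 run would give a simple count of the difference visible in the screenshots.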
Context
We noticed that for larger viral genomes, a high --max-iterations value helped generate a more contiguous assembly. However, when we aligned the reads back to this assembly, we were surprised to find a large number of SNPs that the reads did not support.
For some assemblies there does not seem to be a magic number for --max-iterations: either the assembly will not be in one piece, or it will contain SNPs. Since these SNPs are not supported by the reads, we could use a tool like Pilon to correct the assembly, but it may be easier and better to fix this within penguin so that a high iteration count does not come at the cost of assembly accuracy.
See below for the rhinovirus assembly with default settings (few or no SNPs) and with --max-iterations 15 (many SNPs), including a zoomed-in view.