SilasK closed this issue 6 years ago
Error correction is optional, and if errors are present in the sequences, they will affect the reads' ability to map to the decontamination reference sequences. That was my thought, anyways. SPAdes will perform an additional error correction step in its own protocol.
For publication, you normally want to upload raw reads. But at least for human metagenomics, one should remove the host reads for confidentiality reasons. That was a second reason to put the decontamination step first and do all the quality filtering and error correction later.
That definitely makes sense in that context.
We now force users to start with either single-end or paired-end reads, after de-interleaving their own reads using reformat.sh. Moving the step seems like a simple switch and I'm open to it.
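For anyone who still has interleaved files, here is a minimal sketch of how the de-interleaving call could be assembled. The file names and the helper `deinterleave_command` are hypothetical, and the `in=`/`out1=`/`out2=` form is my reading of the reformat.sh usage in BBTools, so check it against your installed version. The helper only builds the command and does not run it.

```python
def deinterleave_command(interleaved, out_r1, out_r2):
    """Build (not run) a reformat.sh call that splits interleaved reads.

    Assumption: reformat.sh takes in=, out1= and out2= key=value arguments.
    """
    return [
        "reformat.sh",
        f"in={interleaved}",   # interleaved paired-end input
        f"out1={out_r1}",      # forward reads
        f"out2={out_r2}",      # reverse reads
    ]


# Hypothetical file names, for illustration only.
cmd = deinterleave_command(
    "sample_interleaved.fastq.gz",
    "sample_R1.fastq.gz",
    "sample_R2.fastq.gz",
)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the paths point at real files.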
The rule order has been updated to reflect corrections highlighted in the original issue.
I think error correction should come later in the pipeline, e.g. after the decontamination step. Tadpole has quite aggressive error correction, which is optimized for SPAdes according to Brian, but this might not be the best solution for MEGAHIT and downstream analysis, e.g. SNP calling.
rRNA reads should be excluded from error correction, I think.