metagenome-atlas / atlas

ATLAS - Three commands to start analyzing your metagenome data
https://metagenome-atlas.github.io/
BSD 3-Clause "New" or "Revised" License

Error correction #31

Closed SilasK closed 6 years ago

SilasK commented 6 years ago

I think error correction should come later in the pipeline, e.g., after the decontamination step. Tadpole has quite aggressive error correction, which according to Brian is optimized for SPAdes, but this might not be the best solution for MEGAHIT and downstream analysis such as SNP calling.

rRNA reads should be excluded from error correction, I think.

brwnj commented 6 years ago

Error correction is optional, and if errors are present in the sequences, they will affect the reads' ability to map to the decontaminant reference sequences. That was my thinking, anyway. SPAdes also performs an additional error-correction step as part of its protocol.

SilasK commented 6 years ago

For publication, you normally want to upload raw reads. But at least for human metagenomics, one should remove the host reads first for confidentiality reasons. That was my second reason for putting the decontamination step first and all the quality filtering and error correction later.

brwnj commented 6 years ago

That definitely makes sense in that context.

We now require users to start with either single-end or paired-end reads, after de-interleaving their own reads with reformat.sh. Changing the step order seems like a simple switch, and I'm open to it.
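
For anyone unfamiliar with de-interleaving, here is a minimal pure-Python sketch of what the operation does. It is not how reformat.sh or ATLAS implements it; the function names `fastq_records` and `deinterleave` are hypothetical, and it assumes well-formed four-line FASTQ records with mates stored back to back.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: header, sequence, separator, quality string.
Record = Tuple[str, ...]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group lines into four-line FASTQ records (hypothetical helper)."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) != 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(line.rstrip("\n") for line in chunk)


def deinterleave(lines: Iterable[str]) -> Tuple[List[Record], List[Record]]:
    """Split interleaved records into (forward reads, reverse reads).

    Assumes records alternate R1, R2, R1, R2, ... with no unpaired reads.
    """
    records = list(fastq_records(lines))
    if len(records) % 2 != 0:
        raise ValueError("odd number of records; input is not cleanly interleaved")
    r1 = records[0::2]  # every even-indexed record is a forward read
    r2 = records[1::2]  # every odd-indexed record is its mate
    return r1, r2
```

With an interleaved input of one pair, `deinterleave` returns two lists of one record each, which can then be written to separate R1 and R2 files.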

brwnj commented 6 years ago

The rule order has been updated to reflect corrections highlighted in the original issue.