marbl / canu

A single molecule sequence assembler for genomes large and small.
http://canu.readthedocs.io/

Correction step parameters #2288

Closed: tianjio closed this issue 4 months ago

tianjio commented 4 months ago

Hi, in the correction step, I want to try different values of the rawErrorRate and correctedErrorRate parameters and pick the best result. Which intermediate outputs are independent of these two parameters and can be reused across runs? For example, can the output of 0-mercounts and the results of 1-overlapper be reused directly for different values of these two parameters?

skoren commented 4 months ago

Hi, there are multiple instances of 0-mercounts and 1-overlapper, one for each of the steps: they are run for correction, trimming, and unitigging. The 0-mercounts output can be reused in all cases. The correctedErrorRate is only used after the correction step, so if you change it you cannot reuse 1-overlapper in trimming/unitigging. It might be OK to reuse the trimming results if you increase the correctedErrorRate (e.g. you trimmed at 0.03, or 3%, but assembled at 0.05, or 5%). If you change rawErrorRate, you cannot reuse any 1-overlapper output, since it would change the corrected reads, which would change the subsequent steps too. There's information on this in the Canu pipeline docs: https://canu.readthedocs.io/en/latest/tutorial.html#error-rates
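
For anyone scripting a parameter sweep, the reuse rules in the reply above can be written down as a small lookup. This is only a sketch of those rules as stated in this thread, not part of Canu: the function name `reuse_plan`, its arguments, and the "yes"/"no"/"maybe" labels are made up for illustration. Treating the correction-stage 1-overlapper as reusable when only correctedErrorRate changes is an inference from "only used after the correction step".

```python
# Hypothetical helper (not a Canu API): encodes the reuse rules from this thread.
STAGES = ("correction", "trimming", "unitigging")


def reuse_plan(raw_changed, corrected_changed, corrected_increased=False):
    """Return, per stage, whether 0-mercounts and 1-overlapper output can be reused.

    raw_changed:          rawErrorRate differs from the earlier run
    corrected_changed:    correctedErrorRate differs from the earlier run
    corrected_increased:  the new correctedErrorRate is higher than the old one
    """
    plan = {}
    for stage in STAGES:
        if raw_changed:
            # Corrected reads change, so no 1-overlapper output carries over.
            overlaps = "no"
        elif corrected_changed and stage != "correction":
            # Per the reply, trimming results *might* be OK after an increase.
            if stage == "trimming" and corrected_increased:
                overlaps = "maybe"
            else:
                overlaps = "no"
        else:
            # Inferred: correctedErrorRate is not used during correction.
            overlaps = "yes"
        # The reply states 0-mercounts can be reused in all cases.
        plan[stage] = {"0-mercounts": "yes", "1-overlapper": overlaps}
    return plan


if __name__ == "__main__":
    # Example: trimmed at 0.03, now assembling at 0.05 (correctedErrorRate increased).
    for stage, outputs in reuse_plan(False, True, corrected_increased=True).items():
        print(stage, outputs)
```

The "maybe" label only mirrors the "might be OK" wording above; it is worth checking against the linked error-rate docs before relying on it.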