Closed ulrichmarkus closed 8 months ago
Hi Markus,
thanks for the report and also the proposed solution! I agree, after some more testing that seems like a good addition to make the pipeline more robust.
Best, Daniel
Thanks for the report again, the change is implemented into the dev version and will be in the next release.
Description of feature
Hi @d4straub,
it is maybe a problem in 0.001% of the cases but could save some resources on the user side. Especially when sampleIDs are not standardized, it could happen that the standard sort() algorithm from R is mixing up sampleIDs which is causing errors at a later point when it comes to mergePairs().
One example I had was a pair of sampleID that looked like this: [1] S1982 and [2] S1982_2. During the pipeline the sampleIDs are used as filenames and extends those with _1/_2 for R1/R2. What is now happening is that the standard sort algorithm values an underscore differently than a point and sorts the second sample before the first for the R2 reads. Thus the minor improvement for dada2_denoising.nf could be to add a sort
method = "radix
to this step. Radix sort seems to have other internal values for underscore and points than the standard sort().Standard sort()
Radix sort()
Best, Markus