tobiasgf / lulu

R package for post-clustering curation of amplicon next-generation sequencing data (metabarcoding)
GNU Lesser General Public License v3.0

Child below `minimum_match` similarity merged #10

Open ewmorr opened 3 years ago

ewmorr commented 3 years ago

Thank you for the nice algorithm.

I have a question regarding the minimum_match threshold. I have supplied LULU with a matchlist generated by vsearch at 84% sequence similarity, and then run LULU at multiple thresholds. For example,

    lulu(
      otutable = asv_tab,
      matchlist = asv_matches,
      minimum_match = 93
    )

I am finding that at some thresholds, child ASVs whose similarity to the parent is below `minimum_match` are still merged (e.g., with `minimum_match = 93` a child with a 91.5% match is merged; with `minimum_match = 95` several children with matches between 94% and 95% are merged).

Is there a simple explanation for this? I thought it might be rounding, but a 91.5% match being merged at a 93% minimum suggests that is not the cause. So far this has affected only very low-frequency ASVs (in terms of sample count), so it is not a huge issue, but I am curious whether this is intentional.
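
For reference, a minimal sketch of the kind of check that surfaces these cases: it joins each merged child to its assigned parent and looks up their similarity in the matchlist. The data frames below are toy stand-ins with made-up IDs and values, not real output. The column layout of `otu_map` (`parent_id`, `curated`, ASV IDs as row names) and the three-column matchlist (query, hit, percent match) are assumptions about the object shapes and should be checked against the actual objects.

```r
# Toy stand-ins; all IDs and values are hypothetical, for illustration only.
# Assumed matchlist layout: query ID, hit ID, percent identity.
asv_matches <- data.frame(
  query = c("ASV_2", "ASV_3", "ASV_4"),
  hit   = c("ASV_1", "ASV_1", "ASV_1"),
  match = c(97.2, 91.5, 94.6),
  stringsAsFactors = FALSE
)

# Assumed shape of the curation map: row names are ASV IDs,
# parent_id is the assigned parent, curated is "parent" or "merged".
otu_map <- data.frame(
  parent_id = c("ASV_1", "ASV_1", "ASV_1", "ASV_4"),
  curated   = c("parent", "merged", "merged", "parent"),
  row.names = c("ASV_1", "ASV_2", "ASV_3", "ASV_4"),
  stringsAsFactors = FALSE
)

minimum_match <- 93

# Keep only merged children, paired with their assigned parent.
is_merged <- otu_map$curated == "merged"
merged <- data.frame(
  child  = rownames(otu_map)[is_merged],
  parent = otu_map$parent_id[is_merged],
  stringsAsFactors = FALSE
)

# Look up the child-to-parent similarity recorded in the matchlist.
# Pairs absent from the matchlist get NA for match.
merged <- merge(
  merged, asv_matches,
  by.x = c("child", "parent"), by.y = c("query", "hit"),
  all.x = TRUE
)

# Children merged despite a recorded match below the threshold.
below_threshold <- merged[!is.na(merged$match) & merged$match < minimum_match, ]
print(below_threshold)
```

With real data, `otu_map` and `asv_matches` would be replaced by the corresponding objects from the run. Pairs that come back with `NA` for `match` are also worth inspecting, since they have no direct entry in the matchlist in that orientation.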

Thanks! Best, Eric