tobiasgf / lulu

R package for post-clustering curation of amplicon next-generation sequencing data (metabarcoding)
GNU Lesser General Public License v3.0

LULU settings to use it to simply cluster at a given threshold? #21

Open lplough opened 7 months ago

lplough commented 7 months ago

Hi there, is it possible to use LULU simply as a tool to cluster sequences at a certain percent identity? In other words, could I turn off (or set to a neutral value) all of the settings other than the match cutoff? For example, with the basic LULU code:

curated_result <- lulu(otutab, matchlist, minimum_ratio_type = "min", minimum_ratio = 1, minimum_match = 84, minimum_relative_cooccurence = 0.95)

I would like to ignore or effectively turn off the minimum_ratio_type and minimum_relative_cooccurence options so that only minimum_match is in effect (e.g. set to 97). I understand that this may not be the intended use of LULU, but it seems to me that with the right settings, one could effectively minimize what the other options are doing. I'm just not sure what to set them to. If I leave them at their defaults, some of the curation still occurs.

Thanks for any help.

LP

adrientaudiere commented 7 months ago

Hi @lplough,

lulu() requires a matchlist argument, which is not necessary for your purpose.

I would recommend a function built for this purpose only. DECIPHER::Clusterize() is made for this. If you work with a phyloseq object from the phyloseq package, you can use MiscMetabar::asv2otu() with either the "Clusterize" or the "vsearch" software (tutorial available here). I apologize for the not very intuitive name of this function, but it does only what you want, i.e. it reclusters sequences at a given threshold.
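As a rough illustration of the DECIPHER route, here is a minimal sketch on toy data. It assumes the Bioconductor packages Biostrings and DECIPHER are installed. The sequences, the OTU names and the table `otutab` are made up for the example, and the table uses the LULU layout (OTUs in rows, samples in columns). As far as I understand, the `cutoff` argument of Clusterize() is a distance, so 97% identity would correspond to `cutoff = 0.03`. Please check the DECIPHER documentation before relying on this.

```r
# Sketch only: toy data, not the LULU workflow itself.
library(Biostrings)
library(DECIPHER)

set.seed(1)
# Helper (hypothetical) that builds a random DNA string of length n
random_seq <- function(n) {
  paste(sample(c("A", "C", "G", "T"), n, replace = TRUE), collapse = "")
}

# Three toy OTU sequences: otu2 differs from otu1 at one position, otu3 is unrelated
base <- random_seq(200)
variant <- base
substr(variant, 100, 100) <- if (substr(base, 100, 100) == "A") "C" else "A"
seqs <- DNAStringSet(c(otu1 = base, otu2 = variant, otu3 = random_seq(200)))

# Toy OTU table with OTUs in rows and samples in columns
otutab <- data.frame(sampleA = c(10, 2, 5),
                     sampleB = c(0, 3, 7),
                     row.names = names(seqs))

# Cluster at a distance cutoff of 0.03 (assumed to correspond to 97% identity)
clusters <- Clusterize(seqs, cutoff = 0.03, processors = 1, verbose = FALSE)

# Look up each OTU's cluster id by name, then sum read counts within clusters
grp <- clusters[rownames(otutab), "cluster"]
clustered_tab <- rowsum(as.matrix(otutab), group = grp)
clustered_tab
```

With real data, you would presumably read the sequences with Biostrings::readDNAStringSet() from your own FASTA file and make sure the sequence names match the row names of your OTU table. The row names of the summed table are cluster ids, not the original OTU names.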