mikemc / speedyseq

Speedy versions of phyloseq functions
https://mikemc.github.io/speedyseq/

Clustering ASVs to OTUs #78

Open ConstanceBtd opened 6 months ago

ConstanceBtd commented 6 months ago

Hello, I'm currently trying to cluster fungal ASVs into OTUs with DECIPHER and speedyseq, following the code example you have online:

cluster_matrix <- DistanceMatrix(refseq(physeq_asv), type = "matrix", includeTerminalGaps = TRUE, processors = 1)
clusters <- DECIPHER::TreeLine(myDistMatrix = cluster_matrix, method = "single", cutoff = 0.03, processors = NULL)

physeq_otu <- speedyseq::merge_taxa_vec(
  physeq_asv, 
  group = clusters$cluster,
  tax_adjust = 2
)

where physeq_asv is my phyloseq object obtained with the dada2 pipeline. I updated your code by replacing the IdClusters function with TreeLine. However, the TreeLine output does not have a cluster column; it is a dendrogram. Do you know how I could adapt this code to the new TreeLine output?

Thanks !

mikemc commented 6 months ago

Hi @ConstanceBtd, I haven't used the TreeLine function, so I don't know enough about its output to say how to generate the appropriate group vector. Looking at the tutorial, it sounds like it only gives a tree and not a set of clusters. In that case, you'll have to determine clusters from the tree. If it's an ultrametric tree, you might be able to do so using cutree() in base R, though this will only work if the output of TreeLine is compatible with what cutree() expects. But whether this approach actually makes sense for your application, I can't say.
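As a rough illustration of the cutree() idea, here is a minimal base-R sketch. It does not call DECIPHER at all: the distance matrix `d` is made-up toy data standing in for the DistanceMatrix() output, and the dendrogram is built with hclust() to mimic a tree-only result. cutree() works on hclust objects, so a dendrogram has to go through as.hclust() first. Whether as.hclust() accepts the dendrogram that TreeLine returns, and whether its heights are on the same scale as the 0.03 cutoff, are assumptions that would need checking on real output.

```r
# Toy distance matrix (hypothetical values) standing in for DistanceMatrix() output.
asv_names <- paste0("ASV", 1:4)
d <- matrix(
  c(0.00, 0.01, 0.20, 0.21,
    0.01, 0.00, 0.19, 0.20,
    0.20, 0.19, 0.00, 0.02,
    0.21, 0.20, 0.02, 0.00),
  nrow = 4, byrow = TRUE,
  dimnames = list(asv_names, asv_names)
)

# Single-linkage tree, converted to a dendrogram to mimic a tree-only output.
dend <- as.dendrogram(hclust(as.dist(d), method = "single"))

# cutree() expects an hclust object, so convert the dendrogram back first.
hc <- as.hclust(dend)

# Cut the tree at height 0.03; the result is a named integer vector of cluster ids.
groups <- cutree(hc, h = 0.03)

# Reorder the cluster ids to match the taxa order of the phyloseq object.
# taxa_order is a hypothetical stand-in for taxa_names(physeq_asv).
taxa_order <- c("ASV3", "ASV1", "ASV4", "ASV2")
aligned <- groups[taxa_order]
```

A vector like `aligned` could then be passed as the `group` argument of merge_taxa_vec(), provided its order matches the taxa in the phyloseq object. Depending on the installed DECIPHER version, TreeLine() may also have a `type` argument that returns cluster assignments directly, so it is worth checking ?TreeLine before converting the tree by hand.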

You are probably better off asking on the DECIPHER GitHub, since the question isn't specific to speedyseq. It seems to be: what is the best way to compute clusters from the TreeLine output for merging ASVs into OTUs?

Edit: Another option would be to add the tree to your phyloseq object's tree slot and then use tip_glom() (https://mikemc.github.io/speedyseq/reference/tip_glom.html).