microbiome / OMA

Orchestrating Microbiome Analysis
https://microbiome.github.io/OMA

Add bluster examples to clustering #265

Closed: BananaCancer closed this issue 1 year ago.

BananaCancer commented 1 year ago

Add new examples using bluster to the clustering section. So far, I've been thinking of adding most of the examples from the bluster vignette: hierarchical clustering, k-means, affinity propagation, SOM, DBSCAN, graph-based and two-phase clustering.

Since the first two algorithms already exist in OMA, I'm not sure whether they should be replaced with the bluster versions or whether the bluster examples should be appended to the existing section.

antagomir commented 1 year ago

I would not replicate all the functionality that is available through bluster. Instead, the idea would be to showcase how to do clustering with bluster. A single method would be sufficient for that, and users could then check bluster for more ideas.

I suggest the following:

  1. Clustering samples

a. One example with bluster::clusterRows. Hierarchical clustering would be fine, unless one of the other methods seems better justified. This is mainly to showcase the available tools (i.e., bluster). PAM clustering could be used instead of, or in addition to, hierarchical clustering.

b. One example with scran::clusterCells, to get a feel for its pros and cons.

c. One analogous example using the DMM (Dirichlet-multinomial mixture) method, to show how to do the same thing with another clustering package.

  2. Clustering taxa

a. Transform the data with CLR, perform hierarchical clustering, add the cluster indices to rowData, and use mergeRows to collapse the rows into clusters.

b. Mention that dedicated methods for detecting co-occurring taxa are available, such as SPIEC-EASI (add reference).
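The proposed hierarchical clustering of samples and the CLR-based clustering and merging of taxa can be sketched as follows. This is a minimal, language-neutral illustration in Python with NumPy and SciPy, not the mia or bluster API. The toy count matrix, the pseudocount of 1, average linkage, and the helper names (`clr`, `hclust_labels`, `merge_rows_by_cluster`) are all hypothetical. Rows are taxa and columns are samples, as in a TreeSummarizedExperiment assay. `hclust_labels` stands in for hierarchical clustering via bluster::clusterRows (labels are kept in an array rather than in rowData), and `merge_rows_by_cluster` stands in for mergeRows, here assumed to sum counts within each cluster.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of each sample (column).

    Assumption: a pseudocount of 1 is added to avoid log(0).
    """
    logged = np.log(counts + pseudocount)
    # Subtract each column's mean log value (i.e. divide by its geometric mean).
    return logged - logged.mean(axis=0, keepdims=True)


def hclust_labels(x, k, method="average"):
    """Hierarchical clustering of the rows of x, cut into at most k clusters."""
    z = linkage(x, method=method, metric="euclidean")
    return fcluster(z, t=k, criterion="maxclust")


def merge_rows_by_cluster(counts, labels):
    """Collapse rows that share a cluster label by summing their counts."""
    clusters = np.unique(labels)
    merged = np.vstack([counts[labels == c].sum(axis=0) for c in clusters])
    return clusters, merged


# Hypothetical toy data: 6 taxa (rows) x 4 samples (columns).
counts = np.array([
    [120, 110,   2,   1],
    [100,  95,   3,   0],
    [  1,   2,  90, 105],
    [  0,   3, 110,  98],
    [ 50,  48,  52,  49],
    [ 45,  51,  47,  50],
], dtype=float)

x = clr(counts)

# Clustering samples: transpose so that samples are on the rows.
sample_labels = hclust_labels(x.T, k=2)

# Clustering taxa: cluster the rows, then collapse each cluster into one row.
taxa_labels = hclust_labels(x, k=3)
clusters, merged = merge_rows_by_cluster(counts, taxa_labels)

print("sample clusters:", sample_labels)
print("taxa clusters:", taxa_labels)
print("merged counts:\n", merged)
```

Euclidean distance on CLR-transformed data is the Aitchison distance, one common choice for compositional abundance data. Note that the merge sums the raw counts rather than the CLR values, so the collapsed table can be transformed again afterwards.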

antagomir commented 1 year ago

After this, we will have more concrete examples and a better feel for how this works in practice, and further planning can be done.

antagomir commented 1 year ago

Also, read the discussion in PR #187 for some background.

TuomasBorman commented 1 year ago

Yep

How about adding a figure similar to the one you showed: a PCoA with ellipses showing clusters and colors showing taxa? Is that common?

This could also include an explanation of what we can do with clustered features (e.g., reduce the number of multiple tests, perhaps with a reference to a study that has done this kind of thing). This would add a "real life" example.
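As a rough illustration of the multiple-testing point: under a Bonferroni correction, the per-test threshold is alpha divided by the number of tests, so testing clusters instead of individual taxa loosens it. Bonferroni is just one possible correction (FDR-based methods are also common), and the numbers of taxa and clusters below are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical sizes: 500 individual taxa vs. 25 taxon clusters.
per_taxon = bonferroni_threshold(0.05, 500)
per_cluster = bonferroni_threshold(0.05, 25)
print(per_taxon, per_cluster)
```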

antagomir commented 1 year ago

I referred to Alneberg et al. (2014). That case is a bit different, as it clusters contigs based on sample coverage and DNA composition features in order to identify species.

The analogous approach for ordinary taxonomic abundance matrices would be to cluster taxa into higher-level groupings (ellipsoids) and then indicate some corresponding real grouping (known sub-communities?) with colors. Perhaps more useful would be clustering samples (ellipsoids) and then using colors to highlight known sample groups (e.g., different age groups often differ with respect to gut microbiota). On the other hand, as far as I know, bluster does not support such ellipsoids, so we should consider whether we want to add that feature. It might be useful, and I think some other OMA examples already show how to add ellipsoids to scatterplots.

I agree that we need a clear use case. One option is the peerj13075 example dataset, which is readily available in mia. Its samples cluster quite well with respect to geographical location.

BananaCancer commented 1 year ago

I've done most of the sample and taxa clustering, and I switched the dataset to peerj13075. I've kept the previous clustering content, as I'm not sure what should be done with it. I could move the DMM section closer to the bluster part if needed. My main problem is with "2.b. mention that dedicated methods for detecting co-occurring taxa are available, such as SPIEC-EASI". I've started to look into it, but I'm unsure what you expect from this part.

antagomir commented 1 year ago

Let us work through this PR without the SPIEC-EASI part. We can discuss and add that afterwards.

BananaCancer commented 1 year ago

Here is the pull request: #266. I mostly need to know what to do with the previous content, and I'm open to any suggestions.

antagomir commented 1 year ago

I was referring to the discussion in mia / #187.

antagomir commented 1 year ago

This is now complete. Thanks!