microbiome / OMA

Orchestrating Microbiome Analysis
https://microbiome.github.io/OMA

Use a different MAE data for more meaningful MOFA results #321

Open artur-sannikov opened 1 year ago

artur-sannikov commented 1 year ago

Is your feature request related to a problem? Please describe.

At the moment, the Hintikka data is used for basically all analyses in the OMA book. However, I see a problem in the MOFA section: the model finds only one factor, which explains variability only in the metabolomic data (see the "Variance Explained per factor and assay" figures). This makes the results difficult to interpret and discuss because, in my opinion, they do not show much.
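To make the pattern concrete, here is a minimal toy sketch in plain Python of how "variance explained per factor and assay" can be computed as R² = 1 - SS_resid / SS_total for one factor and one assay. This is not the MOFA2 API and not the Hintikka data: the function name, the assay names, and all numbers are made up for illustration, and each assay is assumed to be column-centered.

```python
def variance_explained(Y, z, w):
    """R^2 of a single factor for a single assay (toy illustration).

    Y: samples x features matrix as a list of rows, assumed column-centered
    z: factor value for each sample
    w: loading of the factor on each feature
    """
    ss_total = sum(y * y for row in Y for y in row)
    ss_resid = sum(
        (Y[i][j] - z[i] * w[j]) ** 2
        for i in range(len(Y))
        for j in range(len(Y[0]))
    )
    return 1.0 - ss_resid / ss_total


# One hypothetical factor across four shared samples.
z = [-1.0, -0.5, 0.5, 1.0]

# Hypothetical "metabolites" assay: fully driven by the factor.
w_metabolites = [2.0, -1.0]
metabolites = [[zi * wj for wj in w_metabolites] for zi in z]

# Hypothetical "microbes" assay: centered, but unrelated to the factor,
# so the fitted loadings are taken to be zero.
w_microbes = [0.0, 0.0]
microbes = [[0.3, -0.2], [-0.3, 0.2], [-0.3, 0.2], [0.3, -0.2]]

r2 = {
    "metabolites": variance_explained(metabolites, z, w_metabolites),
    "microbes": variance_explained(microbes, z, w_microbes),
}
print(r2)
```

In this toy case the single factor accounts for all of the variance in one assay and none in the other, which is the situation described above: there is no shared signal across assays to interpret.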

In contrast, the authors of the original MOFA+ paper found that the factors capture different pieces of information, for example differences in methylation, classes of neurons, etc. The presence of these factors also allowed them to apply t-SNE to discover sub-populations of cell types. In our case, we cannot do much downstream analysis.

Describe the solution you'd like

I see two solutions here:

  1. Use some other multi-omic data from a different resource, build a MAE object (or find existing data already in MAE format) and show how we can perform downstream analysis on MOFA factors;
  2. Add a MAE object directly to mia (and use that for MOFA and downstream analysis), which might be more complicated but at the same time should make future work easier.

These two solutions can be implemented simultaneously, and I do not have a preference for either as long as the data provides us with meaningful and interpretable results.

Additional context

  1. The MAE package has a built-in multi-assay experiment, miniACC, as an example;
  2. If we use some other data, it will break the flow of the analysis, which at the moment uses CCA to uncover some interesting relationships and then MOFA to confirm and expand the previous findings;
  3. The dataset, of course, should be related to the microbiome, although most available multi-omic datasets come from cancer research (e.g., RNA-seq, methylation, mutations).

antagomir commented 1 year ago

We can certainly add another MAE demo data set in mia, for instance. It should be about microbiome research (which is indeed so far less covered in terms of multiomics methodology than cancer studies).

Or we can use an existing data set. Possible sources:

  1. borenstein-lab/microbiome-metabolome-curated-data/; does not support TreeSE/MAE as such, so that would require additional work/code.
  2. curatedMetagenomicData; provides a list of TreeSEs for cases where multiomics is available, but MAE support is still under consideration, so our own code would have to convert the experiment list into MAE.
  3. EBI MGnify API through the MGnifyR pkg; I think this already provides outputs in MAE format. This is a central data resource for European microbiome research and open data sharing, so I think it would be quite a good source if a suitable data set can be identified.

I expect that more informative factors can be identified from data sets with larger sample sizes.