microbiome / OMA

Orchestrating Microbiome Analysis
https://microbiome.github.io/OMA

Structure of OMA #352

Closed TuomasBorman closed 2 months ago

TuomasBorman commented 1 year ago

Let's consider the structure of OMA. How much general information should there be about microbiome data science (not directly related to miaverse)? Should we structure the book differently? Currently the material seems to be placed wherever it happened to land, without a clear organizing principle.

It could be beneficial to have a book that gathers all relevant information and directs the reader to the correct source material.

The structure could be:

  1. General information on miaverse
    • What is miaverse
    • How miaverse fits into the microbiome data science workflow
  2. General information on microbiome research
    • What and why?
    • What are the samples?
    • How to design a study.
  3. How to analyze / measure DNA from samples
    • What sequencing approaches are available.
    • How to get abundance tables. (HUMAnN workflow, DADA2)
  4. Downstream analytics
    • Basic topics
    • Advanced topics
  5. Exercises
  6. Other material

antagomir commented 12 months ago

Given the clearly limited resources, we will primarily need to focus on the novel content that we can provide.

We can provide links to relevant external resources, but we should also avoid duplicating information that is already well covered elsewhere. Very good tutorials exist for bioBakery (HUMAnN and others), DADA2, and external packages, and they should not be replicated in OMA merely to improve the book structure. We can critically comment on some of the external research and provide references.

Hence, at least for the time being, I would keep the focus strictly on presenting how to implement R/Bioc workflows. I agree that some background is useful and can be provided to support the documentation. This could take the form of best practices with justifications.

TuomasBorman commented 12 months ago

Yes, the purpose is not to copy those resources but rather to create a platform from which to find relevant resources in microbiome data science. The focus is still on downstream analytics, but I think background information is important for several reasons:

  1. Supports the user. There is a lot of information, and it is hard to find the relevant parts. (That is also my own experience from spring, when I ran the HUMAnN workflow.) Even though there are good tutorials, how does one find the "best" tools? OMA could be the main resource from which to find information.
  2. If people start to use OMA as a main resource, we could get more users.
  3. There is a lot of information that, for example, our group has but that is not really written down anywhere.

I believe no resource of that kind currently gathers this information, so it would be novel and valuable in that sense. I agree that the background information must not be too long.

TuomasBorman commented 12 months ago

Any thoughts? @microsud

microsud commented 12 months ago

Hi both, thanks for including me in this. I largely agree with Leo on first having a workflow approach that introduces the entire miaverse framework for microbiome data science. Something we can learn from the QIIME2 documentation is the use of specific tutorials that address some key aspects of data and analytics, including tips and learnings that we have gathered over the past decade.

A single living book covering the vastness of this topic seems challenging 🤔. For example, covering how to design microbiome studies may be out of scope.

There are several reviews on these topics, and unless we are giving data-driven demos to show, for example, how to deal with contamination, we can provide links to these reviews in the general suggested reading.

antagomir commented 12 months ago

Exactly, this is my concern. These topics are extremely vast, diverse, and fast-evolving. We would then need at least a clear and plausible plan for who will write those materials and sync them with the other contents, and how they could be kept up to date in addition to the code base, which may already prove challenging in itself.

At the minimum, I would try to stabilize and organize better the current packages, methods, and workflows before initiating substantial new extensions.

I do agree that the book could give pointers to good practices. Having another look at other tutorials (QIIME2, Mothur, etc.) can also be helpful for inspiration.

TuomasBorman commented 12 months ago

Good points. I agree, let's now focus on finishing this miaverse workflow part.

Let's discuss this extension later. Just to add, the intention was only to point to good practices and to give one example of how to do this analysis from beginning to end, not to write too much ourselves. Bilateral support from existing mapping tools (DADA2, HUMAnN...) might give synergy.

However, this structure issue still exists because OMA is currently not very well organized:

  1. General information on miaverse
    • 1.1 What is miaverse etc.

Basic topics

  2. Data containers and basic operations, data fetching etc.
  3. Importing the data, different options, MAE, databases
  4. Exploration and QC
  5. Data transformations, agglomeration, splitting etc.
  6. Community diversity
  7. Community similarity
  8. Community typing
  9. DAA

Advanced topics

  10. Multiomics
  11. ML
  12. Bayesian modeling
  13. Networks
  14. Simulating bioreactor etc.

  15. Exercises

  16. Extra material

antagomir commented 11 months ago

This kind of structure seems good. We can discuss the details, and whether something should be added or removed.

For instance, ML is already part of many different chapters, so we might like to have more specifically named ones (like "Prediction" or "Classification").

antagomir commented 11 months ago

Chapter 14 (bioreactors) could rather be time series, for instance.

maartenciers commented 9 months ago

I would honestly love more in-depth explanations of why certain things are done the way they are: when to use rarefaction and when not, what the alternatives are, which transformations to use, why use absolute counts instead of relative abundance for diversity measures or DA analysis, and so on.

The microbiome field really lacks consensus, and it's really frustrating to read different things about the same subjects all over the place. My point is that OMA could be a main source for people learning about microbiome analysis, and I feel that here it is really lacking. Don't get me wrong, I love the work you put into the book, but it feels like a missed opportunity to understand and apply these concepts now. Currently I just apply the things you do in the book, but I don't understand them completely and have doubts about everything I do, because every source says something different. I even swapped tools plenty of times (phyloseq, mia, ...) because there are so many, and they all do something different without explaining why.

If you compare microbiome analysis to, let's say, bulk RNA-seq or scRNA-seq, there at least a common consensus is written down in nice Bioconductor books that teach these principles alongside the frameworks used. It is very clear, and at least you are certain that what you are doing is correct or makes some sense. I just don't get this here: sentences are too vague about what to use, and I also had trouble with functions being changed or removed without the book being updated (transformAssay). I'm sorry, but I'm really tired of reading all these sources and still not knowing what I should do. For a Bioconductor book, I expect in-depth explanations of why certain things are chosen over others, with these basic concepts discussed in greater detail. Thank you for creating this book, but please continue to update it; it would be very helpful for the entire microbiome community. If you know other credible sources or books that explain downstream 16S amplicon analysis methods and how to perform them reliably, please let me know.

antagomir commented 9 months ago

Thanks! I agree entirely.

Hopefully you have noticed that the book is a beta version; we are both advancing it as fast as resources allow and welcoming pull requests from the community (which is already happening).

The more in-depth explanations will clearly be a valuable thing to add. It just takes time to set up the technical basis for an entirely new ecosystem, and we believe in the release early, release often philosophy.

This is not yet an official Bioconductor book, but we are looking forward to making it one, perhaps already this spring. By then at the latest, the book version and the Bioconductor version will be synchronized automatically, and there should be no mismatch between the book and the functions. Also, at the moment the older functions and all examples in OMA should still work; they just throw a deprecation warning. If this is not the case for some specific function, we would love to know and fix it asap.

The resources have increased and we are expecting a boost in development this spring.

TuomasBorman commented 3 months ago

Hi,

now it is time to decide the structure of the book, as it might be harder to make bigger changes once the book is no longer a beta version. Here is my suggestion.

The goal is that the structure is intuitive and that information is easy to find. There should also be no overlapping information. I would be happy to discuss the structure. "# Introduction" means that the table of contents is folded from there. By clicking it, the reader can see the chapters under it (Introduction and miaverse).

  1. Motivation
    • Motivate the book
    • Explain what the reader finds in the book
    • Explain how OMA is related to OSCA
    • Goal:
      • Reader should know what information the book includes.
      • The idea of the book. --> "best practices", examples on miaverse, collaborative work

# Introduction

  1. Introduction
    • Bioconductor (could be more information than now)
      • Explain briefly the idea of Bioconductor
      • Data containers, SE
        • SE is common.
        • Why we use data containers?
    • TreeSE, phyloseq, microbiome data science in Bioconductor
      • Why we use TreeSE. What is phyloseq / why not use it?
    • Goal:
      • User should know that Bioconductor is a large ecosystem for bioinformatics
        • Highlights the quality of miaverse
      • miaverse is Bioconductor's microbiome framework
      • miaverse is related to other fields by sharing the data container
      • How miaverse is related to phyloseq
        • phyloseq is "old", TreeSE is new
      • Motivation why miaverse was developed.
        • SE was not supported in microbiome field.
  2. miaverse
    • Tool ecosystem for microbiome downstream analysis.
    • Packages
      • Explain that there are many packages
      • mia* packages
      • External packages (some using SCE), expanding ecosystem (could be more info than now)
    • Data containers
      • TreeSE
      • SingleCellExperiment
      • SummarizedExperiment
      • MultiAssayExperiment
    • Installation of packages (we could have this in its own chapter, since people sometimes have problems with it and it would be easier to find; however, there is not too much to say about it.)
    • Goal:
      • After this, user should have an idea of the miaverse ecosystem. It is not just about mia and TreeSE.
      • We share methods from other fields.
      • The basics of SummarizedExperiment ecosystem (without code examples)
      • How SE-family data containers are related to each other.

# Data containers and importing

  1. Data containers
    • Structure of TreeSE
    • MAE
    • Goal:
      • User should know that data containers consist of slots, rows, and columns
      • User should know how to subset the data, and how to access slots.
      • Code examples that are not included in chapter 2.2
  2. Importing data
    • Explain that the data must be abundance data
    • External files
    • Importers
    • Converters
    • Data resources
    • Goal:
      • User should know how to import own data
      • User should know that there are multiple curated data resources available

# Data manipulations

  1. Common operations
    • Subsetting
    • Merging and melting
  2. Taxonomy related methods
    • How to set TAXONOMY_RANKS
    • How to calculate hierarchy tree...
  3. Transformation
    • Explain the idea of transformation
    • Show how transformations are applied
  4. Agglomeration
    • Motivation of agglomerating
    • Show how agglomeration is done by ranks, groups, prevalence
    • Subsetting functions, e.g., subsetByPrevalent

# Exploration

  1. Exploration and quality control
    • Exploration is the first step in the workflow
    • What to look for?
    • How it can be done?
  2. Composition
    • Composition plots show basic composition of samples

# Diversity and dissimilarity

  1. Diversity
    • Alpha diversity
    • What it measures?
  2. Dissimilarity
    • Dissimilarity and ordination
    • Explain the idea
  3. Clustering
    • What is clustering
    • How it can be applied? (Link also to agglomeration chapter, we can agglomerate based on clusters)

# Associations

  1. Differential abundance
    • Explain the idea
    • Different DA methods
  2. Correlation
    • Correlation between assay and colData
    • Correlation between colData variables

# Networks

  1. Network learning and analysis
  2. Network comparison

# Multiomics

  1. Experiment cross-association
    • Introduction to multi-omics with very basic example
    • Correlation between experiments
  2. Ordination based multiomics methods
    • Basically, just introduce MOFA
    • Currently the examples have no interpretation. That is something we could add (it might also be a good idea for the whole book to focus more on interpretation, since that might be the hardest part).
  3. Multiomics integration
  4. Meta analyses
  5. Microbe set enrichment analysis

# Machine learning (and statistical modeling?)

  1. Machine learning
    • Example in context of microbiome
    • Link to chapter 11 (ordination) and 12 (clustering).
    • Show supervised example
  2. Statistical modeling?

# Advanced visualization

  1. Visualization
    • Advanced examples that were not included in other chapters (because they were too complex).

# Workflows

  1. Workflow 1
  2. Workflow 2
  3. Workflow 3

# Training

  1. Training
  2. Exercises

# Appendices

  1. Resources and support (?)
  2. Extra material
    • Material that is important to have but that does not fit in the book because it is off-topic
  3. Contributions
  4. Session info

TuomasBorman commented 3 months ago

@antagomir

antagomir commented 3 months ago

Overall, looks good to me.

We might like to consider adjustments once we see this proposal implemented but those could be expected to be small.

We might like to reconsider the exact titles in some cases; they should generally be short and not too technical.

It is really helpful to provide background but let us try to refer to external sources for more information also.

I would not put much emphasis on phyloseq (or why not phyloseq); it should be enough to mention it briefly and rather focus on the advantages of SE. In general, let's write in a positive tone: instead of explaining why not to use something, we discuss why to use something else. For example, instead of "SE was not implemented for microbiome research", we "saw the great opportunity to take advantage of SE framework in microbiome research" etc.

Overall, the plan is good and I can comment more when the structure is taking shape.