Closed TuomasBorman closed 2 months ago
Given the clearly limited resources, we will primarily need to focus on the novel content that we can provide.
We can provide links to relevant external resources, but we should avoid duplicating information that is already well covered elsewhere. Very good tutorials exist for bioBakery (HUMAnN and others), DADA2, and other external packages, and they should not be replicated in OMA merely to improve the book structure. We can comment critically on some of the external research and provide references.
Hence I would, at least for the time being, keep the focus strictly on presenting how to implement R/Bioc workflows. I agree that some background is useful and can be provided to support the documentation. This could take the form of best practices with justifications.
Yes, the purpose is not to copy those resources but rather to create a platform where relevant resources in microbiome data science can be found. The focus is still on downstream analytics but I think background information is important for several reasons:
I believe no such resource currently gathers this information, so it would be novel and valuable in that sense. I agree that the background information must not be too long.
Any thoughts? @microsud
Hi both, Thanks for including me in this. I largely agree with Leo on having first a workflow approach that introduces the entire miaverse framework for microbiome data science. Something we can learn from QIIME2 documentation is the use of specific tutorials that address some key aspects of data and analytics, including tips and learnings that we have gathered over the past decade.
A single living book covering the vastness of this topic seems challenging 🤔 For example, covering how to design microbiome studies may be out of scope.
There are several reviews on these topics, and unless we are giving data-driven demos to show, for example, how to deal with contamination, we can simply provide links to these reviews in the general suggested reading.
Exactly, this is my concern. These topics are extremely vast, diverse, and fast-evolving. We would then need at least a clear and plausible plan for who will write those materials and sync them with the other contents, and how they could be kept up to date in addition to the code base, which may in itself already prove challenging.
At the minimum, I would try to stabilize and organize better the current packages, methods, and workflows before initiating substantial new extensions.
I do agree that the book could give pointers to good practices. Having another look at other tutorials (QIIME2, Mothur, etc.) can also be helpful for inspiration.
Good points. I agree, let's now focus on finishing this miaverse workflow part.
Let's discuss this extension later. Just to add, the intention was only to point to good practices and to give one example of how to do this analysis from beginning to end, not to write too much ourselves. Bilateral support from existing mapping tools (DADA2, humann...) might give synergy.
However, the structure issue still exists because OMA is currently not very well organized.
Basic topics
2 Data containers and basic operations, data fetching, etc.
3 Importing the data, different options, MAE, databases
4 Exploration and QC
5 Data transformations, agglomeration, splitting, etc.
6 Community diversity
7 Community similarity
8 Community typing
9 DAA
Advanced topics
10 Multiomics
11 ML
12 Bayesian modeling
13 Networks
14 Simulating bioreactor, etc.
15 Exercises
16 Extra material
This kind of structure seems good. We can discuss the details, and whether something should be added or removed.
For instance, ML is already part of many different chapters, so we might like to have more specifically named ones (like "Prediction" or "Classification").
Chapter 14 (bioreactors) could rather be about time series, for instance.
I would honestly love more in-depth explanations of why certain things are done the way they are: when to use rarefaction and when not, what the alternatives are, which transformations to use, why use absolute counts instead of relative abundance for diversity measures or DA analysis, and so on.
The microbiome field really lacks consensus, and it's really frustrating to read different things about the same subjects all over the place. My point is that OMA could be a main source for people learning about microbiome analysis, and I feel that here it is really lacking. Don't get me wrong, I love the work you put into the book, but it feels like a missed opportunity to understand and apply these concepts now. Currently I just apply the things you do in the book, but I don't understand them completely and I doubt everything I do, just because every source says something different. I have even swapped tools plenty of times (phyloseq, mia, ...) just because there are so many, and they all do something different without explaining why.
If you compare microbiome analysis to, let's say, bulk RNA-seq or scRNA-seq, there is at least a common consensus written down in nice Bioconductor books that teach these principles alongside the frameworks used. It is very clear, and at least you are certain that what you are doing is correct or makes some sense. I just don't get this here: sentences are too vague about what to use, and I also had trouble with functions being changed or removed without the book being updated (transformAssay). I'm sorry, but I'm really tired of reading all these sources and still not knowing what I should do, and for a Bioconductor book I expect in-depth explanations of why certain things are chosen over others, with these basic concepts discussed in greater detail. Thank you for creating this book, but please continue to update it; it would be very helpful for the entire microbiome community. If you know other very good, credible sources or books that explain downstream 16S amplicon analysis methods and how to perform them reliably, please let me know.
Thanks! I agree entirely.
Hopefully you have noticed that the book is a beta version; we are advancing it as fast as resources allow, and we welcome pull requests from the community (which is taking place already).
The more in-depth explanations will clearly be a valuable thing to add. It just takes time to set up the technical basis for an entirely new ecosystem and we believe in the release early, release often philosophy.
This is not yet an official Bioconductor book but we are looking forward to making it one, perhaps already this spring. By then at the latest, the book version and Bioconductor version will be synchronized automatically and there should be no mismatch between the book and the functions. Also, at the moment the older functions and all examples in OMA should still work; they just throw a deprecation warning. If this is not the case for some specific function, we would love to know and fix it asap.
The resources have increased and we are expecting a boost in development this spring.
Hi,
Now it is time to decide the structure of the book, as it might be harder to make bigger changes once the book is no longer a beta version. Here is my suggestion.
The goal is that the structure is intuitive and that information is easy to find. There should also be no overlapping information. I would be happy to discuss the structure. "# Introduction" means that the table of contents is folded from there. By clicking it, the reader can see the chapters under it (Introduction and miaverse).
'# Introduction
'# Data containers and importing
'# Data manipulations
'# Exploration
'# Diversity and dissimilarity
'# Associations
'# Networks
'# Multiomics
'# Machine learning (and statistical modeling?)
'# Advanced visualization
@antagomir
Overall, looks good to me.
We might like to consider adjustments once we see this proposal implemented but those could be expected to be small.
We might like to reconsider the exact titles in some cases; they should generally be short and not too technical.
It is really helpful to provide background but let us try to refer to external sources for more information also.
I would not put much emphasis on phyloseq (or on "why not phyloseq"); it should be enough to mention it briefly and rather focus on the advantages of SE. In general, let's write in a positive tone: instead of explaining why not to use something, we discuss why to use something else. For example, instead of "SE was not implemented for microbiome research", we "saw the great opportunity to take advantage of the SE framework in microbiome research", etc.
Overall, the plan is good and I can comment more when the structure is taking shape.
Consider the structure of OMA. How much general information should there be about microbiome data science (not directly related to miaverse)? Should we structure the book differently? Currently it seems that the material is just put somewhere.
It could be beneficial to have a book that gathers all relevant information and directs the reader to the correct source material.
The structure could be: