Closed TuomasBorman closed 4 weeks ago
Also consider plotting more than 20 (maybe 25) taxa with discrete colors. As seen in the plots above, the colors are on a continuous scale, which makes them hard to read. If there are 20 or fewer taxa, the color scale is discrete.
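A minimal sketch of how 25 taxa could get discrete colors, assuming plain ggplot2 rather than plotAbundance itself. The toy data frame, its column names, and the choice of the "Dark 3" palette are assumptions made for illustration only; this is not how miaViz currently picks colors.

```r
library(ggplot2)

# Toy long-format data: 25 hypothetical taxa in 4 hypothetical samples.
set.seed(1)
taxa <- sprintf("taxon%02d", 1:25)
df <- expand.grid(taxon = taxa, sample = paste0("sample", 1:4),
                  stringsAsFactors = FALSE)
df$abundance <- runif(nrow(df))

# One color per taxon from a qualitative HCL palette (base R, grDevices).
pal <- grDevices::hcl.colors(length(taxa), palette = "Dark 3")
names(pal) <- taxa

p <- ggplot(df, aes(x = sample, y = abundance, fill = taxon)) +
    geom_col(position = "fill") +      # relative abundance per sample
    scale_fill_manual(values = pal)    # discrete scale instead of a gradient
```

With this many classes, neighboring hues will still be close to each other, so the palette choice probably deserves its own discussion.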
Also related: https://github.com/microbiome/OMA/issues/197
There are three options to display sample names without cluttering.
1. flipped = TRUE, which swaps the axes and flips the graph counterclockwise.
2. theme(axis.text.x = element_text(angle = 45, hjust = 1)), to change the text orientation on the x-axis, for example (more flexibility for the user to control the display).
Thanks, theme(axis.text.x = element_text(angle = 45, hjust = 1)) seems to solve the problem of sample names.
A couple more things came to mind while generating plots in one project:
# Prepare data
library(miaViz)
data("Tengeler2020")
tse <- Tengeler2020
tse <- tse[, 1:20]
colData(tse)[["patient"]] <- rep(paste0("patient", seq_len(4)), each = ncol(tse) / 4)
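# Alternate the two sample types across the samples
colData(tse)[["sampletype"]] <- factor(rep(paste0("sampletype", seq_len(2)), length.out = ncol(tse)))
# Drop the last sample, leaving an odd number of samples
tse <- tse[, 1:19]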
Sometimes the user wants to define the order of the taxa. For instance, there might be some specific taxa that the user wants listed first. For example, in figure 3 here they have plotted "Other" first: https://www.researchgate.net/publication/347867791_The_Urinary_Microbiome_in_Postmenopausal_Women_with_Recurrent_Urinary_Tract_Infections/figures
For instance, below Firmicutes is plotted first. I am not sure what the best way to achieve the desired behavior is. (Maybe we could check whether the values are factors and get the order from the levels?)
asd <- c("Firmicutes" = "1_Firmicutes")
rowData(tse)[["Phylum"]][ rowData(tse)[["Phylum"]] == names(asd) ] <- asd
plotAbundance(tse, rank = "Phylum", as.relative = TRUE)
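The factor-level idea raised above can be sketched outside plotAbundance, assuming a plain ggplot2 stacked bar plot. The data frame and its column names are hypothetical; the point is only that the level order, not a renamed label such as "1_Firmicutes", decides which taxon comes first in the legend.

```r
library(ggplot2)

# Hypothetical relative abundances for two samples and three phyla.
df <- data.frame(
    sample = rep(c("s1", "s2"), each = 3),
    phylum = rep(c("Bacteroidetes", "Firmicutes", "Other"), times = 2),
    abundance = c(0.5, 0.3, 0.2, 0.2, 0.6, 0.2)
)

# The first level is listed first in the legend; no renaming is needed.
df$phylum <- factor(df$phylum,
                    levels = c("Firmicutes", "Bacteroidetes", "Other"))

p <- ggplot(df, aes(x = sample, y = abundance, fill = phylum)) +
    geom_col()
```

If plotAbundance respected existing factor levels in rowData in the same way, the user could set any order without a new argument.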
When we want to display the sample type, for instance, the type is plotted as colors. However, might it be better to have it as its own facet?
Below is our current solution:
p <- plotAbundance(tse, rank = "Phylum", as.relative = TRUE, col.var = "sampletype", order.col.by = "sampletype")
library(patchwork)
wrap_plots(p, ncol = 1, heights = c(0.95,0.05))
Behind the link, in figure 2, you can see how the same thing is achieved with facets: https://www.researchgate.net/publication/347867791_The_Urinary_Microbiome_in_Postmenopausal_Women_with_Recurrent_Urinary_Tract_Infections/figures
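For comparison, a facet-based layout can be sketched with plain ggplot2. This is only an illustration under assumptions: the long-format data frame and its columns are made up, and nothing here reflects how plotAbundance builds its plot internally.

```r
library(ggplot2)

# Hypothetical long-format data: 6 samples, 2 phyla, 2 sample types.
df <- expand.grid(
    sample = paste0("sample", 1:6),
    phylum = c("Bacteroidetes", "Firmicutes"),
    stringsAsFactors = FALSE
)
df$sampletype <- ifelse(df$sample %in% paste0("sample", 1:3),
                        "sampletype1", "sampletype2")
df$abundance <- c(3, 5, 2, 6, 4, 1, 7, 5, 8, 4, 6, 9)

# One facet per sample type; free x scales drop samples that do not
# belong to a facet, and free space keeps the bar widths equal.
p <- ggplot(df, aes(x = sample, y = abundance, fill = phylum)) +
    geom_col(position = "fill") +
    facet_grid(cols = vars(sampletype), scales = "free_x", space = "free_x")
```

Facets only make sense for categorical variables, so a continuous colData variable would still need the separate annotation panel.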
Sometimes we have samples that are drawn from the same patient (for instance, at varying time points). Currently, we do not have a method for that kind of plot. The best that can be done currently is this:
tse_list <- splitOn(tse, "sampletype")
plot_list <- lapply(tse_list, function(x){
    colnames(x) <- x$mappac_id
    p <- plotAbundance(x, as.relative = TRUE, rank = "Phylum", add_x_text = TRUE) +
        labs(title = unique(colData(x)[["sampletype"]]))
    return(p)
})
wrap_plots(plot_list, ncol = 1)
but as you can see, the samples do not match. (Maybe we could add missing samples, for instance in the figure above, to sampletype2?)
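The idea of adding missing samples can be sketched in base R, assuming the data are available in long format. The data frame below is hypothetical; the sketch pads every patient and sample type combination so that a missing sample becomes an empty (NA) slot instead of shifting the remaining bars.

```r
# Hypothetical paired data: patient p3 has no sampletype2 sample.
obs <- data.frame(
    patient = c("p1", "p2", "p3", "p1", "p2"),
    sampletype = c(rep("sampletype1", 3), rep("sampletype2", 2)),
    abundance = c(0.4, 0.7, 0.5, 0.6, 0.3),
    stringsAsFactors = FALSE
)

# All patient x sampletype combinations that a paired design expects.
full <- expand.grid(
    patient = unique(obs$patient),
    sampletype = unique(obs$sampletype),
    stringsAsFactors = FALSE
)

# A left join keeps every expected slot; missing samples get NA abundance.
padded <- merge(full, obs, by = c("patient", "sampletype"), all.x = TRUE)
padded <- padded[order(padded$sampletype, padded$patient), ]
```

Plotting padded with patient on the x-axis would then give the same x positions in every panel, with a gap where a sample is missing.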
@Daenarys8 Can you check if you can find solutions for these? We can then discuss more how to implement them.
I checked some of these and it is interesting because we do have order.col.by, which can order the taxa but with the downside of ordering the counts as well. Perhaps we could modify it a little.
plotAbundance(tse, rank = "Phylum", order.col.by = "Firmicutes")
In .features_plotter or .abund_plotter, we can display column values with facet_wrap. On second thought, if the whole idea of .features_plotter was for column plots, we could remove it entirely and modify .abund_plotter to consume col.var as the condition for such a plot.
plotAbundance(tse, rank = "Phylum", order.col.by = "Firmicutes", col.var = "sampletype")
The above plot could be much better though.
plot_list <- lapply(tse_list, function(x){
    p <- plotAbundance(x, as.relative = TRUE, rank = "Phylum", add_x_text = TRUE, order.col.by = "Firmicutes")
    return(p)
})
wrap_plots(plot_list, ncol = 1)
Looks very nice.
Perhaps 1 is enough. I still have to test it.
2. Looks good.
3. As you can see from my plot, sample 10 is missing from sampletype2. You are correct that it is not there in the first place (we do not have a sample for "sample10" - "sampletype2"). However, because a sample is missing, the samples are misaligned in the plots. The plot would be tidier if sampletype2 and sampletype1 aligned with each other. (It would be easier to read and, in practice, we would not need the sample labels anymore.)
However, I am wondering what the best way to showcase paired samples is. One option is to add an "empty sample" in place of missing samples (here "sample10" - "sampletype2").
Can you check if this is already solved in some papers? We could then get the idea from them.
1. That also orders the data based on a certain feature. However, my collaborator wants the "unidentified" taxa to be at the bottom of the graph.
We could add an additional parameter to .order_abund_feature_data (?) that controls which feature is at the bottom of the graph. It could work somewhat like order.col.by but without ordering the samples (just the order of the color bars).
2. The idea of .features_plotter is to visualize a colData variable. It can also visualize continuous variables, which facets cannot. To me, facets look better for categorical variables, but some people might prefer the current option.
That is why I think we should have an option for this. Maybe facet.cols = FALSE, which creates facets from col.var.
3. As already mentioned, we should handle missing samples if the user wants to visualize paired samples. There could be a paired = TRUE option that makes sure that the order of samples stays the same in all facets (so that they are comparable).
Can you create a draft that takes these into account? Let's then discuss what the best approach is, as this might be a somewhat complex issue that requires restructuring the function.
1) Clarify the relation with the order.row.by argument; should this be "bottom.row", or should we just provide examples of how the user can provide arbitrary sorting?
2) Not sure if I understood, but sounds worth testing.
3) Good.
One option could be that the user can specify the order with factor levels. That might be the easiest. So instead of characters, the rowData variable could be a factor.
The point was that sample information is now plotted as a separate plot. However, these groups could also be plotted as facets. Facets work only for categorical variables, not for numeric variables, which is why we should still keep the current functionality as well.
One problem is that it makes the function more complex for the user if we have many different options.
order.row.by?
1. That is not possible. The user can only specify "name" (alphabetical order), "abund" (abundance), or "revabund" (reverse abundance).
The idea is to get this kind of plot. Here the "Other" group is not interesting, so it is at the bottom. I found that some papers have this kind of plot.
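A minimal base R sketch of how a chosen group could be pushed to the bottom, assuming the taxa are stored as a factor. The helper name move_level_last is hypothetical and not part of miaViz. It relies on ggplot2's default stacking, where the last factor level ends up at the bottom of a stacked bar.

```r
# Hypothetical helper: make `level` the last factor level of `x`,
# keeping the relative order of all other levels.
move_level_last <- function(x, level) {
    x <- as.factor(x)
    stopifnot(level %in% levels(x))
    factor(x, levels = c(setdiff(levels(x), level), level))
}

phylum <- c("Other", "Proteobacteria", "Firmicutes", "Other")
phylum <- move_level_last(phylum, "Other")
levels(phylum)
# "Firmicutes" "Proteobacteria" "Other"
```

The same helper would also cover the "unidentified" case mentioned earlier, since only the level name changes.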
1. That might be the easiest and most transparent solution. However, we should check that the elements of the vector match the features.
If the user wants to agglomerate the data, it might not be clear what those names are. We could disable the vector option if the user wants to agglomerate.
(The same solution could work for columns also)
Sounds good. There could be an informative warning if the user tries to do both.
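The checks discussed above could look roughly like the following base R sketch. The function name check_feature_order and its arguments are hypothetical; the sketch only illustrates the proposed behavior (validate the vector against the feature names, and warn and ignore it when agglomeration is requested), not an existing miaViz internal.

```r
# Hypothetical validator for a user-supplied feature order.
# order:       character vector of features to list first
# features:    feature names present in the data
# agglomerate: whether the user also asked for agglomeration
check_feature_order <- function(order, features, agglomerate = FALSE) {
    if (agglomerate) {
        warning("A feature order vector cannot be combined with ",
                "agglomeration; the order is ignored.", call. = FALSE)
        return(NULL)
    }
    unknown <- setdiff(order, features)
    if (length(unknown) > 0) {
        stop("Unknown features in the order vector: ",
             paste(unknown, collapse = ", "), call. = FALSE)
    }
    # Requested features first, remaining ones in their original order.
    c(order, setdiff(features, order))
}

check_feature_order("Other", c("Bacteroidetes", "Other", "Firmicutes"))
# "Other" "Bacteroidetes" "Firmicutes"
```

Returning the full order rather than only validating keeps the downstream code simple, since it can be passed directly as factor levels.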
@Daenarys8 Would you be able to create a draft for these?
I am currently working on this and hope to get something out tomorrow.
1. When sample names are plotted, one cannot read them because they overlap each other.
Some other functions seem to have an angle_x_text parameter, but plotAbundance does not have an option to rotate the text.
Also, we could consider whether sample names could be specified from colData(tse). For example, paired samples must currently have unique names, but a better option would be to allow shared names so that one can easily see which samples are drawn from the same patient.
2. If the user wants to compare abundances between groups, or if samples are paired, for instance, our solution might be suboptimal.
It might be hard to read the plot when there are multiple groups (space between groups might help).
Another option would be to plot abundances as shown here in figure 1b.