microbiome / miaViz

Microbiome Analysis Plotting and Visualization
https://microbiome.github.io/miaViz
Artistic License 2.0
10 stars 12 forks source link

plotAbundance returns NULL with order_sample_by #18

Closed antagomir closed 3 years ago

antagomir commented 3 years ago

The following returns NULL. The plotting does work without the order_sample_by argument.

library(mia)
library(miaViz)
data(GlobalPatterns)
tse <- GlobalPatterns
tse <- relAbundanceCounts(tse)
p <- plotAbundance(tse,
       abund_values="relabundance",
       rank = "Phylum",
       order_sample_by = "SampleType") 
print(p)
TuomasBorman commented 3 years ago

This does work

p$abundance + scale_y_continuous(label = scales::percent)

The function returns a list when features are added, and a list cannot be scaled --> NULL


#' @return 
#' a \code{\link[ggplot2:ggplot]{ggplot}} object or list of 
#' \code{\link[ggplot2:ggplot]{ggplot}} objects, if `features` are added to 
#' the plot. 
TuomasBorman commented 3 years ago

I checked that too quickly, because it shouldn't return a list

antagomir commented 3 years ago

The ggplot object should behave in the same, expected way for counts and relabundances.

TuomasBorman commented 3 years ago

That code that you provide doesn't actually return NULL. I returns a list of ggplot objects, that contains 1 object.

This does

library(mia)
library(miaViz)
data(GlobalPatterns)
tse <- GlobalPatterns
tse <- relAbundanceCounts(tse)
p <- plotAbundance(tse,
                   abund_values="relabundance",
                   rank = "Phylum",
                   order_sample_by = "SampleType") 

p <- p + scale_y_continuous(label = scales::percent)

print(p)

I think the bug lies on order_sample_by and features, and the function works similarly with relabundance and counts

It is documented there that features is added when order_sample_by is specified. It is also documented that when features is added, a list of ggplot objects are returned. So, the function is working as expected in that manner: a list is returned.

However, what is does not do when order_sample_by is specified is that it does not return "feature data" ggplot object with the primary ggplot; it return only one object in a list.

When features is specified without order_sample_by, it returns primary ggplotobject and feature data ggplot.

Without further analysis, it would work if

if(length(list) == 1){
return(list[1]) }

However, I think it should be checked if those functionalities are working as expected in all sense

antagomir commented 3 years ago

Yes, this is a tricky one.

It is a realistic use case that one is simply willing to use order_sample_by with a single variable, without features list and the question originated from a new user.

I tend to suggest that the default behavior would be to return just the first list element in our example case above, and only return a list if this is explicitly required.

If this turns out to be a problematic choice in practice, we could return to evaluate the pros and cons of both solutions.

Alternatively, we could provide a function argument or examples to circumvent this explicitly, and add clear examples on this use case. This would seem unnecessarily complicated for this expected simplest use case.

What would @FelixErnst say?

FelixErnst commented 3 years ago

I think the combination of both your approaches should work.

I tend to suggest that the default behavior would be to return just the first list element in our example case above, and only return a list if this is explicitly required.

if(length(list) == 1){
return(list[1]) }

The conditional clause above should be as specific as possible and the documentation needs to be clarified accordingly as @tvborm has already described.

FelixErnst commented 3 years ago

Also maybe include an example using patchwork to plot a list of ggplot objects (I think its wrap_plots).

Then we can just say rtfm the next time 😄

antagomir commented 3 years ago

I embrace these thoughts entirely.

TuomasBorman commented 3 years ago

Currently, this returns 1 plot

p <- plotAbundance(tse,
                   abund_values="relabundance")

This returns 1 plot

a <- plotAbundance(tse,
                   abund_values="relabundance",
                   order_sample_by = "shannon")

This returns 2 plots

p <- plotAbundance(tse,
                   abund_values="relabundance",
                   features = "SampleType")

This returns 3 plots

p <- plotAbundance(tse,
                   abund_values="relabundance",
                   features = "SampleType",
                   rank = "Phylum",
                   order_sample_by = "shannon") 

The main plot is always called abundance. In this 3 plot case, plots called abundance, SampleType and shannonare returned.

In the documentation it says

#' @param order_sample_by A single character value from the chosen rank of abundance
#'   data or from \code{colData} to select values to order the abundance
#'   plot by. If the value is not part of \code{features}, it will be added.
#'   (default: \code{order_sample_by = NULL})

Especially: If the value is not part of \code{features}, it will be added

However, should that be the case? Should order_sample_by be used only in ordering? Currently it returns also an extra plot in addition to ordering, but only when also features is specified.

If only abundance and features plots should be returned, it is easy to fix with plot_out[["abundance"]] and plot_out[[features]].

antagomir commented 3 years ago

I do not see a clear need and support the suggested change