willwerscheid / flashier

A faster and angrier package for EBMF.
https://willwerscheid.github.io/flashier/
Other
10 stars 12 forks source link

Feedback on new plot functions #122

Open pcarbo opened 6 months ago

pcarbo commented 6 months ago

@willwerscheid I'm going to start a thread here on the plot functions.

  1. For the documentation to be consistent across plot functions, I think the first argument to all functions needs to be "x"; currently you use "x" and "fl".

  2. I get these warnings:

flash_plot_scatter: no visible binding for global variable ‘name’
flash_plot_scatter: no visible binding for global variable ‘label’

which I think are caused by this line in flash_plot_heatmap:

mutate(label = ifelse(rank(-abs(val)) > n_labels, "", name), color = ifelse(label != "", "dodgerblue", "gray80"))

I think rlang::.data can be used to fix the warning.

pcarbo commented 6 months ago
kset: A vector of integers specifying the factor/loadings pairs to
      be plotted. If ‘kset = NULL’, then all will be plotted.

Does kset also specify the order in which the k's are plotted?

pcarbo commented 6 months ago

@willwerscheid Regarding this (and other similar parts of your code):

# Bind variables to get rid of annoying R CMD check note:
topic <- prop <- NULL

See this vignette for how to deal with this issue properly for ggplot2, mutate, and other tidyverse functions where this issue arises. Or see help(aes_string).

pcarbo commented 6 months ago

A general observation about the plots: I think the X and/or Y labels sometimes need to be adapted depending on whether pm_which = "factors" or pm_which = "loadings".

pcarbo commented 6 months ago

Wherever geom_text_repel is used, I suggest adding an argument max.overlaps to your plotting function, and setting the default to be max.overlaps = Inf. Sometimes you end up with no labels shown and it can be frustrating to figure out that you need to adjust "max.overlaps".

pcarbo commented 6 months ago

I find it a bit weird that you include this functionality in the "plot" function:

x <- plot(fit,plot_type = "data.frame")

Maybe it this would be more logical with the "as.data.frame" S3 method?

x <- as.data.frame(fit)
pcarbo commented 6 months ago

There is no such thing as a "ggplot2 object"; the outputs are "ggplot" objects:

class(p)
# [1] "gg"     "ggplot"
pcarbo commented 6 months ago

@willwerscheid I've tested these functions:

flash_plot_heatmap
flash_plot_scatter
flash_plot_scree
flash_plot_structure

My comments above come from my testing of these functions.

I have not tested these functions:

flash_plot_bar
flash_plot_histogram

If you can illustrate the use of these last two functions in the analysis of the pbmc_facs data, I'm happy to test out these as well.

pcarbo commented 6 months ago

Also, maybe remove the title from the scree plot since none of the other plots have titles?

pcarbo commented 6 months ago

I just tested out flash_plot_bar. Looks good.

willwerscheid commented 6 months ago

First batch of changes:

  1. Removed title from scree plot
  2. Changed "ggplot2 object" to "ggplot object"
  3. Removed plot_type = "data.frame" option (if a user really wants this they can extract it from the returned ggplot object).
  4. Added "max_overlaps" argument to plots using geom_text_repel. Removed ... since users can set labels = FALSE and add geom_text_repel() themselves if they want the full range of options (... is passed to facet_wrap() in other plot functions when faceting is used and I wanted to keep this consistent). Added examples showing how to add geom_text_repel() manually.
  5. Added fl to the documentation. Yes, this is inconsistent with plot.flash(); it needs to be x to work as an S3 method but I prefer fl for the flash_plot_xxx functions since fl is the name of the first argument in every other flash_xxx function.
pcarbo commented 6 months ago

@willwerscheid Thank you. Let me know when the plotting functions are ready for me to review again.

willwerscheid commented 6 months ago

Second batch:

  1. Fixed up terminology (replaced "loading" with "posterior mean" and "factor" with "component" where more appropriate)
  2. Handled "no visible binding for global variable" correctly (using rlang::.data).
  3. Updated documentation for kset and order_by_pve parameters to make it clear that kset specifies the order in which components are to be plotted (unless order_by_pve = TRUE).
willwerscheid commented 6 months ago

Finally, added example of overlapping histograms plot to the single cell vignette. @pcarbo the changes are ready for your review. Thanks again!

pcarbo commented 6 months ago

@willwerscheid First batch of changes look good.

pcarbo commented 6 months ago

I'm proposing a few adjustments to flash_plot_histogram() in commit aad0cbd which will create a slightly cleaner histogram, e.g.,

plot(fit,
     plot_type = "histogram",
     pm_which = "loadings",
     pm_groups = pbmc_facs$samples$subpop,
     bins = 20)
Screenshot 2024-05-06 at 4 06 13 PM
pcarbo commented 6 months ago

The second batch of updates also sound good to me.

pcarbo commented 6 months ago

I've updated the pkgdown page to reflect these changes.

pcarbo commented 6 months ago

No further comments!