jmw86069 / multienrichjam

Analysis and Visualization of Multiple Gene Set Enrichments

multiEnrichMap undefined columns selected #6

Open tlaus opened 2 years ago

tlaus commented 2 years ago

Hello,

I ran into an issue with multiEnrichMap, where it would sometimes fail with an "undefined columns selected" error. Stepping through in browser mode, I found that the function attempts to remove duplicates while creating the name vector, but it subsets with a character vector that contains NAs. After some digging, I found this is because require_non_na is set to FALSE on line 603 of jamenrich-base.r:

directionColname <- find_colname(directionColname,
     iDF1,
     require_non_na=FALSE);

Is this by design? When I changed the variable to TRUE it fixed my problem. Are there any potential problems I might run into by changing this behavior?
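
For reference, the error message itself can be reproduced in base R, independent of multienrichjam. This is only a minimal sketch with a toy data.frame (`df` and `keep` are made-up names, not objects from the package): indexing columns with a character vector that contains NA raises the same kind of error, and dropping the NA entries first avoids it.

```r
# Toy data.frame; names are illustrative only.
df <- data.frame(Name = c("A", "B"), pvalue = c(0.01, 0.2))

# A column-name vector with an NA entry, as might result when a
# column lookup finds no usable match.
keep <- c("Name", NA)

# Subsetting columns by a vector containing NA fails; capture the message.
msg <- tryCatch(
   {
      df[, keep, drop = FALSE]
      "no error"
   },
   error = function(e) conditionMessage(e))
print(msg)

# Dropping NA entries first avoids the error.
df_ok <- df[, keep[!is.na(keep)], drop = FALSE]
print(colnames(df_ok))
```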

jmw86069 commented 2 years ago

I just saw this issue and apologize for the delay! I’m really not getting notifications effectively somehow.

In principle this function looks for colnames matching a text pattern, and the flag you referenced influences it to choose columns that do not have NA values. I agree I’m not sure why the default would allow NA values except that I may not have had data to reveal the problem before now. In short, your workaround seems reasonable.

The direction colname is a newer feature, intended to allow optional z-score values to be used to indicate directionality. If present there may be numeric and NA values mixed in the column. However for data that does not have a z-score column, the intended outcome is for it not to find a matching colname.

If convenient can you post the first few lines of enrichment data you’re using from one of the enrichments? It might serve as a useful test case.

I will review the code and make an update to address the issue in future.

And thank you very much for reporting the issue!

SimoniMD commented 2 years ago

Hi! so how exactly was this problem fixed? I'm getting the same error and don't understand how to change it in browser mode. Please help!

jmw86069 commented 2 years ago

Hi! I was wondering if the original issue was resolved also.

My hunch is that the error occurs with data that has a “zscore” column, but where the z-scores themselves are all NA values. I tried to test but might not have found the right conditions.

If the error is with zscore columns, a potential workaround is to set directionColname to something that will not match your colnames, for example in multiEnrichMap():

directionColname="blahblahblah"

Then the workflow should ignore z-scores altogether. If it doesn’t fix the issue, the error is with another column type. Let me know if this helps.
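
To check whether the hunch above applies to your data, something like this sketch might help. It assumes the enrichments are held as a named list of data.frame objects; `find_all_na_zscore()`, its `pattern` default, and the toy `enrich_list` are hypothetical names for illustration, not part of multienrichjam.

```r
# Hypothetical helper: for each data.frame, report columns whose name
# matches the pattern and whose values are all NA.
find_all_na_zscore <- function(df_list, pattern = "z.?score") {
   lapply(df_list, function(idf) {
      zcols <- grep(pattern, colnames(idf), ignore.case = TRUE, value = TRUE)
      all_na <- vapply(zcols, function(zcol) all(is.na(idf[[zcol]])), logical(1))
      zcols[all_na]
   })
}

# Toy data for illustration only.
enrich_list <- list(
   A = data.frame(Name = c("p1", "p2"), zScore = c(NA, NA)),
   B = data.frame(Name = c("p1", "p3"), zScore = c(1.5, NA)),
   C = data.frame(Name = "p2"))
all_na_cols <- find_all_na_zscore(enrich_list)
print(all_na_cols)
```

Any enrichment that lists a column here has a z-score column with no usable values, which is the condition suspected above.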

SimoniMD commented 2 years ago

I was able to move onto the next step, but hitting another error:

mem_canonical <- multiEnrichMap(er_canonical,
+                                 enrichBaseline=1,
+                                 cutoffRowMinP=0.05,
+                                 colorV=c("purple", "orange"),
+                                 topEnrichN=20, 
+                                 directionColname = "blah")
Warning messages:
1: package ‘igraph’ was built under R version 4.1.2 
2: package ‘IRanges’ was built under R version 4.1.1 
3: package ‘BiocGenerics’ was built under R version 4.1.1 
4: package ‘S4Vectors’ was built under R version 4.1.3 
5: package ‘arules’ was built under R version 4.1.2 
6: In all(topEnrichN) : coercing argument of type 'double' to logical
7: In asMethod(object) : removing duplicated items in transactions
8: In asMethod(object) : removing duplicated items in transactions
9: package ‘enrichplot’ was built under R version 4.1.2 
mem_canonical_plots <- mem_plot_folio(mem_canonical,
+                                       pathway_column_split=4,
+                                       node_factor=5,
+                                       use_shadowText=TRUE,
+                                       label_factor=1.5,
+                                       main="Canonical Pathways");
##  (19:53:33) 31Mar2022:  mem_plot_folio(): Enrichment P-value heatmap 
##  (19:53:33) 31Mar2022:  mem_plot_folio(): plot_num 1: Enrichment P-value Heatmap 
##  (19:53:36) 31Mar2022:  mem_plot_folio(): Gene-pathway heatmap 
Error in validObject(.Object) : 
  invalid class “itemMatrix” object: item labels not unique
In addition: Warning messages:
1: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
  conversion failure on 'Role of JAK1 and JAK3 in γc Cytokine Signaling' in 'mbcsToSbcs': dot substituted for <ce>
2: In grid.Call(C_textBounds, as.graphicsAnnot(x$label), x$x, x$y,  :
  conversion failure on 'Role of JAK1 and JAK3 in γc Cytokine Signaling' in 'mbcsToSbcs': dot substituted for <b3>

Any idea? Happy to share my text files if it helps.

jmw86069 commented 2 years ago

That’s vaguely familiar, as if maybe the enrichment is including the same gene multiple times for the same pathway? Or the same pathway name might be in use multiple times, like from a few pathway databases.

If you don’t mind sharing the input text files I will take a look to see the issue. Hopefully for a fix or workaround.

Thanks for your patience!
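
In the meantime, a quick way to check both possibilities might look like the sketch below. The column names `Name` and `geneID`, and the "/" gene delimiter, are assumptions for illustration; adjust them to match your enrichment tables.

```r
# Toy enrichment table; column names and the "/" delimiter are assumptions.
idf <- data.frame(
   Name = c("Pathway A", "Pathway B", "Pathway A"),
   geneID = c("TP53/EGFR", "JAK1/JAK1/STAT3", "MYC"),
   stringsAsFactors = FALSE)

# Pathway names used more than once.
dupe_names <- unique(idf$Name[duplicated(idf$Name)])

# Rows where the same gene is listed more than once in one pathway.
gene_list <- strsplit(idf$geneID, "/", fixed = TRUE)
has_dupe_genes <- vapply(gene_list,
   function(g) anyDuplicated(g) > 0,
   logical(1))

print(dupe_names)
print(idf$Name[has_dupe_genes])
```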

SimoniMD commented 2 years ago

Do you have an email? they are data i'd like to keep off the internet for now.

interestingly, i can run that next command if i only put two of my lists, and not all 9 that i'm trying to do

jmw86069 commented 2 years ago

The same account name jmw86069 at Gmail will work.

It sounds like one or a few are the culprits.

jmw86069 commented 2 years ago

Hey thanks so much for sending the test data! It was a much simpler issue haha, ah well. Good catch, I need to fix it in future.

The argument colorV is supposed to assign one color for each enrichment provided to multiEnrichMap(), however there were only two colors, and nine enrichments, causing it to fail downstream. Apparently some downstream steps assume each color is unique - I'll put in a fix for that to make sure each color in colorV is unique.

The simple workaround is to remove colorV from the multiEnrichMap() call, then it will generate its own colors using colorjam::rainbowJam(9). You can also supply your own colors, just make sure there are 9 unique colors (haha), for example if these are from scRNAseq tSNE/UMAP clusters, you can match the categorical colors used in those figures.
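
Until that fix is in, a small pre-check along these lines could catch the problem early. This is only a sketch: `check_colors()` and `enrich_names` are hypothetical names, and `grDevices::rainbow()` stands in for whatever palette you actually use.

```r
# Hypothetical pre-check before passing colors to multiEnrichMap():
# one unique color per enrichment.
check_colors <- function(colorV, enrich_names) {
   if (length(colorV) != length(enrich_names)) {
      stop("Need ", length(enrich_names), " colors, got ", length(colorV), ".")
   }
   if (anyDuplicated(colorV) > 0) {
      stop("Colors must be unique.")
   }
   colorV
}

enrich_names <- paste0("cluster", 1:9)
# grDevices::rainbow() stands in here for any palette of 9 distinct colors.
colorV <- check_colors(grDevices::rainbow(9), enrich_names)
print(colorV)

# Two colors for nine enrichments is rejected.
two_color_msg <- tryCatch(
   check_colors(c("purple", "orange"), enrich_names),
   error = function(e) conditionMessage(e))
print(two_color_msg)
```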

One final thing: I suggest adding an argument to mem_plot_folio() which will be passed to ComplexHeatmap: use_raster=FALSE. It prevents artifacts during rasterization, which doesn't work well with categorical colors. I'm going to make this argument mandatory, but probably not until tomorrow.

For your data, I was playing around with enrich_im_weight and gene_im_weight like this:

mem_plot_folio(mem_canonical,
   do_which=2,
   use_raster=FALSE,
   enrich_im_weight=0.1,
   gene_im_weight=0.1,
   ...)

It adjusts the clustering of central gene/pathway incidence, versus outer enrichment incidence matrix (top) or gene incidence matrix (left). Lower values use the central data more.

Let me know if you still have issues, I'm happy to help! Ultimately the figures have been useful for our work, sometimes takes some adjustment of labels, pathway names, etc. Hopefully there's something useful for your work too.

SimoniMD commented 2 years ago

Just want to say thanks so much for the help. It worked out. Was able to use the same palette I used for the UMAPs. We’re trying to figure out how these clusters differ in their response in the stimulated condition, and your little edits help bring out the differences too.

One small question: is there any way to output the entire mem_plot_folio automatically into a bunch of pngs? I have many clusters, so it’s just too hard to see anything. I’ve tried the png() thing before the command, but it always outputs only 1 file and not all of them.

-Simoni


jmw86069 commented 2 years ago

Ah that's a good question. I'll show you how I do it, then offer a workaround.

cairo_pdf("outfile.pdf",
   onefile=TRUE,
   height=12,
   width=12,
   pointsize=10)
mem_plot_folio(...)
dev.off()

That output will create one multi-page PDF file, where plots and text are drawn in vector format and can be zoomed without loss of detail. Helpful for heatmaps with a zillion rows - adjust font size super small and you can still zoom in (eventually) and read the text. Not ideal for all cases, but sometimes for discovery it's useful. (One day I'll figure out InteractiveComplexHeatmap!)

Last point on PDFs, you probably know: My colleagues like to edit the PDFs, move the text labels around, adjust font sizes, etc. move color keys, etc. Can be convenient if you're into that. Sometimes helpful to set use_shadowText=FALSE if you're moving labels later.

There are two workarounds to produce multiple png files. The first is simple: loop through each page with the argument do_which, then write each page to a separate png. Something like this:

for (i in 1:9) {
   pngfile <- paste0("outfile_", i, ".png");
   png(pngfile,
      width=1200,
      height=1200,
      pointsize=12);
   mem_plot_folio(...,
      do_which=i);
   dev.off();
}

The second I have never used, but it seems like the more correct R usage. Haha. Oh well. See ?png, in the examples for jpeg().

png(file="myplot_%d.png")
mem_plot_folio(...);
dev.off();

It will create myplot_1.png, then myplot_2.png, etc. Probably easiest path for you.
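
Here is a self-contained sketch of the same `%d` pattern that can be run without multienrichjam. The three base R plots are stand-ins for the pages drawn by mem_plot_folio(), and `outdir` is just a temporary folder chosen for illustration; it assumes the png device is available in your R build.

```r
# Write each new page to its own numbered file in a temporary folder.
outdir <- tempfile("folio_")
dir.create(outdir)
png(file.path(outdir, "myplot_%d.png"),
   width=1200,
   height=1200,
   pointsize=12)
# Three base R plots stand in for the pages drawn by mem_plot_folio().
for (i in 1:3) {
   plot(seq_len(10), (seq_len(10))^i, main=paste("Page", i))
}
dev.off()

# One file per page.
pngfiles <- list.files(outdir, pattern="[.]png$")
print(pngfiles)
```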

For final figure work, I usually capture output to an object, manipulate the details, then plot only that figure. For example we tend to use Cnet plots a lot, so I capture mpf <- mem_plot_folio() to an object, grab the cnet igraph object, make adjustments as needed, then plot the igraph with something like jam_igraph(). There are a bunch of helper functions like adjust_cnet_nodeset() intended to move nodes around. It's a whole thing. lol

SimoniMD commented 2 years ago

Back to the directionColname question. My data does have NAs in it. I was getting the error as you saw above. The workaround worked, but if I did want to add directionality, what would that look like?

mem_canonical <- multiEnrichMap(er_canonical, enrichBaseline=1, cutoffRowMinP=0.05, colorV=paletteer_d("tidyquant::tq_light"), topEnrichN=20, directionColname = "zScore")

brings up the error described above, and i'm not sure how to edit that file described above (doesn't come up when i search for it)

jmw86069 commented 2 years ago

Ah, the issue is that only some of the er_canonical results have the zScore column, the others do not. I will make the process fill missing columns with NA in an update soon.

You can check colnames with:

lapply(enrichList_canonical, colnames)
# or
lapply(er_canonical, function(i){colnames(i@result)})

The easiest workaround is to add this column to any enrichment data that does not already contain it.

If you're following the vignette, I would probably do that with enrichList_canonical because it is a list of data.frame objects, before being converted into enrichResult as er_canonical.

enrichList_canonical2 <- lapply(enrichList_canonical, function(idf){
   # add an all-NA numeric zScore column only when it is missing
   if (!"zScore" %in% colnames(idf)) {
      idf$zScore <- rep(NA_real_, nrow(idf));
   }
   idf;
})

Hopefully this little R snippet will let you continue by using the IPA z-score values that do exist. I have not done extensive testing when zScores have a large number of NA values - these should be indicated similar to having zScore=0, which more or less means no directionality was reported.

The z-score itself is limited in that it assumes a gene set or pathway was defined with genes only in one direction... i.e. only the genes that promote/induce some activity or state - no genes that also inhibit/repress the same state. From my experience in IPA, most pathways are encoded with this philosophy - but I can't guarantee it, nor that their encoding is correct in all cases.

SimoniMD commented 2 years ago

I put this code in, but no dice. The workflow works at least, but there is no change in the output of mem_plot_folio(). I can't tell what changed with the directionality. What am I looking for exactly?

jmw86069 commented 2 years ago

I went back to your test data, found two issues that I corrected in multienrichjam version 0.0.57.900. Update with:

remotes::install_github("jmw86069/multienrichjam", dependencies=FALSE)

TL;DR - Update the multienrichjam package and try again.

First issue: Some of your enrichment data has no z-score, and some of the enrichment data that has a z-score also has all NA values in that column, it was sort of breaking my mental assumptions. :) Now the function only ignores directionColname if all enrichments have all NA values - in which case there is no information to be used anyway. Otherwise it handles it (from my testing at least.)

Second issue: The function mem_plot_folio() was not applying apply_direction=TRUE to the function mem_enrichment_heatmap() so directionality was not being displayed on the heatmap. The workaround was to add apply_direction=TRUE to mem_plot_folio(). The new version will auto-detect whether there is directional data (any non-NA, non-zero directional value), but it can be overridden with apply_direction=FALSE.

You can optionally apply a direction_cutoff, which sets a minimum absolute z-score required for direction to be indicated. Any z-score whose absolute value is below the threshold will be shown as z-score=0. The color gradient is pretty effective though.
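
To make the two rules above concrete, here is a rough sketch in base R. The helper names `has_direction()` and `apply_direction_cutoff()` are hypothetical and only illustrate the described behavior; they are not the multienrichjam internals.

```r
# Hypothetical helpers illustrating the behavior described above;
# these are not the multienrichjam internals.

# Directional data exists if any value is non-NA and non-zero.
has_direction <- function(z) {
   any(!is.na(z) & z != 0)
}

# Values with absolute z-score below the cutoff are shown as 0;
# NA values are left as NA.
apply_direction_cutoff <- function(z, direction_cutoff = 0) {
   below <- !is.na(z) & abs(z) < direction_cutoff
   z[below] <- 0
   z
}

z <- c(2.4, -0.5, NA, 0, -1.8)
print(has_direction(z))
print(apply_direction_cutoff(z, direction_cutoff = 1))
print(has_direction(c(NA, 0, NA)))
```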

About your data, I see some directionality, but not for many pathways -- this is typical for me as well. IPA does not encode direction into every pathway, nor is direction relevant to some pathways, as you probably already know from experience. Where there is directionality, there is usually solid supporting data; absence of directionality is non-conclusive, it still can be statistically enriched.

Note: I have not yet propagated directionality into all downstream plots, though it isn't always necessary. Color can be used to describe: (a) enrichment result, or (b) directionality. I usually color by enrichment -- but there are other approaches to include multiple factors in visualizations. For example, using shape for direction? Maybe triangle-up, triangle-down, oval for no-directionality.

I also reported an issue with Dr. Gu in ComplexHeatmap for divergent color scales - which should address the color legend, specifically adding x-axis labels to the legend. Currently z-score range is -2 to +2.

I appreciate you reporting the issues! These are edge cases I had not seen.