smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/
Other
316 stars 31 forks source link

Error with ModuleUMAPPlot function in hdWGCNA #186

Closed venkan closed 2 months ago

venkan commented 5 months ago

I'm using the seurat_obj

An object of class Seurat 
87166 features across 109803 samples within 2 assays 
Active assay: MAGIC_RNA (43583 features, 0 variable features)
 1 other assay present: RNA
 3 dimensional reductions calculated: pca, umap, harmony

There are a totally of 41 modules, identified with this. I ran RunModuleUMAP function and then used ModuleUMAPPlot function to show 2 hub gene names from each module.

ModuleUMAPPlot(seurat_obj,
  edge.alpha=0.25,
  sample_edges=TRUE,
  edge_prop=0.1, # proportion of edges to sample (20% here)
  label_hubs=2 ,# how many hub genes to plot per module?
  keep_grey_edges=FALSE)

It shows the following message:

image

And when I checked:

# get the hub gene UMAP table from the seurat object
umap_df <- GetModuleUMAP(seurat_obj)
head(umap_df)

This showed 4450 genes containing missing values (NA) in the kME column of umap_df, but still they are assigned to some Modules.

smorabit commented 5 months ago

Hi, I have never come across this error before. Are you able to reproduce it on the tutorial dataset? I am also wondering if you have any NA values in your modules table:

any(is.na(GetModules(seurat_obj)))
venkan commented 5 months ago

Hi, thank you for your reply. Yes, there are 4450 genes containing missing values (NA) in the kME column of umap_df, but still they are assigned to some Modules. Do you think I should remove those genes with NA in the kME column?

smorabit commented 5 months ago

I understand that you have NA values in umap_df, but above I was asking about a different table that you can access using GetModules.

venkan commented 5 months ago

okay. yes, it shows TRUE.

image

I checked the output of GetModules(seurat_obj), and there are 157 genes showing NA in all columns of kME for each module. Showing below a small piece of that.

image
  1. Why is there NA in the kME of the modules, does it mean they don't show any module membership?
  2. Do I need to remove those genes from the seurat_obj to run ModuleUMAPPlot? If so, how to do that?
smorabit commented 5 months ago

I am not sure why there would be any NA values in these tables, I have never come across this before. Could you show your code for the previous analysis steps? A temporary fix would be to remove the genes with NA values but it would be nice to know where these originate from.

To remove the NA genes:

# get the modules dataframe
modules <- GetModules(seurat_obj)

# remove NAs
modules <- na.omit(modules)

# update the modules dataframe
seurat_obj <- SetModules(seurat_obj, modules)

You should be able to re-run RunModuleUMAP after this.

venkan commented 5 months ago

I ran RunModuleUMAP already, and was able to create a network with ggplot, even though there are NA

You can find the attachment here, I mentioned all the steps in it. I used custom genes as input (19996 genes).

hdWGCNA_steps.txt

smorabit commented 5 months ago

Can you just paste the code directly into this conversation so it is easy to refer back to? Thanks.

venkan commented 5 months ago

Okay here it is:

seurat_obj <- SetupForWGCNA(scRNA,
  gene_select = "custom", # the gene selection approach
  gene_list = features,  ## custom genes were used total 19996 genes used        
  wgcna_name = "hdWGCNA") # the name of the hdWGCNA experiment

seurat_obj <- MetacellsByGroups(seurat_obj = seurat_obj,
  group.by = c("seurat_clusters"), # specify the columns in seurat_obj@meta.data to group by
  assay = 'MAGIC_RNA',
  slot = 'data',
  reduction = 'pca', # select the dimensionality reduction to perform KNN on
  k = 25, # nearest-neighbors parameter
  max_shared = 10, # maximum number of shared cells between two metacells
  ident.group = 'seurat_clusters') # set the Idents of the metacell seurat object

# normalize metacell expression matrix:
seurat_obj <- NormalizeMetacells(seurat_obj, verbose=FALSE)

seurat_obj <- SetDatExpr(seurat_obj,
  group_name = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16"), # the name of the group of interest in the group.by column
  group.by='seurat_clusters', # the metadata column containing the cell type info. This same column should have also been used in MetacellsByGroups
  assay = 'MAGIC_RNA', # using RNA assay
  slot = 'data') # using normalized data

# Test different soft powers:
seurat_obj <- TestSoftPowers(seurat_obj,
  networkType = 'unsigned') # you can also use "unsigned" or "signed hybrid"
# plot the results:
plot_list <- PlotSoftPowers(seurat_obj)
# assemble with patchwork
wrap_plots(plot_list, ncol=2)

## Construct co-expression network
seurat_obj <- ConstructNetwork(seurat_obj, soft_power=4,
  setDatExpr=FALSE,
  networkType = 'unsigned',
  TOMType = "unsigned",
  tom_name = 'TOM') # name of the topoligical overlap matrix written to disk

# need to run ScaleData first or else harmony throws an error:
seurat_obj <- ScaleData(seurat_obj, features=VariableFeatures(seurat_obj))

# compute all MEs in the full single-cell dataset
seurat_obj <- ModuleEigengenes(seurat_obj,
 group.by.vars="seurat_clusters")

# compute eigengene-based connectivity (kME):
seurat_obj <- ModuleConnectivity(seurat_obj,
  group.by = 'seurat_clusters', group_name = '2',sparse=FALSE)

# get the module assignment table:
modules <- GetModules(seurat_obj)

# show the first 6 columns:
head(modules[,1:6])
dim(modules)
#write.csv(modules, "Modules.csv", row.names=FALSE)

seurat_obj <- RunModuleUMAP(
  seurat_obj,
  n_hubs = 10, # number of hub genes to include for the UMAP embedding
  n_neighbors=15, # neighbors parameter for UMAP
  min_dist=0.1) # min distance between points in UMAP space
smorabit commented 5 months ago

I am not really sure what could be causing these NA values to show up just by looking at your code. I would be able to help better if I could reproduce this behavior on my own machine but I am not able to do so.

However, I can see a couple of other potential issues with your analysis based on this code.

First, I can see that in MetacellsByGroups you are using an assay called "MAGIC_RNA". I am just wondering if you ran MAGIC to impute gene expression values before hdWGCNA? I personally haven't tried this and I do not think that it is necessary before running hdWGCNA so you should consider switching to the "RNA" assay and the "counts" slot.

Second and more importantly, based on how you are running ModuleEigengenes, you are telling it to remove "batch effects" on the basis of the cell cluster. You should set this to a more appropriate variable (sequencing batch, replicate, dataset of origin, etc) or you can just remove this argument entirely.

venkan commented 5 months ago

Yes, I used MAGIC before hdWGCNA. There are a few genes I'm interested in which have low expression. All my other analysis was done with MAGIC-imputed expression values. So, I worked with the same for hdWGCNA.

For ModuleEigengenes, okay I understand what you mean. I removed the argument now.

Also for ModuleUMAPPlot, as you said I removedNA genes and re-run the ModuleUMAPPlot, but still have an error:

image

Here is the ERROR:

Error in if (col1 == col2) {: the condition has length > 1
Traceback:

1. ModuleUMAPPlot(seurat_obj, edge.alpha = 0.25, sample_edges = TRUE, 
 .     edge_prop = 0.1, label_hubs = 2, keep_grey_edges = FALSE)
2. future.apply::future_sapply(1:nrow(edge_df), function(i) {
 .     gene1 = as.character(edge_df[i, "Var1"])
 .     gene2 = as.character(edge_df[i, "Var2"])
 .     col1 <- selected_modules[selected_modules$gene_name == gene1, 
 .         "color"]
 .     col2 <- selected_modules[selected_modules$gene_name == gene2, 
 .         "color"]
 .     if (col1 == col2) {
 .         col = col1
 .     }
 .     else {
 .         col = "grey90"
 .     }
 .     col
 . })
3. future_lapply(X = X, FUN = FUN, ..., future.envir = future.envir, 
 .     future.label = future.label)
4. future_xapply(FUN = FUN, nX = nX, chunk_args = X, args = list(...), 
 .     get_chunk = `chunkWith[[`, expr = expr, envir = envir, future.envir = future.envir, 
 .     future.globals = future.globals, future.packages = future.packages, 
 .     future.scheduling = future.scheduling, future.chunk.size = future.chunk.size, 
 .     future.stdout = future.stdout, future.conditions = future.conditions, 
 .     future.seed = future.seed, future.label = future.label, fcn_name = fcn_name, 
 .     args_name = args_name, debug = debug)
5. withCallingHandlers({
 .     values <- local({
 .         oopts <- options(future.rng.onMisuse.keepFuture = FALSE)
 .         on.exit(options(oopts))
 .         value(fs)
 .     })
 . }, RngFutureCondition = function(cond) {
 .     idx <- NULL
 .     uuid <- attr(cond, "uuid")
 .     if (!is.null(uuid)) {
 .         for (kk in seq_along(fs)) {
 .             if (identical(fs[[kk]]$uuid, uuid)) 
 .                 idx <- kk
 .         }
 .     }
 .     else {
 .         f <- attr(cond, "future")
 .         if (is.null(f)) 
 .             return()
 .         if (!isFALSE(f$seed)) 
 .             return()
 .         for (kk in seq_along(fs)) {
 .             if (identical(fs[[kk]], f)) 
 .                 idx <- kk
 .         }
 .     }
 .     if (is.null(idx)) 
 .         return()
 .     f <- fs[[idx]]
 .     label <- f$label
 .     if (is.null(label)) 
 .         label <- "<none>"
 .     message <- sprintf("UNRELIABLE VALUE: One of the %s iterations (%s) unexpectedly generated random numbers without declaring so. There is a risk that those random numbers are not statistically sound and the overall results might be invalid. To fix this, specify 'future.seed=TRUE'. This ensures that proper, parallel-safe random numbers are produced via the L'Ecuyer-CMRG method. To disable this check, use 'future.seed = NULL', or set option 'future.rng.onMisuse' to \"ignore\".", 
 .         sQuote(.packageName), sQuote(label))
 .     cond$message <- message
 .     if (inherits(cond, "warning")) {
 .         warning(cond)
 .         invokeRestart("muffleWarning")
 .     }
 .     else if (inherits(cond, "error")) {
 .         stop(cond)
 .     }
 . })
6. (function() {
 .     oopts <- options(future.rng.onMisuse.keepFuture = FALSE)
 .     on.exit(options(oopts))
 .     value(fs)
 . })()
7. value(fs)
8. value.list(fs)
9. resolve(y, result = TRUE, stdout = stdout, signal = signal, force = TRUE)
10. resolve.list(y, result = TRUE, stdout = stdout, signal = signal, 
  .     force = TRUE)
11. signalConditionsASAP(obj, resignal = FALSE, pos = ii)
12. signalConditions(obj, exclude = getOption("future.relay.immediate", 
  .     "immediateCondition"), resignal = resignal, ...)
smorabit commented 5 months ago

Are you able to reproduce either of these errors with the tutorial dataset? It is hard for me to resolve the problem if I cannot reproduce it.

smorabit commented 2 months ago

Closing this issue due to inactivity.