smorabit / hdWGCNA

High dimensional weighted gene co-expression network analysis
https://smorabit.github.io/hdWGCNA/
Other
315 stars 31 forks source link

TOM would not update #257

Closed DelongZHOU closed 2 weeks ago

DelongZHOU commented 3 weeks ago

Hello Sam,

As part of my differential connectivity experiment I'm shuffling cell condition label and remaking the TOM for the FDR. Despite I'm setting random seed for each iteration, shuffling cell condition, removing saved TOM files and clearing memory, more often than not the TOM files are not updated. Here's the part of my code that got involved:

#function to create TOM file
create_TOM_cond<-function(cond) {
    #subset the master Seurat object by condition
    seurat_obj_cond <-  subset(combined,subset=condition==cond)
    print(seurat_obj_cond) #check cell count
    seurat_obj_cond <- SetupForWGCNA(
      seurat_obj_cond,
      gene_select = "custom", # the gene selection approach to use all genes in the base tom file
      features = base_genes, # fraction of cells that a gene needs to be expressed in order to be included
      wgcna_name = cond # the name of the hdWGCNA experiment
    )

    # construct metacells  in each group
    seurat_obj_cond <- MetacellsByGroups(
      seurat_obj = seurat_obj_cond,
      group.by = c("subtype", "condition"), # specify the columns in seurat_obj@meta.data to group by
      reduction = 'pca', # select the dimensionality reduction to perform KNN on
      k = 25, # nearest-neighbors parameter
      max_shared = 10, # maximum number of shared cells between two metacells
      ident.group = 'subtype' # set the Idents of the metacell seurat object
    )
    # normalize metacell expression matrix:
    seurat_obj_cond <- NormalizeMetacells(seurat_obj_cond)
    # Set up the expression matrix
    seurat_obj_cond <- SetDatExpr(
      seurat_obj_cond,
      group_name = celltype, # the name of the group of interest in the group.by column
      group.by='subtype', # the metadata column containing the cell type info. This same column should have also been used in MetacellsByGroups
      assay = 'RNA', # using RNA assay
      slot = 'data' # using normalized data
    )
    # Test different soft powers:
    #disable the multithread for this step.
    #Occasionally multithread will cause the script to stuck in the block for unknown reasons.

    seurat_obj_cond <- TestSoftPowers(
      seurat_obj_cond,
      networkType = 'signed' # you can also use "unsigned" or "signed hybrid"
    )

    # construct co-expression network:
    seurat_obj_cond <- ConstructNetwork(
      seurat_obj_cond, 
      setDatExpr=FALSE,
      tom_name = paste0(celltype,'.',cond), # name of the topoligical overlap matrix written to disk
      overwrite_tom = TRUE
    )
    TOM <- GetTOM(seurat_obj_cond)
    return (TOM)
}
#save condition column to true.condition
combined[['true.condition']]<-combined[['condition']]
#extract condition labels
cdt=combined$condition
#remove name to break link with cell tag
names(cdt)<-NULL

#function to shuffle cell condition
produce_shuffle_cell_scores<-function(i) {
    #remove previous TOM files
    unlink("TOM", recursive = TRUE)
    set.seed(i)
    shuffle_cdt<-sample(cdt) #shuffle condition label
    print(shuffle_cdt[1:10]) #make sure condition label are shuffled
    combined[['condition']]<-shuffle_cdt #replace condition label by shuffled label
    #double check shuffled condition distribution
    print(table(combined[[c('condition','true.condition')]])) #each seed produce different count table
    TOMs_shuffle<-map(conditions,create_TOM_cond)
    for (TOM_shuffle in TOMs_shuffle) { print(TOM_shuffle[1:5,1:5]) } #often this produces the same TOM matrix so same the module scores later
    names(TOMs_shuffle)<-conditions
    module_scores<-map(TOMs_shuffle, function(TOM_shuffle) (map_vec(modules,function(module) extract_module_score(module,TOM_shuffle))))
    #reformat module_scores
    names(module_scores)<-conditions
    module_scores<-as.data.frame(module_scores)
    module_scores[['module']]<-rownames(module_scores)
    module_scores[['loop_index']]<-i
    #clear memory
    rm(TOMs_shuffle)
    rm(TOM_shuffle)
    return (module_scores)
}
#check the module score
produce_shuffle_cell_scores(1)
produce_shuffle_cell_scores(2)

Sometimes both call of shuffled cells return the same as unshuffled table, sometimes the seed 1 is successful but seed 2 returns the same as seed 1 despite the table of condition/true condition indicates the cell distribution has changed.

I know that Seurat's subset function reset the seed, but I set the seed and immediately shuffle labels before calling the subset and confirm the shuffling by the condition/true condition table.

It's also inconsistant when the TOM gets updated. Sometimes I manage to get it update manually running the create_TOM_cond function (other times that fails too, I don't understand why), but not in the map function; sometimes the map works but won't work inside the score function, Some times the score function works when ran interactively, but not as part of Rscript <>.r

I'm totally at loss what might be the cause. Would you have an idea? Thanks!

Best, Delong

smorabit commented 2 weeks ago

Hi,

Just wondering, is this issue related to #182 ? Are you able to provide a simple example where. you can reproduce this behavior?

DelongZHOU commented 2 weeks ago

Hi Sam, I'm not sure whether it's related to #182 . For what it's worth, deleting the Tom file using unlink("TOM", recursive = TRUE) doesn't fix the problem.

I'll check with my PI about sharing the data. Do you have a email that I can send to?

Thanks! Delong

DelongZHOU commented 2 weeks ago

I played with my codes some more and realized that in my shuffle cell function only the "local" copy of the seurat object is changed while the create Tom function is using the "global" copy. I changed a few lines and now the correct object is used and producing different scores for different seed. Thank you!