mortazavilab / PyWGCNA

PyWGCNA is a Python package designed to perform Weighted Gene Correlation Network Analysis (WGCNA)
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad415/7218311
MIT License

Error during the "plotting module heatmap eigengene..." step of WGCNA.analyseWGCNA() #109

Closed: bwmr closed this issue 4 months ago

bwmr commented 4 months ago

Hi,

Thanks a lot for the nice-to-use package!

I am currently trying it out on some proteomics data, but I consistently run into an issue during the analysis step. WGCNA.findModules() identifies three modules. WGCNA.analyseWGCNA() correctly outputs the module-trait relationship plot with all three modules, as well as the module eigengene plots for the first two (in this case, dimgrey and black), but fails before plotting the third with the message below.

Best, Benedikt

```text
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[33], line 1
----> 1 network_rb.analyseWGCNA(show=False)

File ~/micromamba/envs/data-analysis/lib/python3.10/site-packages/PyWGCNA/wgcna.py:452, in WGCNA.analyseWGCNA(self, order, geneList, show, alternative)
    450     print(f"{OKCYAN}plotting module heatmap eigengene...{ENDC}")
    451     for module in modules:
--> 452         self.plotModuleEigenGene(module, metadata, show=show)
    453     print("\tDone..\n")
    455 if self.save:

File ~/micromamba/envs/data-analysis/lib/python3.10/site-packages/PyWGCNA/wgcna.py:2958, in WGCNA.plotModuleEigenGene(self, moduleName, metadata, show)
   2956 a = pdist(heatmap)
   2957 np.nan_to_num(a, copy=False)
-> 2958 Z = WGCNA.hclust(a, method="average")
   2959 # Clusterize the data
   2960 labels = fcluster(Z, t=0.8, criterion='distance')

File ~/micromamba/envs/data-analysis/lib/python3.10/site-packages/PyWGCNA/wgcna.py:756, in WGCNA.hclust(d, method)
    753 if method == -1:
    754     sys.exit("Ambiguous clustering method.")
--> 756 dendrogram = linkage(d, method=method)
    758 return dendrogram

File ~/micromamba/envs/data-analysis/lib/python3.10/site-packages/scipy/cluster/hierarchy.py:1033, in linkage(y, method, metric, optimal_ordering)
   1029 if not xp.all(xp.isfinite(y)):
   1030     raise ValueError("The condensed distance matrix must contain only "
   1031                      "finite values.")
-> 1033 n = int(distance.num_obs_y(y))
   1034 method_code = _LINKAGE_METHODS[method]
   1036 y = np.asarray(y)

File ~/micromamba/envs/data-analysis/lib/python3.10/site-packages/scipy/spatial/distance.py:2605, in num_obs_y(Y)
   2603 k = Y.shape[0]
   2604 if k == 0:
-> 2605     raise ValueError("The number of observations cannot be determined on "
   2606                      "an empty distance matrix.")
   2607 d = int(np.ceil(np.sqrt(k * 2)))
   2608 if (d * (d - 1) / 2) != k:

ValueError: The number of observations cannot be determined on an empty distance matrix.
```
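The last frames of the traceback point at the mechanism: `pdist` returns a condensed distance matrix with n(n-1)/2 entries for n rows, so a matrix with a single row produces an empty array, and `linkage` then cannot infer the number of observations from it. The sketch below reproduces this with NumPy and SciPy only. It assumes, as the follow-up comment suggests, that each row passed to `pdist` is one module member; the helpers `condensed_length` and `linkage_error_for` and the small arrays are hypothetical illustrations, not PyWGCNA code or data from this issue.

```python
from typing import Optional

import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def condensed_length(n_rows: int) -> int:
    """Number of pairwise distances pdist returns for n_rows observations."""
    return n_rows * (n_rows - 1) // 2


def linkage_error_for(matrix: np.ndarray) -> Optional[str]:
    """Run pdist + average linkage; return the ValueError text, or None on success."""
    distances = pdist(matrix)  # condensed form: one entry per pair of rows
    assert distances.shape[0] == condensed_length(matrix.shape[0])
    try:
        linkage(distances, method="average")
    except ValueError as err:
        return str(err)
    return None


# Hypothetical stand-in data: one row per module member, one column per sample.
one_member = np.array([[0.2, 0.4, 0.1, 0.9]])
two_members = np.array([[0.2, 0.4, 0.1, 0.9], [0.3, 0.1, 0.5, 0.7]])

print(linkage_error_for(one_member))   # linkage's "empty distance matrix" message
print(linkage_error_for(two_members))  # None: a single pair is enough to cluster
```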
bwmr commented 4 months ago

Never mind, I figured out that this issue was caused by one module containing only one protein (?). Reducing minModuleSize during setup led to a more even distribution. Now the process fails at the "plotting module barplot eigengene" step, but I suppose this is unrelated.
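One way to check for this before running the per-module plots is to count module members and flag any module too small to cluster. The sketch below uses only the standard library and assumes you already have one module label (colour) per gene or protein; how to extract those labels from a PyWGCNA object is not shown, and `find_tiny_modules` is a hypothetical helper, not part of PyWGCNA.

```python
from collections import Counter
from typing import Dict, Iterable


def find_tiny_modules(module_labels: Iterable[str], min_members: int = 2) -> Dict[str, int]:
    """Return {module: size} for every module with fewer than min_members members.

    Average-linkage clustering needs at least two observations, so a module
    listed here would hand pdist/linkage an empty condensed distance matrix.
    """
    sizes = Counter(module_labels)
    return {module: n for module, n in sorted(sizes.items()) if n < min_members}


# Hypothetical labels: one module colour per protein. The sizes and the name
# "third_module" are made up; the issue does not give the third module's colour.
labels = ["dimgrey"] * 40 + ["black"] * 25 + ["third_module"]
print(find_tiny_modules(labels))  # {'third_module': 1}
```

An empty result only rules out this particular failure; it says nothing about the later barplot step mentioned above.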