mortazavilab / PyWGCNA

PyWGCNA is a Python package designed to do Weighted Gene Correlation Network analysis (WGCNA)
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad415/7218311
MIT License
209 stars, 48 forks

Questions about one error in the analysis process #19

Closed HelloWorldLTY closed 1 year ago

HelloWorldLTY commented 1 year ago

Hi, when I run this code:

pyWGCNA_5xFAD.findModules()

I get this error:

   2553 k = Y.shape[0]
   2554 if k == 0:
-> 2555     raise ValueError("The number of observations cannot be determined on "
   2556                      "an empty distance matrix.")
   2557 d = int(np.ceil(np.sqrt(k * 2)))
   2558 if (d * (d - 1) / 2) != k:

ValueError: The number of observations cannot be determined on an empty distance matrix.

I can use IPA to find my module, so I wonder if there are any problems and how to solve them. Thanks a lot.

nargesr commented 1 year ago

Hi, can you send me the full error you got?

HelloWorldLTY commented 1 year ago

Hi, sure.

Cell In [20], line 1
----> 1 pyWGCNA_5xFAD.findModules()

File /gpfs/ysm/project/zhao/tl688/conda_envs/pyg/lib/python3.8/site-packages/PyWGCNA/wgcna.py:298, in WGCNA.findModules(self)
    296 # Cluster module eigengenes
    297 a = squareform(MEDiss, checks=False)
--> 298 METree = WGCNA.hclust(a, method="average")
    300 plt.figure(figsize=(max(20, round(MEDiss.shape[1] / 20)), 10), facecolor='white')
    301 dendrogram(METree, color_threshold=self.MEDissThres, labels=MEDiss.columns, leaf_rotation=90,
    302            leaf_font_size=8)

File /gpfs/ysm/project/zhao/tl688/conda_envs/pyg/lib/python3.8/site-packages/PyWGCNA/wgcna.py:715, in WGCNA.hclust(d, method)
    712 if method == -1:
    713     sys.exit("Ambiguous clustering method.")
--> 715 dendrogram = linkage(d, method=method)
    717 return dendrogram

File /gpfs/ysm/project/zhao/tl688/conda_envs/pyg/lib/python3.8/site-packages/scipy/cluster/hierarchy.py:1068, in linkage(y, method, metric, optimal_ordering)
   1064 if not np.all(np.isfinite(y)):
   1065     raise ValueError("The condensed distance matrix must contain only "
   1066                      "finite values.")
-> 1068 n = int(distance.num_obs_y(y))
   1069 method_code = _LINKAGE_METHODS[method]
   1071 if method == 'single':

File /gpfs/ysm/project/zhao/tl688/conda_envs/pyg/lib/python3.8/site-packages/scipy/spatial/distance.py:2555, in num_obs_y(Y)
   2553 k = Y.shape[0]
   2554 if k == 0:
-> 2555     raise ValueError("The number of observations cannot be determined on "
   2556                      "an empty distance matrix.")
   2557 d = int(np.ceil(np.sqrt(k * 2)))
   2558 if (d * (d - 1) / 2) != k:

ValueError: The number of observations cannot be determined on an empty distance matrix.
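The traceback shows that the condensed vector `a` handed to `linkage` was empty. As a minimal sketch of how SciPy reaches this message, the snippet below assumes a hypothetical 1x1 dissimilarity matrix (as would happen if only one module eigengene were available). It only illustrates the SciPy behaviour and does not confirm what happened inside PyWGCNA in this issue.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

# Hypothetical input: a dissimilarity matrix with a single row and column,
# i.e. there is nothing to compare it against.
me_diss = np.zeros((1, 1))

# With fewer than two observations, squareform returns an empty condensed vector.
condensed = squareform(me_diss, checks=False)

error_message = None
try:
    # Hierarchical clustering needs at least one pairwise distance.
    linkage(condensed, method="average")
except ValueError as exc:
    error_message = str(exc)

print("condensed shape:", condensed.shape)
print("error:", error_message)
```

Checking the number of columns in the eigengene table before clustering would surface the problem earlier than the SciPy call does.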

nargesr commented 1 year ago

Can you tell me what the output of "pyWGCNA_5xFAD.MEs" is? Also, did you change any default background?

HelloWorldLTY commented 1 year ago

Hi, I did not modify the default background information. Here is my code:

geneExp = pathway_count

pyWGCNA_5xFAD = PyWGCNA.WGCNA(name='5xFAD', species='human', geneExp=pd_adata_new, save=True)

pyWGCNA_5xFAD.geneExpr.to_df().head(5)

pyWGCNA_5xFAD.preprocess()

pyWGCNA_5xFAD.findModules()

I ran the code in an .ipynb notebook. [image]

There is no output for this data structure. Also, I used a single-cell dataset rather than a bulk-seq dataset.

nargesr commented 1 year ago

Oh, running it on single-cell data! That's exciting! Can you send me your PyWGCNA object? My email address: nargesr@uci.edu

nargesr commented 1 year ago

Hi again.

I found that you have negative values in your gene expression matrix, which causes this error. You should scale your data or remove the samples that have negative values.
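A quick way to look for such values before building the PyWGCNA object is sketched below with pandas. The helper names and the toy matrix are hypothetical, and the sketch assumes samples (cells) are rows and genes are columns. It only covers the "remove those samples" option; how to rescale the data is left open.

```python
import pandas as pd


def report_negative_values(expr: pd.DataFrame) -> pd.Series:
    """Return the number of negative entries per sample, for affected samples only."""
    counts = (expr < 0).sum(axis=1)
    return counts[counts > 0]


def drop_negative_samples(expr: pd.DataFrame) -> pd.DataFrame:
    """Keep only the samples (rows) that contain no negative value."""
    has_negative = (expr < 0).any(axis=1)
    return expr.loc[~has_negative]


# Hypothetical toy expression matrix: rows are cells, columns are genes.
expr = pd.DataFrame(
    {
        "geneA": [1.0, 0.5, 2.0],
        "geneB": [0.0, -0.3, 1.5],
        "geneC": [3.0, -1.2, 0.1],
    },
    index=["cell1", "cell2", "cell3"],
)

negatives = report_negative_values(expr)
clean = drop_negative_samples(expr)

print(negatives)
print(clean)
```

Dropping samples is the blunter of the two options; with normalised single-cell data, where many cells can contain negative entries, it may remove most of the matrix.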

HelloWorldLTY commented 1 year ago

Hi, so does this mean I cannot use sctransform's results for this analysis? If so, I think I understand. Also, if I have my own co-expression or adjacency matrix, can I input that matrix into this software? Thanks a lot.

nargesr commented 1 year ago

Hi, does your adjacency matrix contain any negative values? If not, you should be able to calculate the TOM matrix and find modules.

There is also a package called hdWGCNA, which is designed to run WGCNA on single-cell data.
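For the adjacency matrix question above, the check and the TOM step can be sketched in plain NumPy. This is a standalone illustration of the standard unsigned topological overlap formula, not PyWGCNA's own implementation or API. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def tom_similarity(adjacency) -> np.ndarray:
    """Unsigned topological overlap for a symmetric adjacency matrix with entries in [0, 1]."""
    a = np.array(adjacency, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("adjacency must be a square matrix")
    if (a < 0).any() or (a > 1).any():
        raise ValueError("adjacency entries must lie in [0, 1]")
    if not np.allclose(a, a.T):
        raise ValueError("adjacency must be symmetric")

    np.fill_diagonal(a, 0.0)                 # ignore self-connections
    k = a.sum(axis=1)                        # connectivity of each gene
    shared = a @ a                           # sum over u of a[i, u] * a[u, j]
    denom = np.minimum.outer(k, k) + 1.0 - a
    tom = (shared + a) / denom
    np.fill_diagonal(tom, 1.0)
    return tom


# Hypothetical 3-gene adjacency matrix.
adj = np.array(
    [
        [1.0, 0.5, 0.2],
        [0.5, 1.0, 0.4],
        [0.2, 0.4, 1.0],
    ]
)
tom = tom_similarity(adj)
diss_tom = 1.0 - tom                         # dissimilarity used for clustering
print(np.round(tom, 3))
```

The up-front validation mirrors the question asked above: a matrix with negative entries is rejected before any overlap is computed.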

vvvcgx commented 1 year ago

Output exceeds the size limit. Open the full output data in a text editor

Run WGCNA...
pickSoftThreshold: calculating connectivity for given powers...
will use block size 1702

    Power  SFT.R.sq      slope  truncated R.sq      mean(k)     median(k)  \
 0      1  0.039826  -0.214968         0.24769  9740.613689  13292.414405
 1      2  0.036757  -0.216539        0.846205  7400.007128    9886.25946
 2      3  0.057513  -0.242484         0.89834  6227.252172   7760.561177
 3      4  0.093872  -0.276056        0.928821  5440.696317   6239.136807
 4      5  0.109404   -0.27363        0.958988  4847.516021   5044.177561
 5      6  0.113892  -0.273975        0.985024  4373.498046   4111.470357
 6      7  0.131714  -0.279216        0.983708  3981.243252   3372.089766
 7      8  0.160607  -0.289612         0.96955  3648.833109   2777.357355
 8      9  0.185508  -0.300187        0.939614  3362.207677   2298.587033
 9     10  0.234666  -0.319766        0.918025  3111.777337   1913.824266
10     11  0.285473  -0.336073        0.894188   2890.69295   1591.317341
11     13  0.406644  -0.369328        0.859255  2517.465985   1114.845279
12     15  0.514198  -0.401922        0.813351   2214.32117    792.419937
13     17  0.605442  -0.432969        0.788198  1963.447593    566.387572
14     19  0.677218   -0.46352        0.762395  1752.817118    407.385841

          max(k)
 0  15517.351971
 1  13605.849772
 2  12497.442739
 3  11659.550278
...
Done..

Going through the merge tree...
..cutHeight not given, setting it to 0.994 ===> 99% of the (truncated) height range in dendro.

Output exceeds the size limit. Open the full output data in a text editor

IndexError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_691884\1651015276.py in
----> 1 py_FAD.findModules()

~\AppData\Roaming\Python\Python39\site-packages\PyWGCNA\wgcna.py in findModules(self, **kwargs)
    291 # dynamicMods = WGCNA.cutreeHybrid(dendro=self.geneTree, distM=dissTOM, deepSplit=2, pamRespectsDendro=False,
    292 #                                  minClusterSize=self.minModuleSize, **kwargs)
--> 293 dynamicMods = WGCNA.cutreeHybrid(dendro=self.geneTree, distM=dissTOM, deepSplit=2, pamRespectsDendro=False,
    294                                  minClusterSize=self.minModuleSize, **kwargs)
    295

~\AppData\Roaming\Python\Python39\site-packages\PyWGCNA\wgcna.py in cutreeHybrid(dendro, distM, cutHeight, minClusterSize, deepSplit, maxCoreScatter, minGap, maxAbsCoreScatter, minAbsGap, minSplitHeight, minAbsSplitHeight, externalBranchSplitFnc, nExternalSplits, minExternalSplit, externalSplitOptions, externalSplitFncNeedsDistance, assumeSimpleExternalSpecification, pamStage, pamRespectsDendro, useMedoids, maxPamDist, respectSmallClusters)
   1437 Core = branch_singletons[large - 1][np.arange(coresize)]
   1438 Core = Core.astype(int).tolist()
-> 1439 LgAveDist = np.mean(distM.iloc[Core, Core].sum() / coresize - 1)
   1440 else:
   1441     LgAveDist = 0

d:\ProgramData\anaconda\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    959 if self._is_scalar_access(key):
    960     return self.obj._get_value(*key, takeable=self._takeable)
--> 961 return self._getitem_tuple(key)
    962 else:
    963     # we by definition only have the 0th axis
...
-> 1379 raise IndexError("positional indexers are out-of-bounds")
   1380 else:
   1381     raise ValueError(f"Can only index by location with a [{self._valid_types}]")

IndexError: positional indexers are out-of-bounds

wangjiawen2013 commented 1 year ago

hdWGCNA doesn't support Python, so I hope PyWGCNA can handle single-cell datasets, because the single-cell Python ecosystem lacks a tool like WGCNA.