mortazavilab / PyWGCNA

PyWGCNA is a Python package designed to do Weighted Gene Correlation Network analysis (WGCNA)
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad415/7218311
MIT License

Several errors when performing findModules steps individually #97

Open MRimoldi opened 6 months ago

MRimoldi commented 6 months ago

Hello! I am trying to customize my analysis: I want to provide a different soft threshold from the one automatically selected by the findModules function (that is, by pickSoftThreshold).

Since it is not possible to pass the power as an argument, I started playing around with the parameters of findModules and other functions to tailor the analysis to my needs, but I encountered several errors. Below I describe what I did and the error messages I got.

First error: if I try to set a different RsquaredCut threshold from the findModules function, I get the following error:

pyWGCNA_OSCC.findModules(kwargs_function={'pickSoftThreshold': { 'RsquaredCut': 0.8 }})
TypeError: PyWGCNA.wgcna.WGCNA.pickSoftThreshold() got multiple values for keyword argument 'RsquaredCut'
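
For readers unfamiliar with this message, here is a minimal sketch of how Python produces it. The two functions below are hypothetical stand-ins, not PyWGCNA code; the assumption is only that the wrapper passes RsquaredCut explicitly and also forwards the user's dictionary with `**kwargs`.

```python
def pick_soft_threshold(data, RsquaredCut=0.9, blockSize=None):
    """Hypothetical stand-in for a function that accepts keyword parameters."""
    return RsquaredCut


def find_modules(kwargs_function=None):
    """Hypothetical wrapper: sets RsquaredCut itself AND forwards user kwargs."""
    kwargs = (kwargs_function or {}).get("pickSoftThreshold", {})
    # RsquaredCut is already passed explicitly here, so a user-supplied
    # 'RsquaredCut' inside **kwargs arrives a second time and Python refuses.
    return pick_soft_threshold([], RsquaredCut=0.9, **kwargs)


try:
    find_modules({"pickSoftThreshold": {"RsquaredCut": 0.8}})
except TypeError as err:
    message = str(err)
    print(message)

# A key the wrapper does not set itself is forwarded without conflict.
ok_value = find_modules({"pickSoftThreshold": {"blockSize": 20000}})
```

If the wrapper really does work this way, any parameter it sets on its own cannot also be supplied through kwargs_function.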

If I pass the RsquaredCut parameter when I create the WGCNA object, my preferred R^2 threshold is applied successfully, but the power plot still shows the red line at 0.9 instead of at the R^2 threshold I provided.

Anyway, I then started replicating all the steps of findModules in order to calculate the TOM similarity matrix with the power I want (which is my end goal).

The first step is to calculate the adjacency matrix. According to the documentation, this can be done with the adjacency function, but I think it has the same name as the stored adjacency matrix (the adjacency attribute of the object) and is accessed the same way, so I cannot call it.

pyWGCNA_OSCC.adjacency(pyWGCNA_OSCC.datExpr.to_df(), 
                       adjacencyType='signed',
                       power=10
                      )
TypeError: 'NoneType' object is not callable
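
A minimal sketch of the suspected name clash, using a hypothetical `Network` class rather than PyWGCNA itself: when an instance attribute and a method share a name, attribute lookup on the instance finds the attribute first.

```python
class Network:
    def __init__(self):
        # Instance attribute with the same name as the method below.
        # Lookup on the instance finds this value first, hiding the method.
        self.adjacency = None

    def adjacency(self, data, power=6):
        """Toy adjacency: raise absolute values to a power."""
        return [[abs(x) ** power for x in row] for row in data]


net = Network()
try:
    net.adjacency([[0.5]], power=10)
except TypeError as err:
    shadow_error = str(err)
    print(shadow_error)

# The function is still reachable through the class itself.
result = Network.adjacency(net, [[0.5]], power=2)
```

Whether calling the function through the class (rather than through the object) sidesteps the problem in PyWGCNA is an assumption that has not been tested here.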

As I mentioned, pyWGCNA_OSCC.adjacency returns an adjacency matrix calculated by findModules - if I run it.

Finally, I also tested how to calculate the TOM similarity matrix from the adjacency matrix (the one calculated by the findModules function, since I could not create my own), and I get an error here as well:

pyWGCNA_OSCC.TOMsimilarity(pyWGCNA_OSCC.adjacency)
AttributeError: 'DataFrame' object has no attribute 'dtype'
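
The message itself can be reproduced with pandas alone: a DataFrame exposes per-column `dtypes` but no single `dtype`, while a NumPy array does have one. The sketch below uses a toy matrix; whether TOMsimilarity accepts a plain array as input is an assumption, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

# Toy 3 x 3 "adjacency" stored as a DataFrame (illustrative data only).
adj = pd.DataFrame(np.eye(3), index=list("abc"), columns=list("abc"))

has_dtype = hasattr(adj, "dtype")   # False: DataFrame only has .dtypes
adj_array = adj.to_numpy()          # ndarray carries a single .dtype

print(has_dtype, adj_array.dtype)
```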

I just want to stress that running pyWGCNA_OSCC.findModules() works fine and creates the adjacency and TOM matrices.

Thanks for the help and for developing a python version of WGCNA! Martina

nargesr commented 6 months ago

Hi,

You can pass the list of powers, which can contain a single value. In your case, it would be something like this:

self.powers = [10]

For the arguments that are passed at initialization, you should change them this way:

self.RsquaredCut = 0.8

About the plotting part: I will fix it in the next release. (I forgot to use the parameter in the plotting code, but the results should be fine and use the correct threshold.)

For the adjacency problem, I need to think a little more about the best way to solve it (i.e. renaming one of the two, or something else) so that it won't be a problem moving forward.

Best, Narges

MRimoldi commented 6 months ago

Brilliant, thank you. I tried setting self.powers = [10] and indeed I get to the end of findModules with my TOM matrix. However, I had tried before with self.power = 10 and it did not work. So what is the difference between powers and power?

I also found other mini errors and little things that could turn into enhancement suggestions. Would you prefer having separate issues for each of those or one issue listing all of them?

Thanks again! Martina

MRimoldi commented 6 months ago

Hi again,

I wanted to add that if I simply run pyWGCNA_OSCC.findModules() it works without errors. But if I want to provide a bigger block size, I get errors and the modules are not calculated:

pyWGCNA_OSCC.findModules(kwargs_function={'pickSoftThreshold': {'blockSize': 20000}})
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[40], line 2
      1 pyWGCNA_OSCC.powers = [6]
----> 2 pyWGCNA_OSCC.findModules(kwargs_function={'pickSoftThreshold': {'blockSize': 20000}})

File /pip/lib/python3.10/site-packages/PyWGCNA/wgcna.py:329, in WGCNA.findModules(self, kwargs_function)
    327 if 'cutreeHybrid' in list(kwargs_function.keys()):
    328     kwargs = kwargs_function['cutreeHybrid']
--> 329 dynamicMods = WGCNA.cutreeHybrid(dendro=self.geneTree, distM=dissTOM, minClusterSize=self.minModuleSize, **kwargs)
    331 # Convert numeric labels into colors
    332 kwargs = dict()

File /pip/lib/python3.10/site-packages/PyWGCNA/wgcna.py:1752, in WGCNA.cutreeHybrid(dendro, distM, cutHeight, minClusterSize, deepSplit, maxCoreScatter, minGap, maxAbsCoreScatter, minAbsGap, minSplitHeight, minAbsSplitHeight, externalBranchSplitFnc, nExternalSplits, minExternalSplit, externalSplitOptions, externalSplitFncNeedsDistance, assumeSimpleExternalSpecification, pamStage, pamRespectsDendro, useMedoids, maxPamDist, respectSmallClusters)
   1750 if pamRespectsDendro:
   1751     for sclust in SmallLabLevs[SmallLabLevs != 0]:
-> 1752         InCluster = list(range(nPoints))[SmallLabels == sclust]
   1753         onBr = pd.unique(onBranch[InCluster])
   1754         if len(onBr) > 1:

TypeError: only integer scalar arrays can be converted to a scalar index
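
The failing line in the traceback indexes a Python list with the result of `SmallLabels == sclust`. A minimal sketch, assuming SmallLabels is a NumPy array (the variable names and values below are illustrative), reproduces the same message and shows that indexing a NumPy array instead accepts the boolean mask:

```python
import numpy as np

n_points = 5
small_labels = np.array([0, 1, 1, 2, 0])   # illustrative labels
mask = small_labels == 1                   # boolean NumPy array

try:
    # A plain list cannot be indexed with a boolean array.
    list(range(n_points))[mask]
except TypeError as err:
    list_error = str(err)
    print(list_error)

# A NumPy array supports boolean-mask indexing directly.
in_cluster = np.arange(n_points)[mask]
print(in_cluster)
```

This code path only runs when pamRespectsDendro is set and small clusters exist, which may explain why the error does not appear on every run.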

nargesr commented 6 months ago

Hi,

powers is the list of powers that we test for a scale-free network, whereas power is the one selected from powers that passes our criteria.
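
As a rough illustration of that relationship, the sketch below picks a single power from a table of tested powers. The `select_power` helper is hypothetical and is not PyWGCNA's implementation; it follows the convention of the R WGCNA package as I understand it (lowest power whose signed fit exceeds the cut), and returning None when nothing passes is an assumption. The numbers are the SFT rows reported later in this thread.

```python
import numpy as np
import pandas as pd


def select_power(sft, rsquared_cut=0.9):
    """Return the lowest tested power whose signed fit exceeds the cut, else None."""
    # Only negative slopes count as evidence of a scale-free topology.
    signed_fit = -np.sign(sft["slope"]) * sft["SFT.R.sq"]
    passing = sft.loc[signed_fit > rsquared_cut, "Power"]
    return int(passing.min()) if not passing.empty else None


# Rows reported for the direct pickSoftThreshold call.
direct = pd.DataFrame({"Power": [5, 6],
                       "SFT.R.sq": [0.912374, 0.909013],
                       "slope": [-1.284084, -1.244545]})
# Rows reported when running findModules with powers = [5, 6].
inside = pd.DataFrame({"Power": [5, 6],
                       "SFT.R.sq": [0.602951, 0.785474],
                       "slope": [-1.639834, -1.74202]})

print(select_power(direct), select_power(inside), select_power(inside, 0.7))
```

Under this rule, setting power directly has no effect if the selection step runs again afterwards and overwrites it, which would be consistent with self.power = 10 not working.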

Could you send me the script you used and your version of PyWGCNA for the other errors you got?

If you have suggestions for enhancements, such as adding features, please open a separate issue, because I need to check whether I have the bandwidth to add them. If it's an error, you can keep it in this issue.

Best, Narges

MRimoldi commented 6 months ago

I am using PyWGCNA==2.0.4

Here is the script:

import anndata as ad
import PyWGCNA

adata_OSCC = ad.AnnData(expression_counts_filt)
adata_OSCC.obs_names = OSCC_metadata.index
adata_OSCC.obs = OSCC_metadata
adata_OSCC.var_names = gene_metadata_filt.index
adata_OSCC.var = gene_metadata_filt

adata_OSCC

pyWGCNA_OSCC = PyWGCNA.WGCNA(name='OSCC',
                             species='homo sapiens', 
                             anndata=adata_OSCC,
                             outputPath=otp_dir+"/", # we need to add '/'. otherwise it does not create the subfolder where you want
                             save=False,
                             networkType = 'signed',
                             TOMType = 'signed',
                             TPMcutoff = 1.5
                            )

pyWGCNA_OSCC.preprocess(show=True)

pyWGCNA_OSCC.pickSoftThreshold(pyWGCNA_OSCC.datExpr.to_df())

pyWGCNA_OSCC.powers = [5]
pyWGCNA_OSCC.findModules(kwargs_function={'pickSoftThreshold': {'blockSize': 20000}})

MRimoldi commented 6 months ago

I stand corrected: setting self.powers = [x] might not work as expected, because the SFT.R.sq values (and all other statistics) are not the same as the ones generated by the pickSoftThreshold function. For example:

A) running pyWGCNA_OSCC.pickSoftThreshold(pyWGCNA_OSCC.datExpr.to_df()) returns the following rows of the sft table for powers 5 and 6:

Power  SFT.R.sq     slope truncated R.sq      mean(k)    median(k)
...
5  0.912374 -1.284084       0.928879   201.599093    87.219334
6  0.909013 -1.244545        0.92533   141.545957    46.490382

B) running pyWGCNA_OSCC.pickSoftThreshold(pyWGCNA_OSCC.datExpr.to_df(), powerVector=[5,6]) returns the following sft table, which has the same SFT.R.sq values as before:

Power  SFT.R.sq     slope truncated R.sq     mean(k)  median(k)
5  0.912374 -1.284084       0.928879  201.599093  87.219334
6  0.909013 -1.244545        0.92533  141.545957  46.490382

C) running:

pyWGCNA_OSCC.powers = [5,6]
pyWGCNA_OSCC.findModules()

returns different statistics in the sft table:

Power  SFT.R.sq     slope truncated R.sq      mean(k)   median(k)
5  0.602951 -1.639834       0.777942  1286.271455  1091.35202
6  0.785474  -1.74202       0.864566   876.628143  694.423832
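
One possible, unconfirmed reading of the much larger mean(k) in (C): the object was created with networkType='signed', while the direct calls in (A) and (B) do not pass a network type. Whether the direct call falls back to a different default is an assumption that has not been checked against the PyWGCNA source. The sketch below uses synthetic data and the standard WGCNA adjacency definitions to show that the same power gives different connectivity under the two network types.

```python
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 20))        # 50 samples x 20 synthetic genes
cor = np.corrcoef(expr, rowvar=False)   # gene-gene correlation matrix
power = 5

adj_unsigned = np.abs(cor) ** power         # unsigned: |cor|^power
adj_signed = (0.5 + 0.5 * cor) ** power     # signed: ((1 + cor) / 2)^power


def mean_connectivity(adj):
    """Mean over genes of the summed adjacency to all other genes."""
    return (adj.sum(axis=0) - np.diag(adj)).mean()


k_unsigned = mean_connectivity(adj_unsigned)
k_signed = mean_connectivity(adj_signed)
print(k_unsigned, k_signed)
```

If this is the cause, passing the same network type to the direct pickSoftThreshold call should bring the two tables into agreement.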

Best, Martina

nargesr commented 6 months ago

One more question: did you initialize/define your object each time you ran them?

MRimoldi commented 6 months ago

Yes, I did! And I redid it just now to be sure.