mortazavilab / PyWGCNA

PyWGCNA is a Python package designed to do Weighted Gene Correlation Network analysis (WGCNA)
https://academic.oup.com/bioinformatics/advance-article/doi/10.1093/bioinformatics/btad415/7218311
MIT License

analyseWGCNA subplot error #107

Closed Lachiemckbioinfo closed 2 months ago

Lachiemckbioinfo commented 3 months ago

Hi, hoping I'm not missing something obvious. I've been persistently getting an error when running analyseWGCNA(); it happens every time I run it. I've tried both my own data and the example data, in case something was wrong with my data.

Error traceback

Traceback (most recent call last):
  File "/home/ljm028/WGCNA/WGCNA.py", line 45, in <module>
    testpyw.analyseWGCNA()
  File "/home/ljm028/WGCNA/venv/lib/python3.9/site-packages/PyWGCNA/wgcna.py", line 452, in analyseWGCNA
    self.plotModuleEigenGene(module, metadata, show=show)
  File "/home/ljm028/WGCNA/venv/lib/python3.9/site-packages/PyWGCNA/wgcna.py", line 2977, in plotModuleEigenGene
    axs_legend = gridspec.GridSpecFromSubplotSpec(len(metadata), 1, subplot_spec=ax_legend,
  File "/home/ljm028/WGCNA/venv/lib/python3.9/site-packages/matplotlib/gridspec.py", line 490, in __init__
    raise TypeError(
TypeError: subplot_spec must be type SubplotSpec, usually from GridSpec, or axes.get_subplotspec.

My code

import PyWGCNA as pyw
# Can read WGCNA objects

#testpyw = pyw.readWGCNA('filename')
geneExp = 'data/pyWGCNA_example_test.csv'
testpyw = pyw.WGCNA(name = 'WGCNA',
        species = 'Asparagopsis taxiformis',
        geneExpPath=geneExp,
        outputPath='outdir/', #In order for the output path to act as a out directory, end string with '/'
        save=True)

#print(testpyw.geneExpr.to_df().head(5))

#print('Starting preprocessing')
#testpyw.preprocess()

#testpyw.findModules()

testpyw.runWGCNA() # Single command to do both .preprocess() and .findModules()

testpyw.updateSampleInfo(path='data/sample_info.csv', sep=',')

# Add colour for metadata
testpyw.setMetadataColor('Subset', {'Rini': 'red',
                                    'Jess': 'blue',
                                    'Tom': 'yellow'})
testpyw.setMetadataColor('Treatment', {'Standard': 'thistle',
                                       '25 umol': 'plum',
                                       '0.5g': 'violet'})

# Update gene information using geneList
geneList = pyw.getGeneList(dataset = 'mmusculus_gene_ensembl',
                           attributes=['ensembl_gene_id',
                                       'external_gene_name',
                                       'gene_biotype'],
                                       maps=['gene_id', 'gene_name', 'gene_biotype'])
testpyw.updateGeneInfo(geneList)

testpyw.saveWGCNA()

# Set figure output format
#testpyw.figureType = 'png'

print(testpyw.datExpr.var.head(5))
testpyw.analyseWGCNA()

Program versions

I'll list the program versions of relevant programs installed in the environment. It's not all of them, so if you need to check another program, I can provide the data.

anndata 0.10.7
asttokens 2.4.1
biomart 0.9.2
matplotlib 3.9.0
matplotlib-inline 0.1.7
networkx 3.2.1
numpy 2.0.0
PyWGCNA 2.0.4
pandas 2.2.2
reactome2py 3.0.0
requests 2.32.3
scikit-learn 1.5.0
scipy 1.13.1
seaborn 0.13.2

Thanks, Lachlan

nargesr commented 3 months ago

Hi @Lachiemckbioinfo,

Can you tell me what the output of testpyw.datExpr.var and testpyw.datExpr.obs is?

The first few rows should be fine, but I need to see all the columns.

Best, Narges

Lachiemckbioinfo commented 3 months ago

This is using the expressionList data.

datExpr.var:

                    dynamicColors moduleColors  moduleLabels gene_name    gene_biotype
ENSMUSG00000000028       darkred      darkred           4.0     Cdc45  protein_coding
ENSMUSG00000000049       darkred      darkred           4.0      Apoh  protein_coding
ENSMUSG00000000056      darkgrey     darkgrey           3.0      Narf  protein_coding
ENSMUSG00000000058         coral        coral           2.0      Cav2  protein_coding
ENSMUSG00000000078     gainsboro    gainsboro           7.0      Klf6  protein_coding
ENSMUSG00000000085      darkgrey     darkgrey           3.0     Scmh1  protein_coding

datExpr.obs:

                              Age  Tissue     Sex   Genotype
sample_id
X4mo_cortex_F_5xFADHEMI_430  4mon  Cortex  Female  5xFADHEMI
X4mo_cortex_F_5xFADHEMI_431  4mon  Cortex  Female  5xFADHEMI
X4mo_cortex_F_5xFADHEMI_433  4mon  Cortex  Female  5xFADHEMI
X4mo_cortex_F_5xFADHEMI_434  4mon  Cortex  Female  5xFADHEMI
X4mo_cortex_F_5xFADHEMI_511  4mon  Cortex  Female  5xFADHEMI
X4mo_cortex_F_5xFADWT_330    4mon  Cortex  Female    5xFADWT

Thanks, Lachlan

nargesr commented 3 months ago

This is coming from the example dataset, right? If so, the script you provided seems to be a different one, since the column names you used (Subset, Treatment) do not exist in your dataset (datExpr.obs).

Could you provide the script, self.datExpr.var, self.datExpr.obs, and the error you got for either the example dataset or your dataset?

Thanks, Narges

Lachiemckbioinfo commented 3 months ago

Ah, sorry, mixed up my scripts.

This is the script running with the example data:

import PyWGCNA as pyw

# Add function for custom printing with linebreaks
def customprint(*args):
    linebreak = "\n------------------------------\n"
    print(f"{linebreak}")
    for arg in args:
        print(arg)
    print(f"{linebreak}")

# Can read WGCNA objects

#testpyw = pyw.readWGCNA('filename')
geneExp = 'data/expressionList.csv'
testpyw = pyw.WGCNA(name = 'WGCNA',
        species = 'Mus musculus',
        geneExpPath=geneExp,
        outputPath='outdir/', #In order for the output path to act as a out directory, end string with '/'
        save=True)

#print('Starting preprocessing')
#testpyw.preprocess()

#testpyw.findModules()

testpyw.runWGCNA() # Single command to do both .preprocess() and .findModules()

testpyw.updateSampleInfo(path='data/sampleInfo.csv', sep=',')

# Add colour for metadata
testpyw.setMetadataColor('Sex', {'Female': 'green',
                                 'Male': 'yellow'})
testpyw.setMetadataColor('Genotype', {'5xFADWT': 'darkviolet',
                                      '5xFADHEMI': 'deeppink'})
testpyw.setMetadataColor('Tissue', {'Hippocampus': 'red',
                                    'Cortex': 'blue'})
testpyw.setMetadataColor('Age', {'4mon': 'thistle',
                                 '8mon': 'plum',
                                 '12mon': 'violet',
                                 '18mon': 'purple'})

# Update gene information using geneList
geneList = pyw.getGeneList(dataset = 'mmusculus_gene_ensembl',
                           attributes=['ensembl_gene_id',
                                       'external_gene_name',
                                       'gene_biotype'],
                                       maps=['gene_id', 'gene_name', 'gene_biotype'])
testpyw.updateGeneInfo(geneList)

testpyw.saveWGCNA()

# Set figure output format
#testpyw.figureType = 'png'

customprint(f"datExpr.var:\n {testpyw.datExpr.var.head(6)}", f"datExpr.obs:\n{testpyw.datExpr.obs.head(6)}")

testpyw.analyseWGCNA()

# Function to group GO, KEGG and Reactome annotation and perform as one block
def funcAnnot():
    # GO Annotation
    gene_set_library = ["GO_Biological_Process_2021", "GO_Cellular_Component_2021", "GO_Molecular_Function_2021"]
    testpyw.functional_enrichment_analysis(type="GO",
                                        moduleName='lightgrey',
                                        sets=gene_set_library,
                                        p_value=0.05,
                                        file_name="GO_coral_2021")
    # KEGG Annotation
    KEGG_set_library = ["KEGG_2016"]
    testpyw.functional_enrichment_analysis(type='KEGG',
                                        moduleName='lightgrey',
                                        sets=KEGG_set_library,
                                        p_value=0.05)
    # Reactome annotation
    testpyw.functional_enrichment_analysis(type='REACTOME',
                                        moduleName='lightgrey',
                                        p_value=0.05)
#funcAnnot()

def modulenetwork():
    modules = testpyw.datExpr.var.moduleColors.unique().tolist()
    print(f"Modules: {modules}")
    testpyw.CoexpressionModulePlot(modules=modules, numGenes=10, numConnections=100, minTOM=0)
#modulenetwork()

nargesr commented 3 months ago

Hi @Lachiemckbioinfo

Unfortunately, I wasn't able to reproduce your error using the example datasets and script you provided.

I did update requirements.txt so you can compare the versioning.

The only package that I thought might be a problem was matplotlib, but when I checked the documentation of gridspec.GridSpecFromSubplotSpec() for versions 3.8 and 3.9, I didn't detect any changes.
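The error message itself names the expected type: subplot_spec has to be a SubplotSpec, which can be obtained from a GridSpec or from axes.get_subplotspec(). The following is a minimal sketch of that call pattern, not the PyWGCNA code. It assumes the object passed as subplot_spec is a regular Axes; the figure layout and the helper name nested_legend_grid are hypothetical stand-ins.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import gridspec


def nested_legend_grid(ax, n_rows):
    """Build an n_rows x 1 nested grid inside the slot occupied by `ax`.

    GridSpecFromSubplotSpec expects a SubplotSpec, so the Axes is converted
    with get_subplotspec() instead of being passed directly.
    """
    spec = ax.get_subplotspec()
    return gridspec.GridSpecFromSubplotSpec(n_rows, 1, subplot_spec=spec)


fig = plt.figure()
outer = gridspec.GridSpec(1, 2, figure=fig)
# Hypothetical stand-in for the legend axis used in plotModuleEigenGene
ax_legend = fig.add_subplot(outer[0, 1])

inner = nested_legend_grid(ax_legend, 3)
print(inner.get_geometry())  # (rows, columns) of the nested grid
```

Converting through get_subplotspec() keeps the call valid regardless of whether the installed matplotlib checks the argument type.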

According to the traceback, you seem to be running Python 3.9, but PyWGCNA requires Python 3.10 or greater (ref).

I would suggest making a new environment with Python 3.10, installing PyWGCNA, and then checking the version of dependencies. Hopefully, that will fix your problem.
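One way to collect the versions for such a comparison is from inside Python itself. This is a rough sketch using only the standard library; the package list is an assumption taken from the names mentioned in this thread and can be extended.

```python
import sys
from importlib import metadata

# Packages mentioned in this thread; extend as needed.
PACKAGES = ["PyWGCNA", "matplotlib", "numpy", "pandas", "anndata", "scipy", "seaborn"]


def environment_report(packages=PACKAGES):
    """Return the Python version and the installed version of each package.

    Packages that are not installed are reported as None.
    """
    report = {"python": ".".join(map(str, sys.version_info[:3]))}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for name, version in environment_report().items():
    print(f"{name}: {version}")
```

Running this with the same interpreter that runs the analysis script avoids reading versions from a different environment by mistake.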

Lachiemckbioinfo commented 2 months ago

Hi, I managed to get it solved by moving to a virtual machine. For context, I was running it on a university Linux server with Python 3.9 pre-installed, while managing Python versions with a Conda environment and doing pip installs in a Python virtual environment, so there may have been conflicting versions.
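When a Conda environment and a virtual environment are layered like this, it can help to confirm which interpreter is running and where each package would be imported from. A small sketch under that assumption, standard library only; module_location is a hypothetical helper name.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("prefix:     ", sys.prefix)
for name in ("PyWGCNA", "matplotlib"):
    print(f"{name}: {module_location(name)}")
```

If the printed paths point into different environments (for example, one under venv/ and another under .conda/envs/), the packages are being mixed.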

Thanks, Lachlan

Lachiemckbioinfo commented 2 months ago

Hi, I had the same issue again, running this in a Conda environment with matplotlib 3.9.1. However, reinstalling matplotlib to 3.8.2 resolved the issue.
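A script could flag this before calling analyseWGCNA(). The sketch below is only illustrative: the (3, 9) threshold is an assumption based on the versions reported in this thread (3.9.0 and 3.9.1 failing, 3.8.2 working), and version_tuple and is_affected are hypothetical helper names.

```python
from importlib import metadata


def version_tuple(version):
    """Convert a version string such as '3.9.1' into a tuple of leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post0'
        parts.append(int(digits))
    return tuple(parts)


def is_affected(version, threshold=(3, 9)):
    """True if `version` is at or above the threshold seen failing in this thread."""
    return version_tuple(version) >= threshold


try:
    installed = metadata.version("matplotlib")
    status = "may hit the subplot_spec error" if is_affected(installed) else "below the threshold"
    print(f"matplotlib {installed}: {status}")
except metadata.PackageNotFoundError:
    print("matplotlib is not installed in this environment")
```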

nargesr commented 2 months ago

Hi @Lachiemckbioinfo

Which version of Python? Also, can you send me the versions of all the packages you are using? And did you get the same error?

Lachiemckbioinfo commented 2 months ago

Yes, it was the exact same error. Here are the details:

General details

Python version: 3.10.14
Environment: Conda environment
Command: conda create --name wgtest python=3.10
WGCNA install method: pip install PyWGCNA
Resolved with: pip install 'matplotlib==3.8.2' --force-reinstall

Error traceback

Traceback (most recent call last):
  File "/home/ljm028/WGCNA/AtaSC/No_Nulls/take3/test/WGCNA_group.py", line 161, in <module>
    run_WGCNA('Culturing', 'data/no_Nulls_expressionData_Culturing.csv', samplefile, 'Culturing')
  File "/home/ljm028/WGCNA/AtaSC/No_Nulls/take3/test/WGCNA_group.py", line 79, in run_WGCNA
    pyw.analyseWGCNA()
  File "/home/ljm028/.conda/envs/wgtest/lib/python3.10/site-packages/PyWGCNA/wgcna.py", line 452, in analyseWGCNA
    self.plotModuleEigenGene(module, metadata, show=show)
  File "/home/ljm028/.conda/envs/wgtest/lib/python3.10/site-packages/PyWGCNA/wgcna.py", line 2980, in plotModuleEigenGene
    axs_legend = gridspec.GridSpecFromSubplotSpec(len(metadata), 1, subplot_spec=ax_legend,
  File "/home/ljm028/.conda/envs/wgtest/lib/python3.10/site-packages/matplotlib/gridspec.py", line 490, in __init__
    raise TypeError(
TypeError: subplot_spec must be type SubplotSpec, usually from GridSpec, or axes.get_subplotspec.

Installed programs (not working)

packages in environment at /home/ljm028/.conda/envs/wgtest:

Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
anndata 0.10.8 pypi_0 pypi
array-api-compat 1.7.1 pypi_0 pypi
asttokens 2.4.1 pypi_0 pypi
biomart 0.9.2 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6
ca-certificates 2024.7.2 h06a4308_0
certifi 2024.7.4 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
contourpy 1.2.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
exceptiongroup 1.2.2 pypi_0 pypi
executing 2.0.1 pypi_0 pypi
fonttools 4.53.1 pypi_0 pypi
gseapy 1.1.3 pypi_0 pypi
h5py 3.11.0 pypi_0 pypi
idna 3.7 pypi_0 pypi
ipython 8.26.0 pypi_0 pypi
jedi 0.19.1 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
json5 0.9.25 pypi_0 pypi
jsonpickle 3.2.2 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.9.1 pypi_0 pypi
matplotlib-inline 0.1.7 pypi_0 pypi
memoir 0.0.3 pypi_0 pypi
natsort 8.4.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.3 pypi_0 pypi
numpy 2.0.0 pypi_0 pypi
openssl 3.0.14 h5eee18b_0
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
parso 0.8.4 pypi_0 pypi
patsy 0.5.6 pypi_0 pypi
pexpect 4.9.0 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.0 py310h06a4308_0
prompt-toolkit 3.0.47 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
pure-eval 0.2.2 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
python 3.10.14 h955ad1f_1
python-dateutil 2.9.0.post0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyvis 0.3.1 pypi_0 pypi
pywgcna 2.0.5 pypi_0 pypi
reactome2py 3.0.0 pypi_0 pypi
readline 8.2 h5eee18b_0
reprit 0.9.0 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
rsrc 0.1.3 pypi_0 pypi
scikit-learn 1.5.1 pypi_0 pypi
scipy 1.14.0 pypi_0 pypi
seaborn 0.13.2 pypi_0 pypi
setuptools 69.5.1 py310h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
stack-data 0.6.3 pypi_0 pypi
statsmodels 0.14.2 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tk 8.6.14 h39e8969_0
traitlets 5.14.3 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
wcwidth 0.2.13 pypi_0 pypi
wheel 0.43.0 py310h06a4308_0
xz 5.4.6 h5eee18b_1
zlib 1.2.13 h5eee18b_1

Installed programs (when working)

Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
anndata 0.10.8 pypi_0 pypi
array-api-compat 1.7.1 pypi_0 pypi
asttokens 2.4.1 pypi_0 pypi
biomart 0.9.2 pypi_0 pypi
bzip2 1.0.8 h5eee18b_6
ca-certificates 2024.7.2 h06a4308_0
certifi 2024.7.4 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
contourpy 1.2.1 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
decorator 5.1.1 pypi_0 pypi
exceptiongroup 1.2.2 pypi_0 pypi
executing 2.0.1 pypi_0 pypi
fonttools 4.53.1 pypi_0 pypi
gseapy 1.1.3 pypi_0 pypi
h5py 3.11.0 pypi_0 pypi
idna 3.7 pypi_0 pypi
ipython 8.26.0 pypi_0 pypi
jedi 0.19.1 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
joblib 1.4.2 pypi_0 pypi
json5 0.9.25 pypi_0 pypi
jsonpickle 3.2.2 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.4.4 h6a678d5_1
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.8.2 pypi_0 pypi
matplotlib-inline 0.1.7 pypi_0 pypi
memoir 0.0.3 pypi_0 pypi
natsort 8.4.0 pypi_0 pypi
ncurses 6.4 h6a678d5_0
networkx 3.3 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
openssl 3.0.14 h5eee18b_0
packaging 24.1 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
parso 0.8.4 pypi_0 pypi
patsy 0.5.6 pypi_0 pypi
pexpect 4.9.0 pypi_0 pypi
pillow 10.4.0 pypi_0 pypi
pip 24.0 py310h06a4308_0
prompt-toolkit 3.0.47 pypi_0 pypi
psutil 6.0.0 pypi_0 pypi
ptyprocess 0.7.0 pypi_0 pypi
pure-eval 0.2.2 pypi_0 pypi
pygments 2.18.0 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
python 3.10.14 h955ad1f_1
python-dateutil 2.9.0.post0 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyvis 0.3.1 pypi_0 pypi
pywgcna 2.0.4 pypi_0 pypi
reactome2py 3.0.0 pypi_0 pypi
readline 8.2 h5eee18b_0
reprit 0.9.0 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
rsrc 0.1.3 pypi_0 pypi
scikit-learn 1.5.1 pypi_0 pypi
scipy 1.14.0 pypi_0 pypi
seaborn 0.13.2 pypi_0 pypi
setuptools 69.5.1 py310h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.45.3 h5eee18b_0
stack-data 0.6.3 pypi_0 pypi
statsmodels 0.14.2 pypi_0 pypi
threadpoolctl 3.5.0 pypi_0 pypi
tk 8.6.14 h39e8969_0
traitlets 5.14.3 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
urllib3 2.2.2 pypi_0 pypi
wcwidth 0.2.13 pypi_0 pypi
wheel 0.43.0 py310h06a4308_0
xz 5.4.6 h5eee18b_1
zlib 1.2.13 h5eee18b_1
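Two package lists of this length are easier to compare programmatically than by eye. The following is a rough sketch that diffs two conda list dumps, assuming one package per line in "name version build channel" order, as conda list prints it. The sample rows are a few entries copied from the two lists above; parse_conda_list and diff_versions are hypothetical helper names.

```python
def parse_conda_list(text):
    """Map package name -> version from `conda list` output (one package per line)."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blanks, comment lines, and the "Name Version Build Channel" header.
        if len(fields) < 2 or fields[0].startswith("#") or fields[0] == "Name":
            continue
        versions[fields[0]] = fields[1]
    return versions


def diff_versions(before, after):
    """Return {name: (before_version, after_version)} for packages that differ."""
    names = sorted(set(before) | set(after))
    return {
        n: (before.get(n), after.get(n))
        for n in names
        if before.get(n) != after.get(n)
    }


failing = """
matplotlib 3.9.1 pypi_0 pypi
numpy 2.0.0 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
pywgcna 2.0.5 pypi_0 pypi
"""
working = """
matplotlib 3.8.2 pypi_0 pypi
numpy 1.26.4 pypi_0 pypi
pandas 2.2.2 pypi_0 pypi
pywgcna 2.0.4 pypi_0 pypi
"""

changed = diff_versions(parse_conda_list(failing), parse_conda_list(working))
for name, (bad, good) in changed.items():
    print(f"{name}: {bad} -> {good}")
```

On these sample rows the diff reports matplotlib, numpy, and pywgcna, so the forced reinstall changed more than the matplotlib version alone.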

Script used

import PyWGCNA
import os
import pandas as pd 
import itertools 

# Add function for custom printing with linebreaks. Just for making debugging pretty.
def customprint(*args):
    linebreak = "\n------------------------------\n"
    print(f"{linebreak}")
    for arg in args:
        print(arg)
    print(f"{linebreak}")

def run_WGCNA(runname, infile, sampleinfo, testset, speciesname='Asparagopsis taxiformis'):
    customprint(f"Starting WGCNA run for {runname}")  
    geneExp = infile
    outdir = f'output_{runname}'
    pyw = PyWGCNA.WGCNA(name = runname,
                        species = speciesname,
                        geneExpPath = geneExp,
                        outputPath = f'{outdir}/',
                        save=True)

    # Run .preprocess() and .findModules() separately (runWGCNA() does both in one command)
    pyw.preprocess()
    pyw.findModules()
    #pyw.runWGCNA()
    # Update sample information using the sampleInfo csv
    pyw.updateSampleInfo(path=sampleinfo, sep=',')
    # Add colour for metadata. WAY TOO MANY COLOURS - AAARGH!
    # Lets hope that having one set that goes across different runs works fine

    pyw.setMetadataColor('Culture', {'Cultured': 'lightgrey',
                                        'Wild': 'dimgrey'})

    # Removed unnecessary code details here

    pyw.analyseWGCNA()
    def save_dataframes():
        datExpr_var = os.path.join(outdir, 'datExpr.var.csv')
        datExpr_obs = os.path.join(outdir, "datExpr.obs.csv")
        geneExpr = os.path.join(outdir, "geneExpr.csv")
        datExpr_var_df = pyw.datExpr.var
        datExpr_var_df.to_csv(datExpr_var)
        # datExpr.obs is already a pandas DataFrame, so it can be written directly
        datExpr_obs_df = pyw.datExpr.obs
        datExpr_obs_df.to_csv(datExpr_obs)
        # Assumption: geneExpr.csv is meant to hold the expression matrix
        # (geneExpr is an AnnData object, hence to_df())
        geneExpr_df = pyw.geneExpr.to_df()
        geneExpr_df.to_csv(geneExpr)

        # Soft power
        sft = pyw.sft
        sft.to_csv(os.path.join(outdir, 'soft_power.csv'))

        # Adjacency matrix
        adj = pyw.adjacency
        adj.to_csv(os.path.join(outdir, 'adjacency.csv'))

        # Topological overlap matrix
        tom = pyw.TOM
        tom.to_csv(os.path.join(outdir, 'topological_overlap_matrix.csv'))

    try:
        save_dataframes()
    except Exception as err:
        # Report the cause instead of silently swallowing it
        customprint(f"save_dataframes failed: {err}")

    def modulelist():
        modules = pyw.datExpr.var.moduleColors.unique().tolist()
        #print(f"Modules: {modules}")
        with open(os.path.join(outdir, "modules.txt"), "w") as mods:
            mods.write("----------\nModules\n----------\n")
            for module in modules:
                mods.write(f"{module}\n")
        return modules
    modules = modulelist()

    # Network analysis. This will generate a HTML document.
    try:
        pyw.CoexpressionModulePlot(modules=modules, numGenes=10, numConnections=200, minTOM=0, file_name="network.html")
    except Exception as err:
        customprint(f"Failed coexpression module plot: {err}")

    hubs = []
    hub_reps = 50
    def modulehubs(reps):
        for module in modules:
            hub = pyw.top_n_hub_genes(moduleName=module, n=reps)
            hub.to_csv(os.path.join(outdir, f"top_{reps}_hub_genes_{module}.csv"))
            hubs.append(hub)
    try:
        modulehubs(hub_reps)
    except Exception as err:
        customprint(f"Failed modulehubs: {err}")
    if hubs:  # pd.concat raises ValueError on an empty list
        all_hubs = pd.concat(hubs)
        all_hubs.to_csv(os.path.join(outdir, f"top_{hub_reps}_hubs_all.csv"))
    # Save the run as the name defined in runname
    pyw.saveWGCNA()
    customprint(f"Finished WGCNA run for {runname}")

# run_WGCNA('runname', 'expressionData_{something}.csv', 'data/sampleInfoSplit.csv', 'Culturing/Temperature/Density/Light/Nutrient')
samplefile = 'data/sampleInfoSplit.csv'
run_WGCNA('Culturing', 'data/no_Nulls_expressionData_Culturing.csv', samplefile, 'Culturing')
run_WGCNA('Density', 'data/no_Nulls_expressionData_Density.csv', samplefile, 'Density')
# ... more runs here

Edits

Edited - just removed some unnecessary details from the code.

nargesr commented 2 months ago

Hi @Lachiemckbioinfo

I've updated the package to be compatible with the latest version of matplotlib.

I will release a new version by the end of this week and you can test it out again.

Thank you for your help.

nargesr commented 2 months ago

Hi @Lachiemckbioinfo

Please install the latest version and let me know if you encounter any issues.