saezlab / decoupler-py

Python package to perform enrichment analysis from omics data.
https://decoupler-py.readthedocs.io/
GNU General Public License v3.0

Error in `run_ulm`: No Sources with More than `min_n=2` Targets Despite Matrix-Network Compatibility #158

Closed victorsanchezarevalo closed 1 month ago

victorsanchezarevalo commented 1 month ago

Describe the bug
I am encountering an error when using decoupler's run_ulm function. The error states that no sources have more than min_n=2 targets, even though I have lowered the min_n parameter and my dataset should have enough shared targets between the matrix (mat) and the network (collectri).

Error:

File ~/miniforge3/envs/decoupler/lib/python3.10/site-packages/spyder_kernels/customize/utils.py:209 in exec_encapsulate_locals
  exec_fun(compile(code_ast, filename, "exec"), globals)

File ~/Documentos/Mis_analisis/Ester_Martin/12_decoupler.py:225
  tf_acts, tf_pvals = dc.run_ulm(mat=mat, net=collectri, verbose=True, min_n=2)

File ~/miniforge3/envs/decoupler/lib/python3.10/site-packages/decoupler/method_ulm.py:108 in run_ulm
  net = filt_min_n(c, net, min_n=min_n)

File ~/miniforge3/envs/decoupler/lib/python3.10/site-packages/decoupler/pre.py:146 in filt_min_n
  raise ValueError("""No sources with more than min_n={0} targets. Make sure mat and net have shared target features or
ValueError: No sources with more than min_n=2 targets. Make sure mat and net have shared target features or
  reduce the number assigned to min_n

To Reproduce
Steps to reproduce the behavior:

  1. Install decoupler in a clean Python environment.
  2. Use the run_ulm function with the following setup:
    • Input matrix (mat): Gene expression matrix with n_genes x n_samples.
    • Regulatory network (collectri): A list of transcription factors and their target genes.
  3. Set min_n=2 to ensure that there are at least 2 targets per transcription factor.
  4. Run the analysis and observe the error when filt_min_n fails to find sufficient shared targets between the matrix and network.

If needed, I can provide a subset of the data that triggers the error for testing purposes.

Expected behavior
I expected the run_ulm function to return transcription factor activities and p-values when using the provided gene expression matrix (mat) and regulatory network (collectri), as there should be sufficient shared target genes between the two.

System

Additional context
I have verified that my matrix contains valid gene names and is compatible with the regulatory network. Despite lowering min_n to 1, the issue persists. This error seems to indicate a mismatch between the features in the matrix and the network, but these have been checked for consistency.
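The filtering step that raises this error can be approximated in a few lines of plain Python. This is a sketch, not decoupler's source. It assumes that `c` in the traceback holds the matrix's column (feature) names and that a source survives when it has at least min_n shared targets. The helper name `sources_passing_min_n`, the toy network and the `>=` comparison are illustrative assumptions. The sketch shows why lowering min_n cannot help when the column names are not genes: the overlap is empty, so no source survives at any threshold.

```python
from collections import Counter


def sources_passing_min_n(feature_names, net, min_n):
    """Hypothetical approximation of the min_n filter: count, per source,
    the targets that appear among the matrix's feature (column) names."""
    features = set(feature_names)
    # Keep only network edges whose target is one of the matrix features
    shared = [(src, tgt) for src, tgt in net if tgt in features]
    counts = Counter(src for src, _ in shared)
    # Sources with at least min_n shared targets survive (comparison assumed)
    return sorted(src for src, n in counts.items() if n >= min_n)


# Toy network of (source, target) edges; names are illustrative only
net = [("TF1", "GeneA"), ("TF1", "GeneB"), ("TF1", "GeneC"), ("TF2", "GeneA")]

# Features x observations: the only column name is a contrast, not a gene
print(sources_passing_min_n(["treatment.vs.control"], net, min_n=2))  # []
# Observations x features: the column names are genes
print(sources_passing_min_n(["GeneA", "GeneB", "GeneC"], net, min_n=2))  # ['TF1']
```

An empty list here corresponds to the case where the real function raises the ValueError.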

PauBadiaM commented 1 month ago

Hi @victorsanchezarevalo, decoupler follows the observations x features convention (commonly used in Python), rather than the features x observations convention (more typical in R). You can transpose your matrix to match this format, and it should work. Feel free to reach out if you have any further questions!
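A plain `mat.T` is all that is needed once the orientation is known. The helper below is a hypothetical convenience, not part of decoupler. It decides the orientation by checking whether the network's target genes overlap more with the DataFrame's index or with its columns, and it returns the matrix as observations x features. The toy matrix and gene names are made up for illustration.

```python
import pandas as pd


def orient_obs_by_features(mat, targets):
    """Hypothetical helper: return mat as observations x features, transposing
    it when the gene names sit in the index instead of the columns."""
    targets = set(targets)
    in_columns = len(targets.intersection(mat.columns))
    in_index = len(targets.intersection(mat.index))
    # Transpose only when the index shares more names with the network
    return mat.T if in_index > in_columns else mat


# Toy genes x samples matrix (R-style features x observations)
genes_by_samples = pd.DataFrame(
    {"sample1": [1.0, 2.0, 3.0], "sample2": [0.5, 1.5, 2.5]},
    index=["GeneA", "GeneB", "GeneC"],
)
mat = orient_obs_by_features(genes_by_samples, ["GeneA", "GeneB", "GeneX"])
print(mat.shape)  # (2, 3): two samples (rows) x three genes (columns)
```

With real data this could be called as `orient_obs_by_features(mat, collectri['target'])`, assuming the network keeps decoupler's default `target` column name.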

victorsanchezarevalo commented 1 month ago

Hi,

I’m encountering an issue when running ULM analysis with decoupler. I have already transposed my expression matrix as recommended (observations x features), but I am still getting the following error related to the min_n=2 parameter:

Code used:

# Preparing matrix and transposing
mat = results_df[['stat']].T.rename(index={'stat': 'treatment.vs.control'}).T

# Transposing matrix so genes are in rows
print(f"New mat shape: {mat.shape}")

# Assigning gene names from subset_adata.var['gene_name']
mat.index = subset_adata.var['gene_name'].values

# Checking the first 10 gene names
print(mat.index[:10])

# Retrieving CollecTRI gene regulatory network
settings.setup(curl_timeout=1200)
os.system('rm -rf ~/.cache/omnipathdb/*')
os.system('rm -rf ~/.cache/pypath/*')

collectri = dc.get_collectri(organism='mouse', split_complexes=False)

# Running ULM analysis with min_n=2
tf_acts, tf_pvals = dc.run_ulm(mat=mat, net=collectri, verbose=True, min_n=2)

# Checking results
print(tf_acts.head())
print(tf_pvals.head())

Error message:

ValueError: No sources with more than min_n=2 targets. Make sure mat and net have shared target features or reduce the number assigned to min_n

I have verified that the matrix has been transposed and gene names have been correctly assigned, but the issue persists. It seems that no transcription factors have more than 2 shared targets, even though min_n=2 is a reasonable threshold in this context.

Any help or suggestions would be appreciated!

Thank you!

PauBadiaM commented 1 month ago

Could you show me the head of your input mat?

mat.head()

victorsanchezarevalo commented 1 month ago

mat.head()
Out[20]: 
         treatment.vs.control
Sox17               -0.252628
Gm15452              0.320179
Gm26983              0.747064
Gm6187              -0.106062
Gm6119              -0.746242

PauBadiaM commented 1 month ago

Hi @victorsanchezarevalo,

As your console output shows, you have one observation (one contrast) and n genes. Your matrix is therefore still in the features x observations format, with shape (n, 1), rather than the expected observations x features format, with shape (1, n). Transpose it again and it should work.
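Applied to the snippet in the earlier comment, the fix amounts to dropping the second `.T` and assigning the gene names to the columns instead of the index. Below is a minimal pandas sketch on toy data. The three-row `results_df`, its index labels `g1` to `g3` and its `stat` values are hypothetical stand-ins. `gene_names` assumes the names are in the same row order as `results_df`, which the original `subset_adata.var['gene_name']` assignment also assumes.

```python
import pandas as pd

# Hypothetical stand-in for results_df: one row per gene, a 'stat' column
results_df = pd.DataFrame({"stat": [-0.5, 0.3, 0.7]}, index=["g1", "g2", "g3"])
gene_names = ["Sox17", "Gm15452", "Gm26983"]  # assumed to match the row order

# Double transpose as in the earlier snippet: genes end up in the rows
wrong = results_df[["stat"]].T.rename(index={"stat": "treatment.vs.control"}).T
print(wrong.shape)  # (3, 1) -> features x observations

# Single transpose: one row (the contrast), one column per gene
right = results_df[["stat"]].T.rename(index={"stat": "treatment.vs.control"})
right.columns = gene_names  # gene names go on the columns, not the index
print(right.shape)  # (1, 3) -> observations x features
```

A frame shaped like `right` is what `dc.run_ulm(mat=..., net=collectri, ...)` expects, with gene names as the feature columns.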

victorsanchezarevalo commented 1 month ago

Thanks! It works perfectly now!