saezlab / decoupler-py

Python package to perform enrichment analysis from omics data.
https://decoupler-py.readthedocs.io/
GNU General Public License v3.0

Benchmark by='source' does not seem to work correctly #149

Open IBMB-MFP opened 1 month ago

IBMB-MFP commented 1 month ago

Hi, to satisfy reviewers I am trying to provide benchmark statistics for individual TFs.

According to decoupler-py's Read the Docs page, this should be possible using the argument by='source' in the benchmark function.

When I run benchmark with by='experiment', it runs fine and gives me the expected output matrix of benchmarking statistics.

When I run with by='source', the verbose output is identical:

Extracting inputs...
Formating net...
Removed 1 experiments without sources in net.
Running 9 experiments for 5 unique sources.
Running methods...
29194 features of mat are empty, they will be removed.
Running mlm on mat with 9 samples and 17729 targets for 487 sources.
Calculating metrics...
Computing metrics...
Done.

but I get an empty data frame returned, and I don't understand why.

Here is the code used to run this analysis. The files obs_limit.txt and bench_limit.txt are attached (they are limited subsets of the full benchmarking set, for the purpose of a reproducible example), as is the GRN (CelEsT_GRN.txt).

#%%
import os
from pathlib import Path

import pandas as pd
import decoupler as dc

#%%
home_directory = os.path.expanduser("~")
working_directory = os.path.join(home_directory, "Cel_GRN_revisions")
os.chdir(working_directory)

output_directory = Path("output/benchmark_out")
output_directory.mkdir(parents=True, exist_ok=True)

#%%
# Perturbation expression matrix (limited subset of the full benchmarking set)
benchRAPToR = pd.read_csv("bench_limit.txt", sep='\t', index_col=0)

#%%
# Metadata for the perturbation experiments
obs1 = pd.read_csv("obs_limit.txt", sep='\t', index_col=0, encoding='unicode_escape')

#%%
# GRN, passed as the network argument to dc.benchmark
mat = pd.read_table(os.path.join(home_directory, "CelEsT_GRN.txt"))

decouple_kws = {
    'methods': ['mlm'],
    'consensus': False,
}

#%%
# Works as expected
notbysource_successful = dc.benchmark(benchRAPToR, obs1, mat, perturb='target_gseq', by='experiment', sign=-1, verbose=True, decouple_kws=decouple_kws)

#%%
# Returns an empty data frame
bysource_doesntwork = dc.benchmark(benchRAPToR, obs1, mat, by='source', perturb='target_gseq', sign=-1, verbose=True, decouple_kws=decouple_kws)

I am using decoupler 1.6.0 and Python 3.8.18.

Attachments: obs_limit.txt, bench_limit.txt, CelEsT_GRN.txt

PauBadiaM commented 1 month ago

Hi @IBMB-MFP,

This is caused by the interaction between running by='source' and the default min_exp=5. It seems that none of your TFs have more than 5 perturbation experiments, which is why the benchmark returns nothing. By setting min_exp=3 you do get results, but keep in mind that each AUC is then built from a ranking of just 3 elements, and I don't know how valid that is. That is why, by default, we enforce at least 5 experiments.
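For example, something along these lines should work (just a sketch, reusing the objects from your snippet above):

# Lower the minimum number of experiments per source from the default of 5
output = dc.benchmark(benchRAPToR, obs1, mat, perturb='target_gseq', by='source', min_exp=3, sign=-1, verbose=True, decouple_kws=decouple_kws)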

Another alternative is the new recall metric implemented in 1.7.0, which checks how often the perturbed TF is correctly predicted as perturbed (sign-consistent score and significant p-value). You can run it by source.
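With your objects that would look something like this (again, just a sketch):

# Run the recall metric per source (requires decoupler >= 1.7.0)
recall_out = dc.benchmark(benchRAPToR, obs1, mat, metrics=['recall'], by='source', perturb='target_gseq', sign=-1, verbose=True, decouple_kws=decouple_kws)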

Sorry that the verbose output was confusing; I will update it to make it clearer what is going on under the hood. Good luck with your revisions! Let me know if you need anything else ;)

IBMB-MFP commented 3 weeks ago

Hi Pau,

Thanks as always for the swift reply.

This makes sense, I hadn't noticed this default argument in the background of the function.

I have updated to decoupler 1.8 and am trying to run the 'recall' metric, but I am quite confused by the output. As with auroc/auprc, it is responsive to the min_exp argument; for the purposes of this example I have set min_exp to 1, but what I describe below happens whatever min_exp is set to (just with more or fewer TFs).

I add metrics=['auprc', 'recall'] to the benchmark function call as below:

output = dc.benchmark(benchRAPToR, obs1, mat, metrics=['auprc', 'recall'], by='source', min_exp=1, perturb='target_gseq', sign=-1, verbose=True, decouple_kws=decouple_kws)

However, although the AUPRC values take a continuous range of values across different TFs, the recall metric is always 1. The output is attached.

Is there another way of running the recall metric so that it can serve as a useful alternative here?

Attachment: recall_bysource.txt

PauBadiaM commented 3 days ago

Hi @IBMB-MFP,

Sorry for the late reply, I was on holiday. You can use the argument use_pval in the dc.benchmark function and set it to a desired significance threshold such as 0.05. This will determine whether a TF has significant activity or not.
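In your case, something like this should do it (a sketch based on your earlier call):

# Per the suggestion above: use_pval=0.05 sets the significance threshold for a TF's activity
output = dc.benchmark(benchRAPToR, obs1, mat, metrics=['auprc', 'recall'], by='source', min_exp=1, perturb='target_gseq', sign=-1, use_pval=0.05, verbose=True, decouple_kws=decouple_kws)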