hoholee closed this issue 3 months ago
Hi, v1.0.3 is in the main branch. We may have updated the test data. Can you install from the main branch and run the tests again?
Same error with v1.0.3:
$ python -m pytest tests/test_CLI.py -p no:warnings
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
collected 3 items
tests/test_CLI.py F.. [100%]
================================================================================================================== FAILURES ==================================================================================================================
____________________________________________________________________________________________________________ test_score_cell_cli _____________________________________________________________________________________________________________
    def test_score_cell_cli():
        """
        Test CLI `scdrs compute-score`
        """
        # Load toy data
        ROOT_DIR = scdrs.__path__[0]
        H5AD_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.h5ad")
        COV_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.cov")
        assert os.path.exists(H5AD_FILE), "built-in data toydata_mouse.h5ad missing"
        assert os.path.exists(COV_FILE), "built-in data toydata_mouse.cov missing"
        tmp_dir = tempfile.TemporaryDirectory()
        tmp_dir_path = tmp_dir.name
        dict_df_score = {}
        for gs_species in ["human", "mouse"]:
            gs_file = os.path.join(ROOT_DIR, f"data/toydata_{gs_species}.gs")
            # call compute_score.py
            cmds = [
                f"scdrs compute-score",
                f"--h5ad_file {H5AD_FILE}",
                "--h5ad_species mouse",
                f"--gs_file {gs_file}",
                f"--gs_species {gs_species}",
                f"--cov_file {COV_FILE}",
                "--ctrl_match_opt mean_var",
                "--n_ctrl 20",
                "--flag_filter_data False",
                "--weight_opt vs",
                "--flag_raw_count False",
                "--flag_return_ctrl_raw_score False",
                "--flag_return_ctrl_norm_score False",
                f"--out_folder {tmp_dir_path}",
            ]
            subprocess.check_call(" ".join(cmds), shell=True)
            dict_df_score[gs_species] = pd.read_csv(
                os.path.join(tmp_dir_path, f"toydata_gs_{gs_species}.score.gz"),
                sep="\t",
                index_col=0,
            )
        # consistency between human and mouse
        assert np.all(dict_df_score["mouse"].pval == dict_df_score["human"].pval)
        df_res = dict_df_score["mouse"]
        REF_COV_FILE = os.path.join(
            ROOT_DIR, "data/toydata_gs_mouse.ref_Ctrl20_CovConstCovariate.score.gz"
        )
        df_ref_res = pd.read_csv(REF_COV_FILE, sep="\t", index_col=0)
>       compare_score_file(df_res, df_ref_res)
tests/test_CLI.py:58:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
df_res = raw_score norm_score mc_pval pval nlog10_pval zscore
index ...00 -10.000000
J10_B003899_S130.mus-7-0-1 4.460493 -1.627243 1.000000 0.956739 0.019207 -1.714034
df_res_ref = raw_score norm_score mc_pval pval nlog10_pval zscore
index ...00 -10.000000
J10_B003899_S130.mus-7-0-1 4.460493 -2.305674 1.000000 0.991680 0.003628 -2.394591
    def compare_score_file(df_res, df_res_ref):
        """
        Compare df_res
        """
        col_list = ["raw_score", "norm_score", "mc_pval", "pval"]
        for col in col_list:
            v_ = df_res[col].values
            v_ref = df_res_ref[col].values
            err_msg = "Inconsistent values: {}\n".format(col)
            err_msg += "|{:^15}|{:^15}|{:^15}|{:^15}|\n".format(
                "OBS", "REF", "DIF", "REL_DIF"
            )
            for i in range(v_.shape[0]):
                err_msg += "|{:^15.3e}|{:^15.3e}|{:^15.3e}|{:^15.3e}|\n".format(
                    v_[i],
                    v_ref[i],
                    v_[i] - v_ref[i],
                    np.absolute((v_[i] - v_ref[i]) / v_ref[i]),
                )
>           assert np.allclose(v_, v_ref, rtol=1e-2, equal_nan=True), err_msg
E AssertionError: Inconsistent values: norm_score
E | OBS | REF | DIF | REL_DIF |
E | 4.445e+00 | 6.326e+00 | -1.881e+00 | 2.973e-01 |
E | 6.038e+00 | 5.916e+00 | 1.216e-01 | 2.056e-02 |
E | 4.697e+00 | 5.552e+00 | -8.552e-01 | 1.540e-01 |
E | 5.186e+00 | 7.299e+00 | -2.112e+00 | 2.894e-01 |
E | 6.072e+00 | 5.779e+00 | 2.927e-01 | 5.065e-02 |
E | -6.976e-01 | -5.614e-01 | -1.362e-01 | 2.427e-01 |
E | -1.192e+00 | -1.582e+00 | 3.897e-01 | 2.463e-01 |
E | -2.219e+00 | -2.312e+00 | 9.325e-02 | 4.033e-02 |
E | 1.216e+00 | 1.157e+00 | 5.952e-02 | 5.146e-02 |
E | -4.155e+00 | -3.166e+00 | -9.896e-01 | 3.126e-01 |
E | 2.262e+00 | 1.505e+00 | 7.576e-01 | 5.035e-01 |
E | -2.240e+00 | -3.798e+00 | 1.558e+00 | 4.102e-01 |
E | 7.692e-01 | 1.052e+00 | -2.824e-01 | 2.686e-01 |
E | 2.888e-01 | -1.237e-01 | 4.126e-01 | 3.334e+00 |
E | -4.752e-01 | -8.706e-01 | 3.954e-01 | 4.541e-01 |
E | -3.281e+00 | -3.768e+00 | 4.869e-01 | 1.292e-01 |
E | -1.792e+00 | -2.232e+00 | 4.397e-01 | 1.970e-01 |
E | -7.435e-01 | -6.558e-01 | -8.775e-02 | 1.338e-01 |
E | -3.577e-01 | -4.232e-01 | 6.545e-02 | 1.547e-01 |
E | -1.968e+00 | -2.191e+00 | 2.235e-01 | 1.020e-01 |
E | -3.799e-01 | -2.172e-01 | -1.626e-01 | 7.487e-01 |
E | 7.900e-02 | -1.761e-01 | 2.551e-01 | 1.449e+00 |
E | 8.555e-01 | 7.654e-01 | 9.011e-02 | 1.177e-01 |
E | -2.135e-01 | -3.305e-01 | 1.170e-01 | 3.541e-01 |
E | -1.905e+00 | -2.228e+00 | 3.232e-01 | 1.451e-01 |
E | -3.454e+00 | -2.705e+00 | -7.495e-01 | 2.771e-01 |
E | -2.037e+00 | -2.207e+00 | 1.692e-01 | 7.670e-02 |
E | -4.795e-01 | -3.563e-01 | -1.232e-01 | 3.458e-01 |
E | -2.691e+00 | -3.141e+00 | 4.506e-01 | 1.434e-01 |
E | -1.627e+00 | -2.306e+00 | 6.784e-01 | 2.942e-01 |
E
E assert False
E + where False = <function allclose at 0x7f4a28366270>(array([ 4.4454584 , 6.037902 , 4.6971283 , 5.186194 , 6.071957 ,\n -0.6976079 , -1.1924832 , -2.2186813 , ...900415, 0.8554982 , -0.21349816, -1.9051081 ,\n -3.4541266 , -2.037314 , -0.47953042, -2.690723 , -1.6272427 ]), array([ 6.3260064 , 5.916272 , 5.5523157 , 7.2986684 , 5.7792473 ,\n -0.5613674 , -1.5821338 , -2.3119287 , ...612725, 0.7653889 , -0.33054087, -2.228345 ,\n -2.7046354 , -2.2065454 , -0.35630605, -3.1413238 , -2.3056736 ]), rtol=0.01, equal_nan=True)
E + where <function allclose at 0x7f4a28366270> = np.allclose
tests/test_method_score_cell_main.py:76: AssertionError
------------------------------------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------------------------------------
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.3
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mmusculus \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_human.gs \
--gs-species hsapiens \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--min-genes 250 \
--min-cells 50 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpggtt845u
Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.1s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.1s)
n_cell=30 (30 in .h5ad)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.1s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_human': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]
Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15
Computing scDRS score:
Trait=toydata_gs_human, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.4s)
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.3
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mouse \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.gs \
--gs-species mouse \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--min-genes 250 \
--min-cells 50 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpggtt845u
Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.0s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.0s)
n_cell=30 (30 in .h5ad)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.0s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_mouse': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]
Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15
Computing scDRS score:
Trait=toydata_gs_mouse, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.3s)
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 272.68it/s]
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 286.57it/s]
========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_CLI.py::test_score_cell_cli - AssertionError: Inconsistent values: norm_score
======================================================================================================== 1 failed, 2 passed in 37.78s ========================================================================================================
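For anyone reading the OBS/REF table in the log above: the failing assertion calls `np.allclose` with `rtol=1e-2`, which accepts a value when `|obs - ref| <= atol + rtol * |ref|` (with NumPy's default `atol=1e-8`). Below is a minimal pure-Python sketch of that criterion applied to a few rows copied from the table. The helper name `is_close` is made up for illustration and is not part of scDRS or its test suite.

```python
def is_close(obs, ref, rtol=1e-2, atol=1e-8):
    """Element-wise criterion documented for numpy.allclose:
    |obs - ref| <= atol + rtol * |ref|.
    Hypothetical helper for illustration only, not scDRS code."""
    return abs(obs - ref) <= atol + rtol * abs(ref)


# (OBS, REF) pairs copied from the norm_score table in the pytest log
pairs = [(6.038, 5.916), (4.445, 6.326), (2.888e-01, -1.237e-01)]

for obs, ref in pairs:
    rel_dif = abs((obs - ref) / ref)
    verdict = "ok" if is_close(obs, ref) else "outside tolerance"
    print(f"obs={obs:+.3e}  ref={ref:+.3e}  rel_dif={rel_dif:.3e}  {verdict}")
```

Even the closest of these rows differs from the reference by about 2%, which is already more than the 1% relative tolerance, so the mismatch is well beyond floating-point noise.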
I've also tried scDRS v.1.0.3 with multiple versions of Python (3.8-3.12), and the test only passed with Python 3.8 for some reason:
python -m pytest tests/test_CLI.py -p no:warnings
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.8.19, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
plugins: anyio-3.7.1
collected 3 items
tests/test_CLI.py ... [100%]
============================================================================================================= 3 passed in 46.72s =============================================================================================================
Somewhat strangely, I couldn't replicate this error with either Python 3.9 or 3.10.
For example, in https://colab.google/ (Python 3.10):
!python --version
!pip install git+https://github.com/martinjzhang/scDRS.git
import os
import pandas as pd
import scdrs
DATA_PATH = scdrs.__path__[0]
H5AD_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.h5ad")
COV_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.cov")
GS_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.gs")
# Load .h5ad file, .cov file, and .gs file
adata = scdrs.util.load_h5ad(H5AD_FILE, flag_filter_data=False, flag_raw_count=False)
df_cov = pd.read_csv(COV_FILE, sep="\t", index_col=0)
df_gs = scdrs.util.load_gs(GS_FILE)
# Preprocess .h5ad data and compute scDRS score
scdrs.preprocess(adata, cov=df_cov)
gene_list = df_gs['toydata_gs_mouse'][0]
gene_weight = df_gs['toydata_gs_mouse'][1]
df_res = scdrs.score_cell(adata, gene_list, gene_weight=gene_weight, n_ctrl=20)
print(df_res.iloc[:4])
Strange indeed... Maybe something is wrong with my conda. But I can't think of any reason why only the norm_score is affected and why this is Python version-dependent.
Thanks for the efforts in pinpointing the issue. I'm closing this for now unless someone else runs into this. But I'd recommend updating the installation instructions in the tutorial to v.1.0.3.
I replicated this issue (with exactly the same norm_score values as @hoholee's) using conda + py39 on a local HPC. This might be a Python version issue. I will look into this matter further.
Fixed. The issue is due to a small discrepancy between different pandas versions. https://github.com/martinjzhang/scDRS/pull/85
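Since the discrepancy was traced to differing pandas versions, it may help to record the interpreter and package versions when comparing a passing and a failing environment. The sketch below uses only the standard library. The function name `report_versions` and the package list are assumptions chosen for illustration (including the assumption that the distribution is registered under the name `scdrs`), not something taken from the scDRS docs.

```python
import platform
from importlib import metadata


def report_versions(packages=("scdrs", "pandas", "numpy", "scipy", "scanpy", "anndata")):
    """Return a dict of installed versions for the given distributions.
    The default package list is an assumption; adjust it to your environment."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# Print one line per package so two environments can be diffed side by side
for name, version in report_versions().items():
    print(f"{name:>10}: {version}")
```

Running this in both environments and diffing the output should make a version mismatch such as the pandas one easy to spot.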
Dear scDRS devs,
Hi, I tried following the tutorial (https://martinjzhang.github.io/scDRS/index.html) and running the quick test after installing scDRS in a conda env:
But then I ran into this error:
When I checked the output against the expected results listed on the tutorial page, indeed only the values in the norm_score column do not match:
I wonder what could be causing this, and whether I should worry about it before I run scDRS on my real dataset. Did you guys update the way you compute the norm_score?
Also, I see that the manual and the GitHub page mention scDRS v.1.0.3, but I can't find this branch in the repo, and the version printed out from Python also says v.1.0.2 (with some syntax warnings):
Could this be the reason for the inconsistency?
Here are the packages installed in my conda env for your information:
And the full error log when I ran the quick test: