martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

AssertionError when running quick test after installation #83

Status: Closed (hoholee closed this issue 3 months ago)

hoholee commented 3 months ago

Dear scDRS devs,

Hi, I tried following the tutorial (https://martinjzhang.github.io/scDRS/index.html) and running the quick test after installing scDRS in a conda env:

git clone https://github.com/martinjzhang/scDRS.git
cd scDRS
git checkout -b v102 v1.0.2
pip install -e .

python -m pytest tests/test_CLI.py -p no:warnings

But then I ran into this error:

FAILED tests/test_CLI.py::test_score_cell_cli - AssertionError: Inconsistent values: norm_score

When I checked the output against the expected results listed on the tutorial page, only the values in the norm_score column did not match:

## my output
>>> print(df_res.iloc[:10])
                                        raw_score  norm_score   mc_pval      pval  nlog10_pval     zscore
index
N1.MAA000586.3_8_M.1.1-1-1               4.741197    4.445458  0.047619  0.001664     2.778874   2.935716
F10.D041911.3_8_M.1.1-1-1                4.739066    6.037902  0.047619  0.001664     2.778874   2.935716
A17_B002755_B007347_S17.mm10-plus-7-0    4.636626    4.697128  0.047619  0.001664     2.778874   2.935716
C22_B003856_S298_L004.mus-2-0-1          4.680566    5.186194  0.047619  0.001664     2.778874   2.935716
G12.B002765.3_38_F.1.1-1-1               4.640043    6.071957  0.047619  0.001664     2.778874   2.935716
H5.B003278.3_38_F.1.1-1-1                4.445744   -0.697608  0.714286  0.745424     0.127596  -0.660160
O14.MAA000570.3_8_M.1.1-1-1              4.455234   -1.192483  0.857143  0.868552     0.061204  -1.119574
J21.B000634.3_56_F.1.1-1-1               4.443364   -2.218681  1.000000  0.990017     0.004358  -2.326973
E5.B002765.3_38_F.1.1-1-1                4.487077    1.216147  0.142857  0.118136     0.927616   1.184354
K20_B000268_B009896_S260.mm10-plus-4-0   4.535480   -4.155231  1.000000  1.000000    -0.000000 -10.000000

I wonder what is causing this, and whether I should worry about it before running scDRS on my real dataset. Did you change how norm_score is computed? Also, the manual and the GitHub page both mention scDRS v1.0.3, but I can't find a matching branch in the repo, and the version printed from Python is still 1.0.2 (with some syntax warnings):

>>> import numpy as np
>>> import pandas as pd
>>> import scanpy as sc
>>> import anndata as ad
>>> import scdrs
/home/jul307/software/scDRS/scdrs/method.py:401: SyntaxWarning: invalid escape sequence '\s'
  """Compute overdispersion score
/home/jul307/software/scDRS/scdrs/method.py:595: SyntaxWarning: invalid escape sequence '\S'
  """Compute p-value from empirical null
>>> scdrs.__version__
'1.0.2'

Could this be the reason for the inconsistency?
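
A note on the two SyntaxWarning lines above: they point at docstrings that contain a backslash sequence in an ordinary (non-raw) string literal. Python 3.12 reports such unrecognized escapes as SyntaxWarning (earlier versions used DeprecationWarning), but the backslash is still kept in the resulting string, so the warning by itself should not change any computed value. The sketch below illustrates this with a made-up one-line source string (`doc = ...` is hypothetical and is not taken from scdrs/method.py); prefixing the literal with `r` gives the same string without the warning.

```python
import warnings

# Hypothetical one-line sources; the text is illustrative, not from scDRS.
source_plain = 'doc = "match \\s+ here"\n'   # compiles the literal "match \s+ here"
source_raw = 'doc = r"match \\s+ here"\n'    # compiles the literal r"match \s+ here"


def compile_and_collect(source):
    """Compile and run `source`; return (value of `doc`, warning category names)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        namespace = {}
        exec(compile(source, "<example>", "exec"), namespace)
    return namespace["doc"], [w.category.__name__ for w in caught]


plain_value, plain_warnings = compile_and_collect(source_plain)
raw_value, raw_warnings = compile_and_collect(source_raw)

print(plain_value == raw_value)  # both literals produce the same string
print(plain_warnings)            # category depends on the Python version
print(raw_warnings)              # the raw literal compiles without a warning
```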

Here are the packages installed in my conda env for your information:

# packages in environment at /cndd/junhao/anaconda3/envs/scDRS:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
anndata                   0.10.6                   pypi_0    pypi
array-api-compat          1.6                      pypi_0    pypi
bzip2                     1.0.8                hd590300_5    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
contourpy                 1.2.1                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
fire                      0.6.0                    pypi_0    pypi
fonttools                 4.50.0                   pypi_0    pypi
h5py                      3.10.0                   pypi_0    pypi
iniconfig                 2.0.0                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
legacy-api-wrap           1.4                      pypi_0    pypi
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h807b86a_5    conda-forge
libgomp                   13.2.0               h807b86a_5    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libsqlite                 3.45.2               h2797004_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
llvmlite                  0.42.0                   pypi_0    pypi
matplotlib                3.8.4                    pypi_0    pypi
natsort                   8.4.0                    pypi_0    pypi
ncurses                   6.4.20240210         h59595ed_0    conda-forge
networkx                  3.2.1                    pypi_0    pypi
numba                     0.59.1                   pypi_0    pypi
numpy                     1.26.4                   pypi_0    pypi
openssl                   3.2.1                hd590300_1    conda-forge
packaging                 24.0                     pypi_0    pypi
pandas                    2.2.1                    pypi_0    pypi
patsy                     0.5.6                    pypi_0    pypi
pillow                    10.3.0                   pypi_0    pypi
pip                       24.0               pyhd8ed1ab_0    conda-forge
pluggy                    1.4.0                    pypi_0    pypi
pynndescent               0.5.12                   pypi_0    pypi
pyparsing                 3.1.2                    pypi_0    pypi
pytest                    8.1.1                    pypi_0    pypi
python                    3.12.2          hab00c5b_0_cpython    conda-forge
python-dateutil           2.9.0.post0              pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
scanpy                    1.10.0                   pypi_0    pypi
scdrs                     1.0.2                    pypi_0    pypi
scikit-learn              1.4.1.post1              pypi_0    pypi
scikit-misc               0.3.1                    pypi_0    pypi
scipy                     1.13.0                   pypi_0    pypi
seaborn                   0.13.2                   pypi_0    pypi
session-info              1.0.0                    pypi_0    pypi
setuptools                69.2.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0                   pypi_0    pypi
statsmodels               0.14.1                   pypi_0    pypi
stdlib-list               0.10.0                   pypi_0    pypi
termcolor                 2.4.0                    pypi_0    pypi
threadpoolctl             3.4.0                    pypi_0    pypi
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tqdm                      4.66.2                   pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
umap-learn                0.5.6                    pypi_0    pypi
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge

And the full error log when I ran the quick test:

============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
collected 3 items

tests/test_CLI.py F..                                                                                                                                                                                                                  [100%]

================================================================================================================== FAILURES ==================================================================================================================
____________________________________________________________________________________________________________ test_score_cell_cli _____________________________________________________________________________________________________________

    def test_score_cell_cli():
        """
        Test CLI `scdrs compute-score`
        """
        # Load toy data
        ROOT_DIR = scdrs.__path__[0]
        H5AD_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.h5ad")
        COV_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.cov")
        assert os.path.exists(H5AD_FILE), "built-in data toydata_mouse.h5ad missing"
        assert os.path.exists(COV_FILE), "built-in data toydata_mouse.cov missing"

        tmp_dir = tempfile.TemporaryDirectory()
        tmp_dir_path = tmp_dir.name
        dict_df_score = {}
        for gs_species in ["human", "mouse"]:
            gs_file = os.path.join(ROOT_DIR, f"data/toydata_{gs_species}.gs")
            # call compute_score.py
            cmds = [
                f"scdrs compute-score",
                f"--h5ad_file {H5AD_FILE}",
                "--h5ad_species mouse",
                f"--gs_file {gs_file}",
                f"--gs_species {gs_species}",
                f"--cov_file {COV_FILE}",
                "--ctrl_match_opt mean_var",
                "--n_ctrl 20",
                "--flag_filter_data False",
                "--weight_opt vs",
                "--flag_raw_count False",
                "--flag_return_ctrl_raw_score False",
                "--flag_return_ctrl_norm_score False",
                f"--out_folder {tmp_dir_path}",
            ]
            subprocess.check_call(" ".join(cmds), shell=True)
            dict_df_score[gs_species] = pd.read_csv(
                os.path.join(tmp_dir_path, f"toydata_gs_{gs_species}.score.gz"),
                sep="\t",
                index_col=0,
            )
        # consistency between human and mouse
        assert np.all(dict_df_score["mouse"].pval == dict_df_score["human"].pval)

        df_res = dict_df_score["mouse"]

        REF_COV_FILE = os.path.join(
            ROOT_DIR, "data/toydata_gs_mouse.ref_Ctrl20_CovConstCovariate.score.gz"
        )
        df_ref_res = pd.read_csv(REF_COV_FILE, sep="\t", index_col=0)
>       compare_score_file(df_res, df_ref_res)

tests/test_CLI.py:58:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

df_res =                                         raw_score  norm_score   mc_pval      pval  nlog10_pval     zscore
index       ...00 -10.000000
J10_B003899_S130.mus-7-0-1               4.460493   -1.627243  1.000000  0.956739     0.019207  -1.714034
df_res_ref =                                         raw_score  norm_score   mc_pval      pval  nlog10_pval     zscore
index       ...00 -10.000000
J10_B003899_S130.mus-7-0-1               4.460493   -2.305674  1.000000  0.991680     0.003628  -2.394591

    def compare_score_file(df_res, df_res_ref):
        """
        Compare df_res
        """

        col_list = ["raw_score", "norm_score", "mc_pval", "pval"]
        for col in col_list:
            v_ = df_res[col].values
            v_ref = df_res_ref[col].values
            err_msg = "Inconsistent values: {}\n".format(col)
            err_msg += "|{:^15}|{:^15}|{:^15}|{:^15}|\n".format(
                "OBS", "REF", "DIF", "REL_DIF"
            )
            for i in range(v_.shape[0]):
                err_msg += "|{:^15.3e}|{:^15.3e}|{:^15.3e}|{:^15.3e}|\n".format(
                    v_[i],
                    v_ref[i],
                    v_[i] - v_ref[i],
                    np.absolute((v_[i] - v_ref[i]) / v_ref[i]),
                )
>           assert np.allclose(v_, v_ref, rtol=1e-2, equal_nan=True), err_msg
E           AssertionError: Inconsistent values: norm_score
E             |      OBS      |      REF      |      DIF      |    REL_DIF    |
E             |   4.445e+00   |   6.326e+00   |  -1.881e+00   |   2.973e-01   |
E             |   6.038e+00   |   5.916e+00   |   1.216e-01   |   2.056e-02   |
E             |   4.697e+00   |   5.552e+00   |  -8.552e-01   |   1.540e-01   |
E             |   5.186e+00   |   7.299e+00   |  -2.112e+00   |   2.894e-01   |
E             |   6.072e+00   |   5.779e+00   |   2.927e-01   |   5.065e-02   |
E             |  -6.976e-01   |  -5.614e-01   |  -1.362e-01   |   2.427e-01   |
E             |  -1.192e+00   |  -1.582e+00   |   3.897e-01   |   2.463e-01   |
E             |  -2.219e+00   |  -2.312e+00   |   9.325e-02   |   4.033e-02   |
E             |   1.216e+00   |   1.157e+00   |   5.952e-02   |   5.146e-02   |
E             |  -4.155e+00   |  -3.166e+00   |  -9.896e-01   |   3.126e-01   |
E             |   2.262e+00   |   1.505e+00   |   7.576e-01   |   5.035e-01   |
E             |  -2.240e+00   |  -3.798e+00   |   1.558e+00   |   4.102e-01   |
E             |   7.692e-01   |   1.052e+00   |  -2.824e-01   |   2.686e-01   |
E             |   2.888e-01   |  -1.237e-01   |   4.126e-01   |   3.334e+00   |
E             |  -4.752e-01   |  -8.706e-01   |   3.954e-01   |   4.541e-01   |
E             |  -3.281e+00   |  -3.768e+00   |   4.869e-01   |   1.292e-01   |
E             |  -1.792e+00   |  -2.232e+00   |   4.397e-01   |   1.970e-01   |
E             |  -7.435e-01   |  -6.558e-01   |  -8.775e-02   |   1.338e-01   |
E             |  -3.577e-01   |  -4.232e-01   |   6.545e-02   |   1.547e-01   |
E             |  -1.968e+00   |  -2.191e+00   |   2.235e-01   |   1.020e-01   |
E             |  -3.799e-01   |  -2.172e-01   |  -1.626e-01   |   7.487e-01   |
E             |   7.900e-02   |  -1.761e-01   |   2.551e-01   |   1.449e+00   |
E             |   8.555e-01   |   7.654e-01   |   9.011e-02   |   1.177e-01   |
E             |  -2.135e-01   |  -3.305e-01   |   1.170e-01   |   3.541e-01   |
E             |  -1.905e+00   |  -2.228e+00   |   3.232e-01   |   1.451e-01   |
E             |  -3.454e+00   |  -2.705e+00   |  -7.495e-01   |   2.771e-01   |
E             |  -2.037e+00   |  -2.207e+00   |   1.692e-01   |   7.670e-02   |
E             |  -4.795e-01   |  -3.563e-01   |  -1.232e-01   |   3.458e-01   |
E             |  -2.691e+00   |  -3.141e+00   |   4.506e-01   |   1.434e-01   |
E             |  -1.627e+00   |  -2.306e+00   |   6.784e-01   |   2.942e-01   |
E
E           assert False
E            +  where False = <function allclose at 0x7f7f9b24a5b0>(array([ 4.4454584 ,  6.037902  ,  4.6971283 ,  5.186194  ,  6.071957  ,\n       -0.6976079 , -1.1924832 , -2.2186813 , ...900415,  0.8554982 , -0.21349816, -1.9051081 ,\n       -3.4541266 , -2.037314  , -0.47953042, -2.690723  , -1.6272427 ]), array([ 6.3260064 ,  5.916272  ,  5.5523157 ,  7.2986684 ,  5.7792473 ,\n       -0.5613674 , -1.5821338 , -2.3119287 , ...612725,  0.7653889 , -0.33054087, -2.228345  ,\n       -2.7046354 , -2.2065454 , -0.35630605, -3.1413238 , -2.3056736 ]), rtol=0.01, equal_nan=True)
E            +    where <function allclose at 0x7f7f9b24a5b0> = np.allclose

tests/test_method_score_cell_main.py:76: AssertionError
------------------------------------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------------------------------------
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mmusculus \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_human.gs \
--gs-species hsapiens \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpyc6576ha

Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.0s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.0s)
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.0s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_human': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]

Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15

Computing scDRS score:
Trait=toydata_gs_human, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.3s)
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.2
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mouse \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.gs \
--gs-species mouse \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpyc6576ha

Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.0s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.0s)
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.0s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_mouse': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]

Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15

Computing scDRS score:
Trait=toydata_gs_mouse, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.3s)
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 284.12it/s]
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 288.89it/s]
========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_CLI.py::test_score_cell_cli - AssertionError: Inconsistent values: norm_score
======================================================================================================== 1 failed, 2 passed in 40.71s ========================================================================================================
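
Because `compare_score_file` asserts inside its loop over columns, the traceback names only the first column that falls outside tolerance. A minimal sketch for listing every mismatching column at once is shown below. It is plain Python that mirrors the `np.allclose` criterion used in the test (`|obs - ref| <= atol + rtol * |ref|`, with NaN treated as equal to NaN); the helper names `is_close` and `mismatched_columns` are hypothetical, and the numbers are the single row for cell J10_B003899_S130.mus-7-0-1 copied from the `df_res` and `df_res_ref` lines in the traceback above.

```python
import math


def is_close(obs, ref, rtol=1e-2, atol=1e-8):
    """np.allclose-style check for one pair of finite or NaN values."""
    if math.isnan(obs) and math.isnan(ref):
        return True
    return abs(obs - ref) <= atol + rtol * abs(ref)


def mismatched_columns(res, ref, columns, rtol=1e-2):
    """Return every column that has at least one value outside tolerance."""
    return [
        col
        for col in columns
        if not all(is_close(o, r, rtol=rtol) for o, r in zip(res[col], ref[col]))
    ]


# One row per table, copied from the traceback (observed vs. reference).
res = {"raw_score": [4.460493], "norm_score": [-1.627243],
       "mc_pval": [1.0], "pval": [0.956739]}
ref = {"raw_score": [4.460493], "norm_score": [-2.305674],
       "mc_pval": [1.0], "pval": [0.991680]}

print(mismatched_columns(res, ref, ["raw_score", "norm_score", "mc_pval", "pval"]))
```

On this single row the sketch lists both norm_score and pval; the test itself stops at norm_score because that column is checked first.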

martinjzhang commented 3 months ago

Hi, v1.0.3 is in the main branch. We may have updated the test data. Can you install from the main branch and run the tests again?

hoholee commented 3 months ago

Same error with v1.0.3:

$ python -m pytest tests/test_CLI.py -p no:warnings
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
collected 3 items

tests/test_CLI.py F..                                                                                                                                                                                                                  [100%]

================================================================================================================== FAILURES ==================================================================================================================
____________________________________________________________________________________________________________ test_score_cell_cli _____________________________________________________________________________________________________________

    def test_score_cell_cli():
        """
        Test CLI `scdrs compute-score`
        """
        # Load toy data
        ROOT_DIR = scdrs.__path__[0]
        H5AD_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.h5ad")
        COV_FILE = os.path.join(ROOT_DIR, "data/toydata_mouse.cov")
        assert os.path.exists(H5AD_FILE), "built-in data toydata_mouse.h5ad missing"
        assert os.path.exists(COV_FILE), "built-in data toydata_mouse.cov missing"

        tmp_dir = tempfile.TemporaryDirectory()
        tmp_dir_path = tmp_dir.name
        dict_df_score = {}
        for gs_species in ["human", "mouse"]:
            gs_file = os.path.join(ROOT_DIR, f"data/toydata_{gs_species}.gs")
            # call compute_score.py
            cmds = [
                f"scdrs compute-score",
                f"--h5ad_file {H5AD_FILE}",
                "--h5ad_species mouse",
                f"--gs_file {gs_file}",
                f"--gs_species {gs_species}",
                f"--cov_file {COV_FILE}",
                "--ctrl_match_opt mean_var",
                "--n_ctrl 20",
                "--flag_filter_data False",
                "--weight_opt vs",
                "--flag_raw_count False",
                "--flag_return_ctrl_raw_score False",
                "--flag_return_ctrl_norm_score False",
                f"--out_folder {tmp_dir_path}",
            ]
            subprocess.check_call(" ".join(cmds), shell=True)
            dict_df_score[gs_species] = pd.read_csv(
                os.path.join(tmp_dir_path, f"toydata_gs_{gs_species}.score.gz"),
                sep="\t",
                index_col=0,
            )
        # consistency between human and mouse
        assert np.all(dict_df_score["mouse"].pval == dict_df_score["human"].pval)

        df_res = dict_df_score["mouse"]

        REF_COV_FILE = os.path.join(
            ROOT_DIR, "data/toydata_gs_mouse.ref_Ctrl20_CovConstCovariate.score.gz"
        )
        df_ref_res = pd.read_csv(REF_COV_FILE, sep="\t", index_col=0)
>       compare_score_file(df_res, df_ref_res)

tests/test_CLI.py:58:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

df_res =                                         raw_score  norm_score   mc_pval      pval  nlog10_pval     zscore
index       ...00 -10.000000
J10_B003899_S130.mus-7-0-1               4.460493   -1.627243  1.000000  0.956739     0.019207  -1.714034
df_res_ref =                                         raw_score  norm_score   mc_pval      pval  nlog10_pval     zscore
index       ...00 -10.000000
J10_B003899_S130.mus-7-0-1               4.460493   -2.305674  1.000000  0.991680     0.003628  -2.394591

    def compare_score_file(df_res, df_res_ref):
        """
        Compare df_res
        """

        col_list = ["raw_score", "norm_score", "mc_pval", "pval"]
        for col in col_list:
            v_ = df_res[col].values
            v_ref = df_res_ref[col].values
            err_msg = "Inconsistent values: {}\n".format(col)
            err_msg += "|{:^15}|{:^15}|{:^15}|{:^15}|\n".format(
                "OBS", "REF", "DIF", "REL_DIF"
            )
            for i in range(v_.shape[0]):
                err_msg += "|{:^15.3e}|{:^15.3e}|{:^15.3e}|{:^15.3e}|\n".format(
                    v_[i],
                    v_ref[i],
                    v_[i] - v_ref[i],
                    np.absolute((v_[i] - v_ref[i]) / v_ref[i]),
                )
>           assert np.allclose(v_, v_ref, rtol=1e-2, equal_nan=True), err_msg
E           AssertionError: Inconsistent values: norm_score
E             |      OBS      |      REF      |      DIF      |    REL_DIF    |
E             |   4.445e+00   |   6.326e+00   |  -1.881e+00   |   2.973e-01   |
E             |   6.038e+00   |   5.916e+00   |   1.216e-01   |   2.056e-02   |
E             |   4.697e+00   |   5.552e+00   |  -8.552e-01   |   1.540e-01   |
E             |   5.186e+00   |   7.299e+00   |  -2.112e+00   |   2.894e-01   |
E             |   6.072e+00   |   5.779e+00   |   2.927e-01   |   5.065e-02   |
E             |  -6.976e-01   |  -5.614e-01   |  -1.362e-01   |   2.427e-01   |
E             |  -1.192e+00   |  -1.582e+00   |   3.897e-01   |   2.463e-01   |
E             |  -2.219e+00   |  -2.312e+00   |   9.325e-02   |   4.033e-02   |
E             |   1.216e+00   |   1.157e+00   |   5.952e-02   |   5.146e-02   |
E             |  -4.155e+00   |  -3.166e+00   |  -9.896e-01   |   3.126e-01   |
E             |   2.262e+00   |   1.505e+00   |   7.576e-01   |   5.035e-01   |
E             |  -2.240e+00   |  -3.798e+00   |   1.558e+00   |   4.102e-01   |
E             |   7.692e-01   |   1.052e+00   |  -2.824e-01   |   2.686e-01   |
E             |   2.888e-01   |  -1.237e-01   |   4.126e-01   |   3.334e+00   |
E             |  -4.752e-01   |  -8.706e-01   |   3.954e-01   |   4.541e-01   |
E             |  -3.281e+00   |  -3.768e+00   |   4.869e-01   |   1.292e-01   |
E             |  -1.792e+00   |  -2.232e+00   |   4.397e-01   |   1.970e-01   |
E             |  -7.435e-01   |  -6.558e-01   |  -8.775e-02   |   1.338e-01   |
E             |  -3.577e-01   |  -4.232e-01   |   6.545e-02   |   1.547e-01   |
E             |  -1.968e+00   |  -2.191e+00   |   2.235e-01   |   1.020e-01   |
E             |  -3.799e-01   |  -2.172e-01   |  -1.626e-01   |   7.487e-01   |
E             |   7.900e-02   |  -1.761e-01   |   2.551e-01   |   1.449e+00   |
E             |   8.555e-01   |   7.654e-01   |   9.011e-02   |   1.177e-01   |
E             |  -2.135e-01   |  -3.305e-01   |   1.170e-01   |   3.541e-01   |
E             |  -1.905e+00   |  -2.228e+00   |   3.232e-01   |   1.451e-01   |
E             |  -3.454e+00   |  -2.705e+00   |  -7.495e-01   |   2.771e-01   |
E             |  -2.037e+00   |  -2.207e+00   |   1.692e-01   |   7.670e-02   |
E             |  -4.795e-01   |  -3.563e-01   |  -1.232e-01   |   3.458e-01   |
E             |  -2.691e+00   |  -3.141e+00   |   4.506e-01   |   1.434e-01   |
E             |  -1.627e+00   |  -2.306e+00   |   6.784e-01   |   2.942e-01   |
E
E           assert False
E            +  where False = <function allclose at 0x7f4a28366270>(array([ 4.4454584 ,  6.037902  ,  4.6971283 ,  5.186194  ,  6.071957  ,\n       -0.6976079 , -1.1924832 , -2.2186813 , ...900415,  0.8554982 , -0.21349816, -1.9051081 ,\n       -3.4541266 , -2.037314  , -0.47953042, -2.690723  , -1.6272427 ]), array([ 6.3260064 ,  5.916272  ,  5.5523157 ,  7.2986684 ,  5.7792473 ,\n       -0.5613674 , -1.5821338 , -2.3119287 , ...612725,  0.7653889 , -0.33054087, -2.228345  ,\n       -2.7046354 , -2.2065454 , -0.35630605, -3.1413238 , -2.3056736 ]), rtol=0.01, equal_nan=True)
E            +    where <function allclose at 0x7f4a28366270> = np.allclose

tests/test_method_score_cell_main.py:76: AssertionError
------------------------------------------------------------------------------------------------------------ Captured stdout call ------------------------------------------------------------------------------------------------------------
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.3
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mmusculus \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_human.gs \
--gs-species hsapiens \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--min-genes 250 \
--min-cells 50 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpggtt845u

Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.1s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.1s)
n_cell=30 (30 in .h5ad)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.1s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_human': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]

Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15

Computing scDRS score:
Trait=toydata_gs_human, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.4s)
******************************************************************************
* Single-cell disease relevance score (scDRS)
* Version 1.0.3
* Martin Jinye Zhang and Kangcheng Hou
* HSPH / Broad Institute / UCLA
* MIT License
******************************************************************************
Call: scdrs compute-score \
--h5ad-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.h5ad \
--h5ad-species mouse \
--cov-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.cov \
--gs-file /home/jul307/software/scDRS/scdrs/data/toydata_mouse.gs \
--gs-species mouse \
--ctrl-match-opt mean_var \
--weight-opt vs \
--adj-prop None \
--flag-filter-data False \
--flag-raw-count False \
--n-ctrl 20 \
--min-genes 250 \
--min-cells 50 \
--flag-return-ctrl-raw-score False \
--flag-return-ctrl-norm-score False \
--out-folder /scratch/tmpggtt845u

Loading data:
--h5ad-file loaded: n_cell=30, n_gene=2500 (sys_time=0.0s)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 genes: ['Pip4k2a', 'Chd7', 'Atp6v0c', 'Exoc3', 'Pex5']
--cov-file loaded: covariates=['covariate'] (sys_time=0.0s)
n_cell=30 (30 in .h5ad)
First 3 cells: ['N1.MAA000586.3_8_M.1.1-1-1', 'F10.D041911.3_8_M.1.1-1-1', 'A17_B002755_B007347_S17.mm10-plus-7-0']
First 5 values for 'covariate': [10, 10, 10, 10, 10]
--gs-file loaded: n_trait=1 (sys_time=0.0s)
Print info for first 3 traits:
First 3 elements for 'toydata_gs_mouse': ['Mrps33', 'Cyp4f13', 'Kazald1'], [1.0, 1.0, 1.0]

Preprocessing:
Too few genes for 20*20 bins, setting n_mean_bin=n_var_bin=15

Computing scDRS score:
Trait=toydata_gs_mouse, n_gene=250: 6/30 FDR<0.1 cells, 6/30 FDR<0.2 cells (sys_time=0.3s)
------------------------------------------------------------------------------------------------------------ Captured stderr call ------------------------------------------------------------------------------------------------------------
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 272.68it/s]
Computing control scores: 100%|██████████| 20/20 [00:00<00:00, 286.57it/s]
========================================================================================================== short test summary info ===========================================================================================================
FAILED tests/test_CLI.py::test_score_cell_cli - AssertionError: Inconsistent values: norm_score
======================================================================================================== 1 failed, 2 passed in 37.78s ========================================================================================================

hoholee commented 3 months ago

I've also tried scDRS v1.0.3 with multiple versions of Python (3.8-3.12), and for some reason the test passed only with Python 3.8:

python -m pytest tests/test_CLI.py -p no:warnings
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.8.19, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/jul307/software/scDRS
configfile: pyproject.toml
plugins: anyio-3.7.1
collected 3 items

tests/test_CLI.py ...                                                                                                                                                                                                                  [100%]

============================================================================================================= 3 passed in 46.72s =============================================================================================================

KangchengHou commented 3 months ago

Somewhat strangely, I couldn't replicate this error with either Python 3.9 or 3.10.

For example, in Google Colab (https://colab.google/, Python 3.10):

!python --version
!pip install git+https://github.com/martinjzhang/scDRS.git

import os
import pandas as pd
import scdrs

DATA_PATH = scdrs.__path__[0]
H5AD_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.h5ad")
COV_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.cov")
GS_FILE = os.path.join(DATA_PATH, "data/toydata_mouse.gs")

# Load .h5ad file, .cov file, and .gs file
adata = scdrs.util.load_h5ad(H5AD_FILE, flag_filter_data=False, flag_raw_count=False)
df_cov = pd.read_csv(COV_FILE, sep="\t", index_col=0)
df_gs = scdrs.util.load_gs(GS_FILE)

# Preprocess the .h5ad data and compute the scDRS score
scdrs.preprocess(adata, cov=df_cov)
gene_list = df_gs['toydata_gs_mouse'][0]
gene_weight = df_gs['toydata_gs_mouse'][1]
df_res = scdrs.score_cell(adata, gene_list, gene_weight=gene_weight, n_ctrl=20)

print(df_res.iloc[:4])

hoholee commented 3 months ago

Strange indeed... Maybe something is wrong with my conda. But I can't think of any reason why only the norm_score is affected and why this is Python version-dependent.

Thanks for the effort in pinpointing the issue. I'm closing this for now unless someone else runs into it, but I'd recommend updating the installation instructions in the tutorial to v1.0.3.

martinjzhang commented 3 months ago

I replicated this issue (with exactly the same norm_score values as @hoholee's) using conda + py39 on a local HPC. This might be a Python version issue. I will look into this matter further.

martinjzhang commented 3 months ago

Fixed. The issue was due to a small discrepancy between different pandas versions: https://github.com/martinjzhang/scDRS/pull/85
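
Since the discrepancy came down to library versions rather than the scDRS code path, it can help to record the interpreter and package versions alongside any test report. The following is a minimal sketch using only the standard library; `report_versions` is a hypothetical helper, and the package names are the ones that appear in the environment listing earlier in this thread.

```python
import platform
from importlib import metadata


def report_versions(packages):
    """Return {name: installed version, or None if not installed}."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


# Distribution names taken from the conda listing in the original report.
packages = ["scdrs", "pandas", "numpy", "scipy", "anndata", "scanpy"]
for name, version in report_versions(packages).items():
    print(f"{name:10s} {version}")
```

Comparing this output between an environment where the test passes and one where it fails narrows down which dependency differs.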