martinjzhang / scDRS

Single-cell disease relevance score (scDRS)
https://martinjzhang.github.io/scDRS/
MIT License

Issue with test run #9

Closed tkamath1 closed 2 years ago

tkamath1 commented 2 years ago

Hi! Thanks for providing this exciting new analysis package. I am having some trouble running the test case. When I run the command-line test: python -m pytest tests/test_scdrs.py -p no:warnings

I receive the following error. Any help would be much appreciated. Thanks!

============================================================================================== FAILURES ===============================================================================================
___________________________________________________________________________________________ test_score_cell __________________________________________________________________________________________

    def test_score_cell():                                                                                                                                                                             

        # Load toy data                                                                                                                                                                                
        DATA_PATH=scdrs.__path__[0]                                                                                                                                                                    
        H5AD_FILE=os.path.join(DATA_PATH,'data/toydata_mouse.h5ad')                                                                                                                                    
        COV_FILE=os.path.join(DATA_PATH,'data/toydata_mouse.cov')                                                                                                                                      
        GS_FILE=os.path.join(DATA_PATH,'data/toydata_mouse.gs')                                                                                                                                        
        assert os.path.exists(H5AD_FILE), "built-in data toydata_mouse.h5ad missing"                                                                                                                   
        assert os.path.exists(COV_FILE), "built-in data toydata_mouse.cov missing"                                                                                                                     
        assert os.path.exists(GS_FILE), "built-in data toydata_mouse.gs missing"                                                                                                                       

        # Load built-in data                                                                                                                                                                           
        adata = read_h5ad(H5AD_FILE)                                                                                                                                                                   

        df_cov = pd.read_csv(COV_FILE, sep='\t', index_col=0)                                                                                                                                          
        cov_list = list(df_cov.columns)                                                                                                                                                                
        adata.obs.drop([x for x in cov_list if x in adata.obs.columns], axis=1, inplace=True)                                                                                                          
        adata.obs = adata.obs.join(df_cov)                                                                                                                                                             
        adata.obs.fillna(adata.obs[cov_list].mean(), inplace=True)
        adata.var['mean'] = adata.X.mean(axis=0).T
        if sp.sparse.issparse(adata.X):
            adata.X = adata.X.toarray()
        adata.X -= adata.var['mean'].values
        adata.X = scdrs.method.reg_out(adata.X, adata.obs[cov_list].values)
>       adata.X += adata.var['mean']

tests/test_scdrs.py:37:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
../../anaconda3/lib/python3.7/site-packages/pandas/core/ops.py:1071: in wrapper
    index=left.index, name=res_name, dtype=None)
../../anaconda3/lib/python3.7/site-packages/pandas/core/ops.py:980: in _construct_result
    out = left._constructor(result, index=index, dtype=dtype)

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <[AttributeError("'Series' object has no attribute '_name'") raised in repr()] Series object at 0x7fa40a633890>
data = array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., na...nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
index = Index([b'Pip4k2a',    b'Chd7', b'Atp6v0c',   b'Exoc3',    b'Pex5',     b'Wrn',
        b'Zfp236',   b'Asna1',    b'Pdh...', b'Pikfyve',
         b'Tram1',    b'Ei24',    b'Smc2',   b'Cops4'],
      dtype='object', name='index', length=2500)
dtype = None, name = None, copy = False, fastpath = False

    def __init__(self, data=None, index=None, dtype=None, name=None,                                                                                                                                   
                 copy=False, fastpath=False):                                                                                                                                                          

        # we are called internally, so short-circuit                                                                                                                                                   
        if fastpath:                                                                                                                                                                                   

            # data is an ndarray, index is defined                                                                                                                                                     
            if not isinstance(data, SingleBlockManager):                                                                                                                                               
                data = SingleBlockManager(data, index, fastpath=True)                                                                                                                                  
            if copy:                                                                                                                                                                                   
                data = data.copy()                                                                                                                                                                     
            if index is None:                                                                                                                                                                          
                index = data.index                                                                                                                                                                     

        else:                                                                                                                                                                                          

            if index is not None:                                                                                                                                                                      
                index = _ensure_index(index)                                                                                                                                                           

            if data is None:                                                                                                                                                                           
                data = {}                                                                                                                                                                              
            if dtype is not None:                                                                                                                                                                      
                dtype = self._validate_dtype(dtype)                                                                                                                                                    

            if isinstance(data, MultiIndex):                                                                                                                                                           
                raise NotImplementedError("initializing a Series from a "                                                                                                                              
                                          "MultiIndex is not supported")                                                                                                                               
            elif isinstance(data, Index):                                                                                                                                                              
                if name is None:                                                                                                                                                                       
                    name = data.name                                                                                                                                                                   

                if dtype is not None: 
                    # astype copies
                    data = data.astype(dtype)
                else:
                    # need to copy to avoid aliasing issues
                    data = data._values.copy()
                copy = False

            elif isinstance(data, np.ndarray):
                pass
            elif isinstance(data, Series):
                if name is None:
                    name = data.name
                if index is None:
                    index = data.index
                else:
                    data = data.reindex(index, copy=copy)
                data = data._data
            elif isinstance(data, dict):
                data, index = self._init_dict(data, index, dtype)
                dtype = None
                copy = False
            elif isinstance(data, SingleBlockManager):
                if index is None:
                    index = data.index
                elif not data.index.equals(index) or copy:
                    # GH#19275 SingleBlockManager input should only be called
                    # internally
                    raise AssertionError('Cannot pass both SingleBlockManager '
                                         '`data` argument and a different '
                                         '`index` argument.  `copy` must '
                                         'be False.')

            elif is_extension_array_dtype(data) and dtype is not None:
                if not data.dtype.is_dtype(dtype):
                    raise ValueError("Cannot specify a dtype '{}' with an "
                                     "extension array of a different "
                                     "dtype ('{}').".format(dtype,
                                                            data.dtype))

            elif (isinstance(data, types.GeneratorType) or
                  (compat.PY3 and isinstance(data, map))):
                data = list(data)
            elif isinstance(data, (set, frozenset)):
                raise TypeError("{0!r} type is unordered"
                                "".format(data.__class__.__name__))
            else:

                # handle sparse passed here (and force conversion)
                if isinstance(data, ABCSparseArray):
                    data = data.to_dense()

            if index is None:
                if not is_list_like(data):
                    data = [data]
                index = com._default_index(len(data))
            elif is_list_like(data):

                # a scalar numpy array is list-like but doesn't
                # have a proper length
                try:
                    if len(index) != len(data):
                        raise ValueError(
                            'Length of passed values is {val}, '
                            'index implies {ind}'
>                           .format(val=len(data), ind=len(index)))
E                           ValueError: Length of passed values is 30, index implies 2500

../../anaconda3/lib/python3.7/site-packages/pandas/core/series.py:262: ValueError
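
Note on the traceback above: the failing line adds adata.var['mean'] (a pandas Series with one entry per gene, 2500 here) to adata.X (a 30 x 2500 array), and the frames in pandas/core/ops.py show pandas trying to wrap the 30-row result in a Series indexed by the 2500 genes. The subtraction two lines earlier uses .values and does not fail. The thread does not confirm the root cause, so the following is only a minimal sketch of the .values pattern on toy data. The names X and gene_mean are hypothetical stand-ins for adata.X and adata.var['mean'].

```python
import numpy as np
import pandas as pd

# Toy stand-ins (hypothetical): 30 cells x 2500 genes, as in the error message.
n_cells, n_genes = 30, 2500
X = np.zeros((n_cells, n_genes))
gene_mean = pd.Series(
    np.arange(n_genes, dtype=float),
    index=["gene{}".format(i) for i in range(n_genes)],
)

# Passing the underlying NumPy array keeps the arithmetic inside NumPy,
# which broadcasts the (2500,) vector across all 30 rows.
X += gene_mean.values
```

Because the right-hand side is a plain array, the result does not depend on how a given pandas version handles an ndarray combined with a Series.
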

KangchengHou commented 2 years ago

Thanks for reporting this. We are in the process of updating the software package, so could you share the installation commands you used? That will help us track down the cause.

Could you also try the following commands?

git clone https://github.com/martinjzhang/scDRS.git
cd scDRS
# Current version under development; switch to submission version
# https://github.com/martinjzhang/scDRS/releases/tag/v0.1
git checkout -b initial_submission v0.1 
pip install -e .

# run tests
python -m pytest tests/test_scdrs.py -p no:warnings

(I realize there may be an issue with the order of the git checkout and pip install -e commands in the README.md.)

Thanks!
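
When reporting an installation like this, it can also help to list the versions of the packages involved. The snippet below is a minimal sketch, not part of scDRS: report_versions is a hypothetical helper, the package list is an assumption about which dependencies matter here, and importlib.metadata requires Python 3.8 or later (the run in this thread used Python 3.7.4).

```python
import sys
from importlib import metadata  # standard library, Python 3.8+


def report_versions(packages):
    """Return a dict mapping each package name to its installed version."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the current environment.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    # Assumed list of relevant packages; adjust as needed.
    for pkg, ver in report_versions(
        ["scdrs", "pandas", "anndata", "numpy", "scipy"]
    ).items():
        print(pkg, ver)
```

Pasting this output next to the installation commands shows which pandas and anndata versions the tests actually ran against.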

tkamath1 commented 2 years ago

Thanks so much for the quick response!

I went ahead and uninstalled and re-installed scDRS per your suggestions above. Unfortunately now it looks like I'm receiving a segfault error. Perhaps this could be an issue on my end, as I only have 4.8G of free space available on my virtual machine. The only other change I made was an update to pandas.

python -m pytest tests/test_scdrs.py -p no:warnings
========================================================================================= test session starts =========================================================================================
platform linux -- Python 3.7.4, pytest-3.6.3, py-1.5.4, pluggy-0.6.0
rootdir: /home/tkamath/GIT/scDRS, inifile:
plugins: remotedata-0.3.2, openfiles-0.4.0, doctestplus-0.4.0, arraydiff-0.3
collected 1 item                                                                                                                                                                                      

tests/test_scdrs.py Segmentation fault

KangchengHou commented 2 years ago

We have not seen this segfault before. Is there any chance you have access to another computing resource?

Also, could you try running pip install -U anndata

KangchengHou commented 2 years ago

Sorry for any confusion caused by these different suggestions. Here is another route worth trying, provided by @martinjzhang: create a Python virtual environment to avoid potential dependency problems.

# Set up a Virtual Environment
# See details in https://harvardmed.atlassian.net/wiki/spaces/O2/pages/1588662166/Personal+Python+Packages
# Go to the folder where you want to put the virtualenv file
virtualenv myenv_scdrs
source myenv_scdrs/bin/activate

# Installation
# Go to the place where you want to put scdrs code
git clone https://github.com/martinjzhang/scDRS.git
cd scDRS; pip install -e .
# Current version under development; switch to submission version
# https://github.com/martinjzhang/scDRS/releases/tag/v0.1
git checkout -b initial_submission v0.1 

# Quick test
python -m pytest tests/test_scdrs.py -p no:warnings

Let us know how these different routes work out.

tkamath1 commented 2 years ago

It seemed like the virtualenv installation did the trick! Thanks so much for your help!
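
For anyone repeating the virtualenv route, a quick check that the tests are really running inside the new environment can rule out the system-wide packages being picked up by mistake. This is a minimal sketch using only the standard library. in_virtualenv and module_location are hypothetical helper names, and the module name scdrs is taken from the import used in the test file.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # venv and recent virtualenv releases make sys.prefix differ from
    # sys.base_prefix; older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def module_location(name):
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("inside virtualenv:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
    print("scdrs imported from:", module_location("scdrs"))
```

After an editable install (pip install -e .), the reported scdrs location would be expected to point into the cloned scDRS folder rather than a site-packages directory.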