jlakkis / CarDEC

IndexError: index 0 is out of bounds for axis 0 with size 0 #4

Open bapoorva opened 4 years ago

bapoorva commented 4 years ago

Hi,

I processed my single-cell data using scanpy's preprocessing script. I have two samples, control and mutant, which I merged before processing. When I ran the data through the CarDEC pipeline, I got the error below. Can you please help me out with this?

adata = read_macaque("~/Desktop/Cardec/wt_i73t.h5ad")
CarDEC = CarDEC_API(adata, weights_dir = "weights_dir/CarDEC_LVG Weights", batch_key = "sample", n_high_var = 2000, LVG = True)

IndexError                                Traceback (most recent call last)
<ipython-input-7-17ec3a532115> in <module>
     10 """
     11 
---> 12 CarDEC = CarDEC_API(adata, weights_dir = "weights_dir/CarDEC_LVG Weights", batch_key = "sample", n_high_var = 2000, LVG = True)
     13 

~/opt/miniconda3/lib/python3.7/site-packages/CarDEC/CarDEC_API.py in __init__(self, adata, preprocess, weights_dir, batch_key, n_high_var, LVG, normalize_samples, log_normalize, normalize_features)
     34 
     35         if preprocess:
---> 36             self.dataset = normalize_scanpy(adata, *self.norm_args)
     37         else:
     38             assert 'Variance Type' in adata.var.keys()

~/opt/miniconda3/lib/python3.7/site-packages/CarDEC/CarDEC_utils.py in normalize_scanpy(adata, batch_key, n_high_var, LVG, normalize_samples, log_normalize, normalize_features)
     43         adata = None
     44         adata = AnnData(out['X'])
---> 45         adata.obs = obs_
     46         adata.var = var_
     47 

~/opt/miniconda3/lib/python3.7/site-packages/anndata/_core/anndata.py in obs(self, value)
    832     @obs.setter
    833     def obs(self, value: pd.DataFrame):
--> 834         self._set_dim_df(value, "obs")
    835 
    836     @obs.deleter

~/opt/miniconda3/lib/python3.7/site-packages/anndata/_core/anndata.py in _set_dim_df(self, value, attr)
    781         if not isinstance(value, pd.DataFrame):
    782             raise ValueError(f"Can only assign pd.DataFrame to {attr}.")
--> 783         value_idx = self._prep_dim_index(value.index, attr)
    784         if self.is_view:
    785             self._init_as_actual(self.copy())

~/opt/miniconda3/lib/python3.7/site-packages/anndata/_core/anndata.py in _prep_dim_index(self, value, attr)
    808                 value.name = None
    809         if not isinstance(value, pd.RangeIndex) and not isinstance(
--> 810             value[0], (str, bytes)
    811         ):
    812             logger.warning(

~/opt/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in __getitem__(self, key)
   3928         if is_scalar(key):
   3929             key = com.cast_scalar_indexer(key)
-> 3930             return getitem(key)
   3931 
   3932         if isinstance(key, slice):

IndexError: index 0 is out of bounds for axis 0 with size 0

Thanks

jlakkis commented 4 years ago

Hi Bapoorva,

  1. Can you run "print(adata)" before calling CarDEC_API? This will give me a sense of what may be going on with the dataset.
  2. Can you show me the code defining the function "read_macaque"?

bapoorva commented 4 years ago

Sure. This is the function from the Jupyter notebook:

def read_macaque(path):
    """A function to read and preprocess the macaque data"""
    adata = sc.read(path)
    sc.pp.filter_cells(adata, min_genes=0)
    sc.pp.filter_genes(adata, min_cells=30)

    adata = adata[adata.obs['n_genes'] < 2500, :]

    return adata

And here is my output:

adata = read_macaque("/Users/bapoorva/Desktop/Cardec/wt_i73t.h5ad")
print(adata)

View of AnnData object with n_obs × n_vars = 1876 × 1714
    obs: 'sample', 'n_genes', 'n_genes_by_counts', 'total_counts', 'total_counts_mt', 'pct_counts_mt', 'leiden', 'cluster'
    var: 'gene_ids', 'n_cells', 'mt', 'n_cells_by_counts', 'mean_counts', 'pct_dropout_by_counts', 'total_counts', 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'mean', 'std'
    uns: 'leiden', 'leiden_colors', 'neighbors', 'pca', 'sample_colors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    obsp: 'connectivities', 'distances'

jlakkis commented 4 years ago

Things look good so far. Can you run the following code snippet? After you run it, can you show me the output of "print(adata)" and "print(obs_.shape)"?

import numpy as np
import scanpy as sc
from scipy.sparse import issparse
from anndata import AnnData
from CarDEC.CarDEC_utils import convert_vector_to_encoding

# Settings for this debugging run
batch_key = 'sample'
n_high_var = 2000
LVG = True
normalize_samples = True
log_normalize = True
normalize_features = True
n, p = adata.shape
sparsemode = issparse(adata.X)

if batch_key is not None:
    batch = list(adata.obs[batch_key])
    batch = convert_vector_to_encoding(batch)
    batch = np.asarray(batch)
    batch = batch.astype('float32')
else:
    batch = np.ones((n,), dtype = 'float32')
    norm_by_batch = False

sc.pp.filter_genes(adata, min_counts=1)
sc.pp.filter_cells(adata, min_counts=1)

count = adata.X.copy()

if normalize_samples:
    out = sc.pp.normalize_total(adata, inplace = False)
    obs_ = adata.obs
    var_ = adata.var
    adata = None
    adata = AnnData(out['X'])

bapoorva commented 4 years ago

I got this

print(adata)
print(obs_.shape)

AnnData object with n_obs × n_vars = 0 × 0
(0, 9)
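
The `0 × 0` result above means the two `min_counts=1` filters removed every gene and every cell, which is why the later `adata.obs = obs_` assignment ends up indexing into an empty index. One possible cause, not confirmed in this thread, is that the matrix holds scaled values rather than raw counts (the `mean` and `std` columns in `var` hint at this). The sketch below mimics the filter criterion on plain Python lists so it runs without scanpy installed. It assumes a gene or cell is kept when the sum of its values is at least `min_counts`; `surviving_shape` and the toy matrices are hypothetical and are not part of CarDEC or scanpy.

```python
# Dependency-free sketch of the min_counts filters (rows = cells, columns = genes).
# surviving_shape is a hypothetical helper, not a CarDEC or scanpy function.

def surviving_shape(matrix, min_counts=1):
    """Return (n_cells, n_genes) left after a gene filter followed by a cell filter.

    Assumption: a gene (or cell) is kept when the sum of its values is >= min_counts.
    """
    if not matrix:
        return (0, 0)
    n_genes = len(matrix[0])
    # Gene filter: keep columns whose total reaches min_counts.
    kept_genes = [j for j in range(n_genes)
                  if sum(row[j] for row in matrix) >= min_counts]
    filtered = [[row[j] for j in kept_genes] for row in matrix]
    # Cell filter: keep rows whose total over the surviving genes reaches min_counts.
    kept_cells = [row for row in filtered if sum(row) >= min_counts]
    return (len(kept_cells), len(kept_genes))


# Toy raw counts: every gene and every cell has a total of at least 1.
raw_counts = [[3, 0, 1], [0, 2, 5], [1, 1, 0]]
# Toy matrix whose columns are centred to sum to zero (illustrative values only).
centred = [[1.0, -1.0, -1.0], [-2.0, 1.0, 3.0], [1.0, 0.0, -2.0]]

print(surviving_shape(raw_counts))  # (3, 3)
print(surviving_shape(centred))     # (0, 0)
```

If this is what is happening, checking `adata.X.min()` and `adata.shape` right after each filter call on the real object would show where the data disappears, and passing raw counts to `CarDEC_API` would be the thing to try.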