theislab / single-cell-tutorial

Single cell current best practices tutorial case study for the paper: Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"

concatenation issue persists after commenting out two lines as instructed in issue #41 #44

Closed SuhanG17 closed 4 years ago

SuhanG17 commented 4 years ago

I'm sorry to bring up this closed issue again, but commenting out the two lines as suggested in #41 (#adata.var['gene_id'] = adata.var['gene_id-1'] and #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)) didn't work for me. Could it be because I'm using scanpy==1.6.0 and anndata==0.7.4?

Here is my code for this cell:

    # Annotate data
    barcodes_tmp.rename(columns={0:'barcode'}, inplace=True)
    barcodes_tmp.set_index('barcode', inplace=True)
    adata_tmp.obs = barcodes_tmp
    adata_tmp.obs['sample'] = [sample]*adata_tmp.n_obs
    adata_tmp.obs['region'] = [sample.split("_")[0]]*adata_tmp.n_obs
    adata_tmp.obs['donor'] = [sample.split("_")[1]]*adata_tmp.n_obs
#     adata_tmp.obs_names_make_unique()

    genes_tmp.rename(columns={0:'gene_id', 1:'gene_symbol'}, inplace=True)
    genes_tmp.set_index('gene_symbol', inplace=True)
    adata_tmp.var = genes_tmp
    adata_tmp.var_names_make_unique()

    # Concatenate to main adata object
    adata = adata.concatenate(adata_tmp, batch_key='sample_id')
    #adata.var['gene_id'] = adata.var['gene_id-1']
    #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
    adata.obs.drop(columns=['sample_id'], inplace=True)
    adata.obs_names = [c.split("-")[0] for c in adata.obs_names]
    adata.obs_names_make_unique(join='_')

The error persists:

... reading from cache file cache/..-data-Haber-et-al_mouse-intestinal-epithelium-GSE92332_RAW-GSM2836574_Regional_Duo_M2_matrix.h5ad

---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-8-68186e73aaae> in <module>
     31 
     32     # Concatenate to main adata object
---> 33     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
     34     #adata.var['gene_id'] = adata.var['gene_id-1']
     35     #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1696         all_adatas = (self,) + tuple(adatas)
   1697 
-> 1698         out = concat(
   1699             all_adatas,
   1700             axis=0,

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    799         [dim_indices(a, axis=1 - axis) for a in adatas], join=join
    800     )
--> 801     reindexers = [
    802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in <listcomp>(.0)
    800     )
    801     reindexers = [
--> 802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]
    804 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in gen_reindexer(new_var, cur_var)
    393            [1., 0., 0.]], dtype=float32)
    394     """
--> 395     return Reindexer(cur_var, new_var)
    396 
    397 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in __init__(self, old_idx, new_idx)
    265         self.no_change = new_idx.equals(old_idx)
    266 
--> 267         new_pos = new_idx.get_indexer(old_idx)
    268         old_pos = np.arange(len(new_pos))
    269 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2978 
   2979         if not self.is_unique:
-> 2980             raise InvalidIndexError(
   2981                 "Reindexing only valid with uniquely valued Index objects"
   2982             )

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Version of packages:

-----
anndata     0.7.4
scanpy      1.6.0
sinfo       0.3.1
-----
PIL                 7.2.0
anndata             0.7.4
anndata2ri          1.0.4
attr                20.2.0
backcall            0.2.0
cairo               1.19.1
certifi             2020.06.20
cffi                1.14.1
chardet             3.0.4
cycler              0.10.0
cython_runtime      NA
dateutil            2.8.1
decorator           4.4.2
get_version         2.1
gprofiler           1.0.0
h5py                2.10.0
idna                2.10
igraph              0.8.2
ipykernel           5.3.4
ipython_genutils    0.2.0
ipywidgets          7.5.1
jedi                0.17.2
jinja2              2.11.2
joblib              0.16.0
jsonschema          3.2.0
kiwisolver          1.2.0
legacy_api_wrap     1.2
llvmlite            0.34.0
louvain             0.6.1
markupsafe          1.1.1
matplotlib          3.3.1
mpl_toolkits        NA
natsort             7.0.1
nbformat            5.0.7
numba               0.51.2
numexpr             2.7.1
numpy               1.19.1
packaging           20.4
pandas              1.1.1
parso               0.7.1
pexpect             4.8.0
pickleshare         0.7.5
pkg_resources       NA
prometheus_client   NA
prompt_toolkit      3.0.7
ptyprocess          0.6.0
pvectorc            NA
pygments            2.6.1
pyparsing           2.4.7
pyrsistent          NA
pytz                2020.1
requests            2.24.0
rpy2                3.3.5
scanpy              1.6.0
scipy               1.5.2
seaborn             0.10.1
send2trash          NA
setuptools_scm      NA
sinfo               0.3.1
six                 1.15.0
sklearn             0.23.2
statsmodels         0.12.0
storemagic          NA
tables              3.6.1
terminado           0.8.3
texttable           1.6.3
tornado             6.0.4
traitlets           4.3.3
tzlocal             NA
urllib3             1.25.10
wcwidth             0.2.5
yaml                5.3.1
zmq                 19.0.2
-----
IPython             7.18.1
jupyter_client      6.1.7
jupyter_core        4.6.3
notebook            6.1.3
-----
Python 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:22:49) [GCC 7.5.0]
Linux-4.13.0-36-generic-x86_64-with-glibc2.10
4 logical CPU cores, x86_64
-----
Session information updated at 2020-09-09 10:48

LuckyMD commented 4 years ago

Hi @SuhanG17, It looks like your indices aren't unique even before concatenation. Could you uncomment the line adata_tmp.obs_names_make_unique() and see if that works?
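
One way to test this is to list the duplicated names on each axis before calling concatenate. In the traceback above, concat is called with axis=0 and fails inside dim_indices(a, axis=1 - axis), which appears to be the variable axis, so var_names is worth checking as well as obs_names. A minimal sketch; find_duplicates is a hypothetical helper and not part of scanpy or anndata, and the gene symbols are made up:

```python
from collections import Counter


def find_duplicates(names):
    """Return a dict of the names that occur more than once, with their counts."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Made-up gene symbols, only to show the output format
example_names = ["Xkr4", "Gm1", "Sox17", "Gm1"]
print(find_duplicates(example_names))
```

In the loop from the issue, this could be called as find_duplicates(adata.var_names) and find_duplicates(adata_tmp.var_names) just before the concatenate call; an empty result means that index is unique. For a plain yes/no answer, the pandas attribute adata.var_names.is_unique gives the same information.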

SuhanG17 commented 4 years ago

Hi @LuckyMD,

Thank you for your advice. I tried uncommenting adata_tmp.obs_names_make_unique() in the current environment (scanpy==1.6.0). Unfortunately, the error was the same. However, I created another environment with scanpy==1.4.6 anndata==0.7.1 umap==0.4.6 numpy==1.19.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1, and the same script I posted above ran without errors. I think the concatenation function may have changed in a scanpy or anndata update that we have not kept up with.


In case it is useful, here is the code I tried:

    # Annotate data
    barcodes_tmp.rename(columns={0:'barcode'}, inplace=True)
    barcodes_tmp.set_index('barcode', inplace=True)
    adata_tmp.obs = barcodes_tmp
    adata_tmp.obs['sample'] = [sample]*adata_tmp.n_obs
    adata_tmp.obs['region'] = [sample.split("_")[0]]*adata_tmp.n_obs
    adata_tmp.obs['donor'] = [sample.split("_")[1]]*adata_tmp.n_obs
    adata_tmp.obs_names_make_unique()

    genes_tmp.rename(columns={0:'gene_id', 1:'gene_symbol'}, inplace=True)
    genes_tmp.set_index('gene_symbol', inplace=True)
    adata_tmp.var = genes_tmp
    adata_tmp.var_names_make_unique()

    # Concatenate to main adata object
    adata = adata.concatenate(adata_tmp, batch_key='sample_id')
#     adata.var['gene_id'] = adata.var['gene_id-1']
#     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
    adata.obs.drop(columns=['sample_id'], inplace=True)
    adata.obs_names = [c.split("-")[0] for c in adata.obs_names]
    adata.obs_names_make_unique(join='_')

And the error message:

... reading from cache file cache/..-data-Haber-et-al_mouse-intestinal-epithelium-GSE92332_RAW-GSM2836574_Regional_Duo_M2_matrix.h5ad

---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-6-80198fad391d> in <module>
     31 
     32     # Concatenate to main adata object
---> 33     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
     34 #     adata.var['gene_id'] = adata.var['gene_id-1']
     35 #     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1696         all_adatas = (self,) + tuple(adatas)
   1697 
-> 1698         out = concat(
   1699             all_adatas,
   1700             axis=0,

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    799         [dim_indices(a, axis=1 - axis) for a in adatas], join=join
    800     )
--> 801     reindexers = [
    802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in <listcomp>(.0)
    800     )
    801     reindexers = [
--> 802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]
    804 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in gen_reindexer(new_var, cur_var)
    393            [1., 0., 0.]], dtype=float32)
    394     """
--> 395     return Reindexer(cur_var, new_var)
    396 
    397 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in __init__(self, old_idx, new_idx)
    265         self.no_change = new_idx.equals(old_idx)
    266 
--> 267         new_pos = new_idx.get_indexer(old_idx)
    268         old_pos = np.arange(len(new_pos))
    269 

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2978 
   2979         if not self.is_unique:
-> 2980             raise InvalidIndexError(
   2981                 "Reindexing only valid with uniquely valued Index objects"
   2982             )

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

LuckyMD commented 4 years ago

Hi @SuhanG17,

I'm not aware of a backwards-incompatible change to concatenation in either package. I would assume it has more to do with the pandas version; I think there might have been larger changes in version 1.1. I'm glad it works for you in the older environment, though.
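
To narrow down which package differs between the two environments, it can help to print the installed versions in each and compare them. A minimal sketch using importlib.metadata from the standard library (Python 3.8+); installed_versions is a hypothetical helper, and the package list is only an example:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Run this in both environments and compare the output
for name, version in installed_versions(["pandas", "anndata", "scanpy"]).items():
    print(f"{name:10s} {version}")
```

Packages that are not installed are reported as None rather than raising, so the same snippet can run unchanged in both environments.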

SuhanG17 commented 4 years ago

@LuckyMD I agree. I'll look into the pandas version changes and post here if I find anything. For now, I'll stick with the older environment. Thanks a lot!