Single cell current best practices tutorial case study for the paper: Luecken and Theis, "Current best practices in single-cell RNA-seq analysis: a tutorial"
concatenation issue persists after commenting out two lines as instructed in issue #41 #44

Closed SuhanG17 closed 4 years ago

SuhanG17 commented 4 years ago

I'm sorry to refer to this closed issue again, but commenting out the two lines as in #41 (#adata.var['gene_id'] = adata.var['gene_id-1'] and #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)) didn't work for me. Could it be because I'm using scanpy==1.6.0 and anndata==0.7.4?

Here is my code for this cell:

    # Annotate data
    barcodes_tmp.rename(columns={0:'barcode'}, inplace=True)
    barcodes_tmp.set_index('barcode', inplace=True)
    adata_tmp.obs = barcodes_tmp
    adata_tmp.obs['sample'] = [sample]*adata_tmp.n_obs
    adata_tmp.obs['region'] = [sample.split("_")[0]]*adata_tmp.n_obs
    adata_tmp.obs['donor'] = [sample.split("_")[1]]*adata_tmp.n_obs
    # adata_tmp.obs_names_make_unique()

    genes_tmp.rename(columns={0:'gene_id', 1:'gene_symbol'}, inplace=True)
    genes_tmp.set_index('gene_symbol', inplace=True)
    adata_tmp.var = genes_tmp

    # Concatenate to main adata object
    adata = adata.concatenate(adata_tmp, batch_key='sample_id')
    #adata.var['gene_id'] = adata.var['gene_id-1']
    #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
    adata.obs.drop(columns=['sample_id'], inplace=True)
    adata.obs_names = [c.split("-")[0] for c in adata.obs_names]

The error message persists:

... reading from cache file cache/..-data-Haber-et-al_mouse-intestinal-epithelium-GSE92332_RAW-GSM2836574_Regional_Duo_M2_matrix.h5ad

InvalidIndexError                         Traceback (most recent call last)
<ipython-input-8-68186e73aaae> in <module>
     32     # Concatenate to main adata object
---> 33     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
     34     #adata.var['gene_id'] = adata.var['gene_id-1']
     35     #adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1696         all_adatas = (self,) + tuple(adatas)
-> 1698         out = concat(
   1699             all_adatas,
   1700             axis=0,

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    799         [dim_indices(a, axis=1 - axis) for a in adatas], join=join
    800     )
--> 801     reindexers = [
    802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in <listcomp>(.0)
    800     )
    801     reindexers = [
--> 802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in gen_reindexer(new_var, cur_var)
    393            [1., 0., 0.]], dtype=float32)
    394     """
--> 395     return Reindexer(cur_var, new_var)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in __init__(self, old_idx, new_idx)
    265         self.no_change = new_idx.equals(old_idx)
--> 267         new_pos = new_idx.get_indexer(old_idx)
    268         old_pos = np.arange(len(new_pos))

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/ in get_indexer(self, target, method, limit, tolerance)
   2979         if not self.is_unique:
-> 2980             raise InvalidIndexError(
   2981                 "Reindexing only valid with uniquely valued Index objects"
   2982             )

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Version of packages:


LuckyMD commented 4 years ago

Hi @SuhanG17, it looks like the indices in your objects aren't unique even before concatenation. Could you uncomment the line adata_tmp.obs_names_make_unique() and see if that works?
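
For reference, the call that fails at the bottom of the traceback is pandas' Index.get_indexer, which refuses to operate on an index that contains repeated labels. Because the traceback passes dim_indices(a, axis=1 - axis) during a concatenation along axis 0, the index in question appears to be the var (gene) index rather than the obs (barcode) index. A minimal sketch that reproduces the same exception with pandas alone is below; the gene symbols are made up for illustration and are not taken from the Haber et al. data.

```python
import pandas as pd

# Hypothetical gene symbols; "Gm1" appears twice, as can happen when
# several gene IDs map to the same symbol.
gene_symbols = pd.Index(["Xkr4", "Gm1", "Sox17", "Gm1"])

# Quick checks that can be run on adata_tmp.var_names before concatenating.
print("unique:", gene_symbols.is_unique)
duplicated = gene_symbols[gene_symbols.duplicated(keep=False)].unique().tolist()
print("duplicated symbols:", duplicated)

# get_indexer on a non-unique index raises the error seen in the traceback.
try:
    gene_symbols.get_indexer(pd.Index(["Xkr4", "Sox17"]))
    error_name = None
except Exception as err:  # caught broadly: the exception's import path varies by pandas version
    error_name = type(err).__name__
    print(error_name, "-", err)
```

If var_names.is_unique is False for any sample, making the obs names unique would not be expected to change the outcome.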

SuhanG17 commented 4 years ago

Hi @LuckyMD,

Thank you for your advice. I tried uncommenting adata_tmp.obs_names_make_unique() in the current environment (scanpy==1.6.0). Unfortunately, the error was the same. However, I created another environment with scanpy==1.4.6 anndata==0.7.1 umap==0.4.6 numpy==1.19.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1, and the same script I posted above ran fine there. I think the concatenation function might have changed in a recent scanpy or AnnData release in a way that the notebook hasn't kept up with.

In case this information is useful, here is the code I tried:

    # Annotate data
    barcodes_tmp.rename(columns={0:'barcode'}, inplace=True)
    barcodes_tmp.set_index('barcode', inplace=True)
    adata_tmp.obs = barcodes_tmp
    adata_tmp.obs['sample'] = [sample]*adata_tmp.n_obs
    adata_tmp.obs['region'] = [sample.split("_")[0]]*adata_tmp.n_obs
    adata_tmp.obs['donor'] = [sample.split("_")[1]]*adata_tmp.n_obs

    genes_tmp.rename(columns={0:'gene_id', 1:'gene_symbol'}, inplace=True)
    genes_tmp.set_index('gene_symbol', inplace=True)
    adata_tmp.var = genes_tmp

    # Concatenate to main adata object
    adata = adata.concatenate(adata_tmp, batch_key='sample_id')
    # adata.var['gene_id'] = adata.var['gene_id-1']
    # adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
    adata.obs.drop(columns=['sample_id'], inplace=True)
    adata.obs_names = [c.split("-")[0] for c in adata.obs_names]

And the error message:

... reading from cache file cache/..-data-Haber-et-al_mouse-intestinal-epithelium-GSE92332_RAW-GSM2836574_Regional_Duo_M2_matrix.h5ad

InvalidIndexError                         Traceback (most recent call last)
<ipython-input-6-80198fad391d> in <module>
     32     # Concatenate to main adata object
---> 33     adata = adata.concatenate(adata_tmp, batch_key='sample_id')
     34 #     adata.var['gene_id'] = adata.var['gene_id-1']
     35 #     adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
   1696         all_adatas = (self,) + tuple(adatas)
-> 1698         out = concat(
   1699             all_adatas,
   1700             axis=0,

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
    799         [dim_indices(a, axis=1 - axis) for a in adatas], join=join
    800     )
--> 801     reindexers = [
    802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in <listcomp>(.0)
    800     )
    801     reindexers = [
--> 802         gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
    803     ]

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in gen_reindexer(new_var, cur_var)
    393            [1., 0., 0.]], dtype=float32)
    394     """
--> 395     return Reindexer(cur_var, new_var)

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/ in __init__(self, old_idx, new_idx)
    265         self.no_change = new_idx.equals(old_idx)
--> 267         new_pos = new_idx.get_indexer(old_idx)
    268         old_pos = np.arange(len(new_pos))

~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/ in get_indexer(self, target, method, limit, tolerance)
   2979         if not self.is_unique:
-> 2980             raise InvalidIndexError(
   2981                 "Reindexing only valid with uniquely valued Index objects"
   2982             )

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

LuckyMD commented 4 years ago

Hi @SuhanG17,

I'm not aware of a backwards-incompatible change to concatenation in either package. I would assume it has more to do with the pandas version; I think there were larger changes in pandas 1.1. I'm glad it works for you in the older environment though.

SuhanG17 commented 4 years ago

@LuckyMD I agree with you. I'll look into the pandas version changes and post here if I find any clues. For now, I'll just go with the older environment. Thanks a lot!