SuhanG17 closed this issue 4 years ago.
Hi @SuhanG17,
It looks like the indices of your AnnData objects aren't unique even before concatenation. Could you uncomment the line adata_tmp.obs_names_make_unique() and see if that works?
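For readers unfamiliar with that call: the idea behind obs_names_make_unique() is to keep the first occurrence of each name and append a counter to later duplicates. The function below is a simplified, hypothetical stdlib sketch of that idea (`make_names_unique` is not an anndata function), assuming a `'-'` separator by default; it is not anndata's actual implementation.

```python
from typing import List


def make_names_unique(names: List[str], join: str = "-") -> List[str]:
    """Hypothetical sketch: keep the first occurrence of each name and
    suffix later duplicates with join + counter ("a", "a-1", "a-2", ...)."""
    taken = set(names)  # every name already in use, so new names never collide
    seen = set()        # names whose first occurrence has already been kept
    result = []
    for name in names:
        if name not in seen:
            # First occurrence: keep the name unchanged.
            seen.add(name)
            result.append(name)
            continue
        # Later duplicate: try name-1, name-2, ... until a free name is found.
        i = 1
        while f"{name}{join}{i}" in taken:
            i += 1
        new_name = f"{name}{join}{i}"
        taken.add(new_name)
        result.append(new_name)
    return result


print(make_names_unique(["a", "a", "c", "a"]))
# ['a', 'a-1', 'c', 'a-2']
```

In this sketch, passing join="_" (as in the last line of the code posted further down) would produce suffixes like "_1" and "_2" instead.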
Hi @LuckyMD,
Thank you for your advice. I uncommented adata_tmp.obs_names_make_unique() and ran it in my current environment (scanpy==1.6.0), but unfortunately the error was the same. However, when I created another environment with scanpy==1.4.6 anndata==0.7.1 umap==0.4.6 numpy==1.19.1 scipy==1.4.1 pandas==1.0.3 scikit-learn==0.22.2.post1 statsmodels==0.11.1 python-igraph==0.8.0 louvain==0.6.1, the same script I posted above worked just fine. I suspect the concatenation function changed slightly in a newer scanpy or AnnData release and the tutorial code hasn't caught up with it.
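Since the two environments differ mainly in package versions, it may help to record exactly what is installed in each one when comparing them. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper name `installed_versions` is hypothetical and the package list is only a subset of the pins above. Scanpy also ships `sc.logging.print_versions()` for a similar purpose.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    report: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment.
            report[name] = None
    return report


for pkg, ver in installed_versions(["scanpy", "anndata", "pandas", "numpy", "scipy"]).items():
    print(f"{pkg}=={ver}" if ver else f"{pkg}: not installed")
```

Running it in both environments and diffing the output narrows down which pins actually changed.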
In case this information is useful, here is the code I tried:
#Annotate data
barcodes_tmp.rename(columns={0:'barcode'}, inplace=True)
barcodes_tmp.set_index('barcode', inplace=True)
adata_tmp.obs = barcodes_tmp
adata_tmp.obs['sample'] = [sample]*adata_tmp.n_obs
adata_tmp.obs['region'] = [sample.split("_")[0]]*adata_tmp.n_obs
adata_tmp.obs['donor'] = [sample.split("_")[1]]*adata_tmp.n_obs
adata_tmp.obs_names_make_unique()
genes_tmp.rename(columns={0:'gene_id', 1:'gene_symbol'}, inplace=True)
genes_tmp.set_index('gene_symbol', inplace=True)
adata_tmp.var = genes_tmp
adata_tmp.var_names_make_unique()
# Concatenate to main adata object
adata = adata.concatenate(adata_tmp, batch_key='sample_id')
# adata.var['gene_id'] = adata.var['gene_id-1']
# adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
adata.obs.drop(columns=['sample_id'], inplace=True)
adata.obs_names = [c.split("-")[0] for c in adata.obs_names]
adata.obs_names_make_unique(join='_')
And the error message:
... reading from cache file cache/..-data-Haber-et-al_mouse-intestinal-epithelium-GSE92332_RAW-GSM2836574_Regional_Duo_M2_matrix.h5ad
---------------------------------------------------------------------------
InvalidIndexError Traceback (most recent call last)
<ipython-input-6-80198fad391d> in <module>
31
32 # Concatenate to main adata object
---> 33 adata = adata.concatenate(adata_tmp, batch_key='sample_id')
34 # adata.var['gene_id'] = adata.var['gene_id-1']
35 # adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
1696 all_adatas = (self,) + tuple(adatas)
1697
-> 1698 out = concat(
1699 all_adatas,
1700 axis=0,
~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
799 [dim_indices(a, axis=1 - axis) for a in adatas], join=join
800 )
--> 801 reindexers = [
802 gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
803 ]
~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in <listcomp>(.0)
800 )
801 reindexers = [
--> 802 gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
803 ]
804
~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in gen_reindexer(new_var, cur_var)
393 [1., 0., 0.]], dtype=float32)
394 """
--> 395 return Reindexer(cur_var, new_var)
396
397
~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/anndata/_core/merge.py in __init__(self, old_idx, new_idx)
265 self.no_change = new_idx.equals(old_idx)
266
--> 267 new_pos = new_idx.get_indexer(old_idx)
268 old_pos = np.arange(len(new_pos))
269
~/anaconda3/envs/sc-tutorial/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
2978
2979 if not self.is_unique:
-> 2980 raise InvalidIndexError(
2981 "Reindexing only valid with uniquely valued Index objects"
2982 )
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
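One detail in the traceback: concatenate calls concat(..., axis=0, ...), and the failing reindexer is built from dim_indices(a, axis=1 - axis). If that reading is right, the non-unique index is on the variable (gene) axis rather than the cell barcodes, which would be consistent with obs_names_make_unique() having no effect. On a real object, `adata.var_names.is_unique` answers this directly. The stdlib sketch below (hypothetical helper `find_duplicates`, applied to made-up gene symbols) shows the kind of check one could run on each object before concatenating.

```python
from collections import Counter
from typing import Dict, List


def find_duplicates(names: List[str]) -> Dict[str, int]:
    """Return every name that occurs more than once, with its count."""
    return {name: n for name, n in Counter(names).items() if n > 1}


# Made-up gene symbols, for illustration only.
genes = ["Actb", "Gapdh", "Actb", "Lyz1"]
print(find_duplicates(genes))
# {'Actb': 2}
```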
Hi @SuhanG17,
I'm not aware of any backwards-incompatible change to concatenation in either package. I would assume it has more to do with the pandas version; I think there were larger changes in version 1.1. I'm glad it works for you in the older environment, though.
@LuckyMD I agree with you. I'll look into the changes in newer pandas versions and post here if I find any clues. For now, I'll stick with the older environment. Thanks a lot!
Sorry to bring up this closed issue again, but commenting out the two lines as in #41
#adata.var['gene_id'] = adata.var['gene_id-1']
#adata.var.drop(columns = ['gene_id-1', 'gene_id-0'], inplace=True)
didn't seem to work for me. Could it be because I'm using scanpy==1.6.0 and anndata==0.7.4? Here is my code for this cell:
And the error message persists:
Package versions: