Closed SuhanG17 closed 4 years ago
Did you get an idea how to solve this problem? I encountered the same problem.
Here is the version info:
WARNING: If you miss a compact list, please try `print_header`!
-----
anndata 0.7.4
scanpy 1.6.0
sinfo 0.3.1
-----
OpenSSL 19.1.0
PIL 7.2.0
anndata 0.7.4
appdirs 1.4.4
attr 20.2.0
backcall 0.2.0
bioservices 1.7.9
brotli NA
bs4 4.9.1
certifi 2020.06.20
cffi 1.14.1
chardet 3.0.4
colorama 0.4.3
colorlog NA
cryptography 3.1
cycler 0.10.0
cython_runtime NA
dateutil 2.8.1
decorator 4.4.2
easydev 0.10.1
get_version 2.1
gseapy 0.10.1
h5py 2.10.0
idna 2.10
ipykernel 5.3.4
ipython_genutils 0.2.0
jedi 0.15.2
jinja2 2.11.2
joblib 0.16.0
jsonschema 3.2.0
kiwisolver 1.2.0
legacy_api_wrap 1.2
llvmlite 0.34.0
lxml 4.5.2
markupsafe 1.1.1
matplotlib 3.3.2
mpl_toolkits NA
natsort 7.0.1
nbformat 5.0.7
networkx 2.5
numba 0.51.2
numexpr 2.7.1
numpy 1.19.2
packaging 20.4
pandas 1.0.1
parso 0.5.2
pexpect 4.8.0
pickleshare 0.7.5
pkg_resources NA
prometheus_client NA
prompt_toolkit 3.0.7
ptyprocess 0.6.0
pvectorc NA
pygments 2.7.0
pylab NA
pyparsing 2.4.7
pyrsistent NA
pytz 2020.1
requests 2.23.0
requests_cache 0.5.2
scanpy 1.6.0
scipy 1.5.2
seaborn 0.11.0
send2trash NA
setuptools_scm NA
sinfo 0.3.1
six 1.15.0
sklearn 0.23.2
socks 1.7.1
soupsieve 2.0.1
statsmodels 0.12.0
storemagic NA
tables 3.6.1
terminado 0.8.3
tornado 6.0.4
traitlets 5.0.4
urllib3 1.25.10
wcwidth 0.2.5
wrapt 1.12.1
xlsxwriter 1.3.3
zmq 19.0.2
-----
IPython 7.18.1
jupyter_client 6.1.7
jupyter_core 4.6.3
jupyterlab 2.2.8
notebook 6.1.4
-----
Python 3.8.5 | packaged by conda-forge | (default, Aug 29 2020, 01:22:49) [GCC 7.5.0]
Linux-4.4.0-142-generic-x86_64-with-glibc2.10
64 logical CPU cores, x86_64
-----
Session information updated at 2020-09-16 11:03
Here is the error message:
---------------------------------------------------------------------------
InvalidIndexError Traceback (most recent call last)
<ipython-input-37-b22ada65a1cd> in <module>
1 # Create Concatenated anndata object for all timepoints
2 #alldays = e125.concatenate(e135, e145, e155, uns_merge="unique")
----> 3 alldays = e125.concatenate(e135)
~/miniconda3/envs/env4sc_velo_scannpy/lib/python3.8/site-packages/anndata/_core/anndata.py in concatenate(self, join, batch_key, batch_categories, uns_merge, index_unique, fill_value, *adatas)
1696 all_adatas = (self,) + tuple(adatas)
1697
-> 1698 out = concat(
1699 all_adatas,
1700 axis=0,
~/miniconda3/envs/env4sc_velo_scannpy/lib/python3.8/site-packages/anndata/_core/merge.py in concat(adatas, axis, join, merge, uns_merge, label, keys, index_unique, fill_value, pairwise)
799 [dim_indices(a, axis=1 - axis) for a in adatas], join=join
800 )
--> 801 reindexers = [
802 gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
803 ]
~/miniconda3/envs/env4sc_velo_scannpy/lib/python3.8/site-packages/anndata/_core/merge.py in <listcomp>(.0)
800 )
801 reindexers = [
--> 802 gen_reindexer(alt_indices, dim_indices(a, axis=1 - axis)) for a in adatas
803 ]
804
~/miniconda3/envs/env4sc_velo_scannpy/lib/python3.8/site-packages/anndata/_core/merge.py in gen_reindexer(new_var, cur_var)
393 [1., 0., 0.]], dtype=float32)
394 """
--> 395 return Reindexer(cur_var, new_var)
396
397
~/miniconda3/envs/env4sc_velo_scannpy/lib/python3.8/site-packages/anndata/_core/merge.py in __init__(self, old_idx, new_idx)
265 self.no_change = new_idx.equals(old_idx)
266
--> 267 new_pos = new_idx.get_indexer(old_idx)
268 old_pos = np.arange(len(new_pos))
269
~/miniconda3/envs/env4sc_velo_scannpy/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
2731
2732 if not self.is_unique:
-> 2733 raise InvalidIndexError(
2734 "Reindexing only valid with uniquely valued Index objects"
2735 )
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Thanks.
@xie186 updating pandas should fix the problem.
@Koncopd Thanks for your response. Updating pandas to 1.1.2 doesn't work in my case. I'm wondering whether there is a docker image or Dockerfile for the latest version from your lab.
To make you reproduce my error, here is what I did with a Docker image:
singularity -d build sc_velocyto0.17.17_scannpy1.6.0_scvelo0.2.2.img docker://xie186/scrna_tutorial:0.1.0
sudo ufw allow 8689
singularity run --nv sc_velocyto0.17.17_scannpy1.6.0_scvelo0.2.2.img jupyter-lab --ip=<your_ip> --port=8689 --no-browser
Then in browser, open '
The data and code I ran is shown as below:
Go to: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE132188
Search "GSE132188_RAW.tar"
Download the file. Uncompress the files:
tar xvf GSE132188_RAW.tar
You will get the following files.
GSM3852752_E12_5_counts.tar.gz
GSM3852753_E13_5_counts.tar.gz
GSM3852754_E14_5_counts.tar.gz
GSM3852755_E15_5_counts.tar.gz
Uncompress:
mkdir -p E12_5_counts/
tar zxvf GSM3852752_E12_5_counts.tar.gz --directory E12_5_counts/
mkdir -p E13_5_counts/
tar zxvf GSM3852753_E13_5_counts.tar.gz --directory E13_5_counts/
mkdir -p E14_5_counts/
tar zxvf GSM3852754_E14_5_counts.tar.gz --directory E14_5_counts/
mkdir -p E15_5_counts/
tar zxvf GSM3852755_E15_5_counts.tar.gz --directory E15_5_counts/
Code:
import numpy as np
import matplotlib.pyplot as pl
import numpy as np
import scanpy as sc
import scanpy.external as sce
import pandas as pd
from anndata import AnnData
import seaborn as sns
from scipy.sparse import csr_matrix
import networkx as nx
import xlsxwriter
from matplotlib import rcParams
import seaborn as sns
import scipy as sci
#GSEApy: Gene Set Enrichment Analysis in Python.
#import gseapy as gp
sc.settings.verbosity = 3
sc.logging.print_versions()
# Read cellranger files for all four samples
filename = './E12_5_counts/mm10/matrix.mtx'
filename_genes = './E12_5_counts/mm10/genes.tsv'
filename_barcodes = './E12_5_counts/mm10/barcodes.tsv'
e125 = sc.read(filename).transpose()
e125.var_names = np.genfromtxt(filename_genes, dtype=str)[:, 1]
e125.obs_names = np.genfromtxt(filename_barcodes, dtype=str)
filename = './E13_5_counts/mm10/matrix.mtx'
filename_genes = './E13_5_counts/mm10/genes.tsv'
filename_barcodes = './E13_5_counts/mm10/barcodes.tsv'
e135 = sc.read(filename).transpose()
e135.var_names = np.genfromtxt(filename_genes, dtype=str)[:, 1]
e135.obs_names = np.genfromtxt(filename_barcodes, dtype=str)
filename = './E14_5_counts/mm10/matrix.mtx'
filename_genes = './E14_5_counts/mm10/genes.tsv'
filename_barcodes = './E14_5_counts/mm10/barcodes.tsv'
e145 = sc.read(filename).transpose()
e145.var_names = np.genfromtxt(filename_genes, dtype=str)[:, 1]
e145.obs_names = np.genfromtxt(filename_barcodes, dtype=str)
filename = './E15_5_counts/mm10/matrix.mtx'
filename_genes = './E15_5_counts/mm10/genes.tsv'
filename_barcodes = './E15_5_counts/mm10/barcodes.tsv'
e155 = sc.read(filename).transpose()
e155.var_names = np.genfromtxt(filename_genes, dtype=str)[:, 1]
e155.obs_names = np.genfromtxt(filename_barcodes, dtype=str)
# Add dev. timepoint label for each sample
e125.obs['day'] = '12.5'
e135.obs['day'] = '13.5'
e145.obs['day'] = '14.5'
e155.obs['day'] = '15.5'
alldays = e125.concatenate(e135, e145, e155)
@xie186, are the variable names within each of your objects are unique?
I would guess that would be the problem. Here's a simple case that would throw this error:
import anndata as ad, numpy as np, pandas as pd
a = ad.AnnData(np.ones((3, 2)), var=pd.DataFrame(index=["a", "a"]))
b = ad.AnnData(np.ones((3, 3)), var=pd.DataFrame(index=["a", "b", "c"]))
a.concatenate(b)
I think our merge operation for the variables is only well defined if variable names are unique with each of the objects. I'm not sure there's a good default result here other than throwing an error.
I'll make sure they are unique and try again. I'm following the note here: https://github.com/theislab/pancreatic-endocrinogenesis/blob/master/scRNA_seq_qc_preprocessing_clustering.ipynb
@ivirshup Indeed, there are 48 indexes that are not unique which is weird to me. In the link I gave above, the authors didn't have this problem. Maybe it's because of the version differences. Thanks.
adata.obs_names_make_unique()
should help you here before concatenating.
@LuckyMD Thanks for your comments. Do you have an idea why we have duplicate IDs here? Shouldn't Gene ID be unique?
Ah, I missed that it's gene IDs and not obs ids. In that case it's var_names_make_unique()
you need. I'm not sure why you would have duplicate genes. Sometimes it can be due to signs that aren't read in properly, i.e. maybe you have gene names ABC-1
and ABC-1
with a space or some other character?
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Comment
I was just going over the GSE92332 case study. On Section 2, the AnnData objects were created fine, but the concatenation method for AnnData object did not work out for me. I browsed through issues raised, there were issues on how to concatenate objects, but they did not seem to answer my question here.
Minimal code sample (that we can copy&paste without having any data)
Versions