maximilianh / cellBrowser

main repo: https://github.com/ucscGenomeBrowser/cellBrowser/ - Python pipeline and Javascript scatter plot library for single-cell datasets, http://cellbrowser.rtfd.org
GNU General Public License v3.0

cbScanpy failing with error: "ValueError: cannot reindex with a non-unique indexer" #66

Closed matthewspeir closed 5 years ago

matthewspeir commented 5 years ago

Command, cbScanpy output and error at the bottom:

$ cbScanpy -e ica_cord_blood_h5.h5 -o cbScanpyOut -n ICA_Cord_Blood
INFO:root:Creating cbScanpyOut
cbScanpy $Id$
Input file: ica_cord_blood_h5.h5
Start time: 2019-01-29 12:45:38.211369
scanpy==1.3.7 anndata==0.6.18 numpy==1.16.0 scipy==1.2.0 pandas==0.24.0 scikit-learn==0.20.2 statsmodels==0.9.0 
INFO:root:Loading expression matrix: 10X h5 format
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
INFO:root:Writing scanpy matrix to cbScanpyOut/exprMatrix.tsv.gz
INFO:root:Transposing matrix
INFO:root:Converting csc matrix to row-sparse matrix
INFO:root:Writing gene-by-gene, without using pandas
INFO:root:Writing 33694 genes in total
INFO:root:Wrote 0 genes
INFO:root:Wrote 2000 genes
INFO:root:Wrote 4000 genes
INFO:root:Wrote 6000 genes
INFO:root:Wrote 8000 genes
INFO:root:Wrote 10000 genes
INFO:root:Wrote 12000 genes
INFO:root:Wrote 14000 genes
INFO:root:Wrote 16000 genes
INFO:root:Wrote 18000 genes
INFO:root:Wrote 20000 genes
INFO:root:Wrote 22000 genes
INFO:root:Wrote 24000 genes
INFO:root:Wrote 26000 genes
INFO:root:Wrote 28000 genes
INFO:root:Wrote 30000 genes
INFO:root:Wrote 32000 genes
Data has 384000 samples/observations
Data has 33694 genes/variables
Basic filtering: keep only cells with min 200 genes
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Basic filtering: keep only gene with min 3 cells
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
After filtering: Data has 320644 samples/observations and 24248 genes/variables
INFO:root:'geneIdType' is not specified in config file.
INFO:root:Auto-detected gene IDs type: symbols
Remove cells with more than 0.050000 percent of mitochondrial genes
Computing percentage of mitochondrial genes
Remove cells with less than 10 and more than 15000 genes
Filtering cells
After filtering: Data has 273034 samples/observations and 24248 genes/variables
Expression normalization, counts per cell = 10000
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Finding highly variable genes: min_mean=0.012500, max_mean=3.000000, min_disp=0.500000
Traceback (most recent call last):
  File "/cluster/home/mspeir/ENV_cellbrowser/bin/cbScanpy", line 11, in <module>
    sys.exit(cbScanpyCli())
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3630, in cbScanpyCli
    adata = cbScanpy(matrixFname, confFname, figDir, logFname, matrixOutFname)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 3443, in cbScanpy
    filter_result = sc.pp.filter_genes_dispersion(adata.X, min_mean=minMean, max_mean=maxMean, min_disp=minDisp)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/scanpy/preprocessing/_deprecated/highly_variable_genes.py", line 131, in filter_genes_dispersion
    gen_indices = np.where(one_gene_per_bin[df['mean_bin']])[0].tolist()
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 911, in __getitem__
    return self._get_with(key)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 953, in _get_with
    return self.reindex(key)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/series.py", line 3734, in reindex
    return super(Series, self).reindex(index=index, **kwargs)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/generic.py", line 4346, in reindex
    fill_value, copy).__finalize__(self)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/generic.py", line 4359, in _reindex_axes
    tolerance=tolerance, method=method)
  File "/cluster/home/mspeir/ENV_cellbrowser/lib/python3.6/site-packages/pandas/core/indexes/category.py", line 503, in reindex
    raise ValueError("cannot reindex with a non-unique indexer")
ValueError: cannot reindex with a non-unique indexer

This is using the 'Raw Counts Matrix - Cord Blood' h5 file from the 'Census of Immune Cells' data set here: https://preview.data.humancellatlas.org/.

maximilianh commented 5 years ago

Oh darn, I spent quite a while trying to track this down, then had lunch and only then had the idea of googling it. It's a known problem of your version combination.

https://github.com/theislab/scanpy/issues/450

It's working on my machine and I just saw that my version of pandas is pandas==0.22.0.
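Since the failure depends on the combination of installed versions, it can help to print them from the same environment that runs cbScanpy. The following is only a minimal sketch using the standard library (`importlib.metadata`, Python 3.8 or newer); the function name `installed_versions` and the package list are illustrative assumptions, not part of cellbrowser.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Hypothetical package list; adjust to whatever your pipeline imports.
    wanted = ["scanpy", "anndata", "pandas", "numpy"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

Running this with the same interpreter that runs cbScanpy (for example the one inside the virtualenv) avoids reporting versions from a different Python installation.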

This problem has just been fixed in Scanpy. So there are at least two options for you:

maximilianh commented 5 years ago

Closing this as it's another scanpy problem.

cotedivoir commented 4 years ago

got the same issue, tried to downgrade pandas as you suggested, still there:

cbScanpy -e filtered_gene_bc_matrices/hg19/matrix.mtx -o scanpyOut -n pbmc3k
INFO:root:Loading Scanpy libraries
INFO:numexpr.utils:NumExpr defaulting to 8 threads.
INFO:get_version:dirname: Trying to get version of get_version from dirname /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:dirname: Failed; Does not match re.compile('get[-]version-([\d.]+?)(?:\.dev(\d+))?(?:_+-)?$')
INFO:get_version:git: Trying to get version from git in directory /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:git: Failed; The top-level directory of the current Git repository is not the same as the root directory of the distribution
INFO:get_version:metadata: Trying to get version for get_version in dir /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:metadata: Succeeded
INFO:get_version:dirname: Trying to get version of legacy_api_wrap from dirname /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:dirname: Failed; Does not match re.compile('legacy[-]api[-]wrap-([\d.]+?)(?:\.dev(\d+))?(?:[+-]([0-9a-zA-Z.]+))?$')
INFO:get_version:git: Trying to get version from git in directory /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:git: Failed; The top-level directory of the current Git repository is not the same as the root directory of the distribution
INFO:get_version:metadata: Trying to get version for legacy_api_wrap in dir /home/cotedivoir/.local/lib/python3.6/site-packages
INFO:get_version:metadata: Succeeded
INFO:root:cbScanpy $Id$
INFO:root:Input file: filtered_gene_bc_matrices/hg19/matrix.mtx
INFO:root:Restricting OPENBLAS to 4 threads
INFO:root:Start time: 2020-01-28 16:40:36.219846
scanpy==1.4.5.post2 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==0.22.0 scikit-learn==0.22.1 statsmodels==0.11.0
INFO:root:Loading expression matrix: mtx format
INFO:root:Data has 2700 samples/observations
INFO:root:Data has 32738 genes/variables
INFO:root:Basic filtering: keep only cells with min 200 genes
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
INFO:root:Basic filtering: keep only gene with min 3 cells
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
INFO:root:After filtering: Data has 2700 samples/observations and 13714 genes/variables
INFO:root:'geneIdType' is not specified in config file or set to 'auto'.
INFO:root:Auto-detected gene IDs type: symbols
INFO:root:Remove cells with more than 0.050000 percent of mitochondrial genes
INFO:root:Computing percentage of mitochondrial genes
Traceback (most recent call last):
  File "/home/cotedivoir/.local/bin/cbScanpy", line 11, in <module>
    sys.exit(cbScanpyCli())
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 5390, in cbScanpyCli
    adata, params = cbScanpy(matrixFname, metaFname, inCluster, confFname, figDir, logFname)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/cellbrowser/cellbrowser.py", line 5040, in cbScanpy
    adata.obs['percent_mito'] = np.sum(adata[:, mito_genes].X, axis=1) / np.sum(adata.X, axis=1)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/anndata.py", line 1049, in __getitem__
    oidx, vidx = self._normalize_indices(index)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/anndata.py", line 1030, in _normalize_indices
    return _normalize_indices(index, self.obs_names, self.var_names)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/index.py", line 34, in _normalize_indices
    ax1 = _normalize_index(ax1, names1)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/anndata/_core/index.py", line 89, in _normalize_index
    positions = index.get_indexer(indexer)
  File "/home/cotedivoir/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2687, in get_indexer
    raise InvalidIndexError('Reindexing only valid with uniquely'
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
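The final `InvalidIndexError` is raised while gene names are looked up in an index that is not unique, which fits the repeated "Variable names are not unique" warnings earlier in the log. As a rough illustration of what making names unique means, here is a plain-Python sketch that finds duplicated names and appends numeric suffixes to the repeats. It is not anndata's implementation; the function names and the gene names are made up for the example. In a real session the method named in the warning, `adata.var_names_make_unique()`, would be the thing to call before subsetting by gene name.

```python
from collections import Counter


def find_duplicates(names):
    """Return the sorted list of names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def make_unique(names, join="-"):
    """Keep the first occurrence of each name; suffix later repeats with -1, -2, ...

    Illustrative only: the suffix scheme is an assumption for this sketch.
    """
    taken = set(names)   # every name already in use, original or generated
    seen = set()         # names encountered so far while iterating
    next_suffix = {}     # next suffix number to try for each repeated name
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
            continue
        k = next_suffix.get(name, 1)
        candidate = f"{name}{join}{k}"
        # Skip suffixes that would collide with an existing name.
        while candidate in taken:
            k += 1
            candidate = f"{name}{join}{k}"
        taken.add(candidate)
        next_suffix[name] = k + 1
        result.append(candidate)
    return result


# Hypothetical gene symbols, with one symbol appearing three times.
genes = ["GENE_A", "GENE_B", "GENE_A", "GENE_A"]
duplicated = find_duplicates(genes)
unique_genes = make_unique(genes)
```

With unique names, a lookup by gene symbol maps to exactly one column, which is what the failing index lookup requires.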

maximilianh commented 4 years ago

It looks like this is scanpy==1.4.5.post2 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 scipy==1.4.1 pandas==0.22.0 scikit-learn==0.22.1 statsmodels==0.11.0
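To see how this environment differs from the one in the original report, the two version lines printed by cbScanpy can be compared directly. This is a small sketch in plain Python; `parse_versions` and `diff_versions` are hypothetical helper names, and the two strings are copied from the logs in this thread.

```python
def parse_versions(line):
    """Parse 'name==version name==version ...' into a dict."""
    versions = {}
    for token in line.split():
        name, sep, version = token.partition("==")
        if sep and name and version:
            versions[name] = version
    return versions


def diff_versions(old, new):
    """Return {package: (old version, new version)} for every package that differs.

    A package present in only one report shows None on the other side.
    """
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Version lines as printed in the two cbScanpy logs of this thread.
first_report = (
    "scanpy==1.3.7 anndata==0.6.18 numpy==1.16.0 scipy==1.2.0 "
    "pandas==0.24.0 scikit-learn==0.20.2 statsmodels==0.9.0"
)
second_report = (
    "scanpy==1.4.5.post2 anndata==0.7.1 umap==0.3.10 numpy==1.18.1 "
    "scipy==1.4.1 pandas==0.22.0 scikit-learn==0.22.1 statsmodels==0.11.0"
)

changes = diff_versions(parse_versions(first_report), parse_versions(second_report))
for package, (old, new) in changes.items():
    print(f"{package}: {old} -> {new}")
```

Every package differs between the two reports, so the second failure is not necessarily the same bug as the first one even though the messages look similar.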

How did you install scanpy?

maximilianh commented 4 years ago

overall, it doesn't look like this is an issue with the cellbrowser, but rather with your scanpy installation?

how did you install the cellbrowser and which version?

cotedivoir commented 4 years ago

Thank you for the reply!

version: cellbrowser (0.7.7) how installed: pip3 install

maximilianh commented 4 years ago

Thanks! How did you install scanpy? With conda?

cotedivoir commented 4 years ago

no, same way - pip3

maximilianh commented 4 years ago

Hm, I can't reproduce this... is this on OSX or Linux? If Linux, which distribution and version?

This doesn't seem to be a cellbrowser problem, but something with your scanpy install... did you ask the scanpy people?

cotedivoir commented 4 years ago

I think I should have started with this: I use WSL on Windows with an Ubuntu distribution.

maximilianh commented 4 years ago

Urgs. OK, I don't really want to dig into this. It's a problem with scanpy, partially due to them constantly making breaking changes. Can you ask them? I don't know much about scanpy. You can probably easily reproduce this problem by running a standard scanpy tutorial.

Can I ask why you're using the scanpy pipeline of cellbrowser? Do you have your own results or is this part of a term project and cbScanpy has a ready-made pipeline for single cell analysis?

cotedivoir commented 4 years ago

OK, sorry for bothering you. I'm just trying to learn more about single-cell data analysis, and there is not much structured information around. So I found cellbrowser, and as you say, cbScanpy already has a prebuilt pipeline, so I decided that would be a way to start.

maximilianh commented 4 years ago

It usually is, but only if your scanpy works. The recommended way to install scanpy is with conda, since it has a ton of dependencies.

maximilianh commented 4 years ago

Oh, that would be the reason, by the way: install scanpy with conda! I don't think the pip way is recommended anymore. Try conda and let me know if it works then.

cotedivoir commented 4 years ago

Hi! Installed scanpy via conda, still get the same error

matthewspeir commented 4 years ago

@ivirshup We think this might be an issue with scanpy (or maybe just their install of scanpy). Do you have insights into the issues this user is seeing?