theislab / scvelo

RNA Velocity generalized through dynamical modeling
https://scvelo.org
BSD 3-Clause "New" or "Revised" License
414 stars 102 forks

Error when importing scvelo in notebook #46

Closed PattF closed 5 years ago

PattF commented 5 years ago

Hi, I'm getting the following error when I first try to import scvelo with import scvelo as scv (currently running scvelo 0.1.16, scanpy 1.4, numpy 1.15.4). Any help would be greatly appreciated, thanks!


TypeError                                 Traceback (most recent call last)
in <module>
----> 1 import scvelo as scv
      2 import scanpy as sc
      3 scv.logging.print_versions()

~\Anaconda3\lib\site-packages\scvelo\__init__.py in <module>
      1 """scvelo - stochastic single cell RNA velocity"""
      2 
----> 3 from .get_version import get_version
      4 __version__ = get_version(__file__)
      5 del get_version

~\Anaconda3\lib\site-packages\scvelo\get_version.py in <module>
    143 
    144 
--> 145 __version__ = get_version(__file__)
    146 
    147 

~\Anaconda3\lib\site-packages\scvelo\get_version.py in get_version(package)
    137     return str(
    138         get_version_from_dirname(name, parent)
--> 139         or get_version_from_git(parent)
    140         or get_version_from_metadata(name, parent)
    141         or "0.0.0"

~\Anaconda3\lib\site-packages\scvelo\get_version.py in get_version_from_git(parent)
     55     try:
     56         p = run(["git", "rev-parse", "--show-toplevel"],
---> 57                 cwd=parent, stdout=PIPE, stderr=PIPE, encoding="utf-8", check=True)
     58     except (OSError, CalledProcessError):
     59         return None

~\Anaconda3\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
    401         kwargs['stdin'] = PIPE
    402 
--> 403     with Popen(*popenargs, **kwargs) as process:
    404         try:
    405             stdout, stderr = process.communicate(input, timeout=timeout)

~\Anaconda3\lib\subprocess.py in __init__(self, args, bufsize, executable, stdin, stdout, stderr, preexec_fn, close_fds, shell, cwd, env, universal_newlines, startupinfo, creationflags, restore_signals, start_new_session, pass_fds, encoding, errors)
    705                                 c2pread, c2pwrite,
    706                                 errread, errwrite,
--> 707                                 restore_signals, start_new_session)
    708         except:
    709             # Cleanup if the child failed starting.

~\Anaconda3\lib\subprocess.py in _execute_child(self, args, executable, preexec_fn, close_fds, pass_fds, cwd, env, startupinfo, creationflags, shell, p2cread, p2cwrite, c2pread, c2pwrite, errread, errwrite, unused_restore_signals, unused_start_new_session)
    988                                          env,
    989                                          cwd,
--> 990                                          startupinfo)
    991             finally:
    992                 # Child is launched. Close the parent's copy of those pipe

TypeError: CreateProcess() argument 8 must be str or None, not WindowsPath
VolkerBergen commented 5 years ago

Are you using a conda environment? And have you installed via PyPI? For some reason, it tried to get the version from git.
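For context on the first traceback: get_version_from_git calls subprocess.run with cwd=parent, and the error says parent is a WindowsPath. On Windows, subprocess only accepts path-like objects for cwd from Python 3.7 onwards, so on Python 3.6 the call fails before git even runs. Because the except clause only catches OSError and CalledProcessError, the resulting TypeError escapes and the import crashes instead of falling back to the next version source. A minimal sketch of the same lookup and fallback chain that converts the path with str() first; git_toplevel and first_version are hypothetical names for illustration, not scvelo's code:

```python
from pathlib import Path
from subprocess import PIPE, CalledProcessError, run
from typing import Callable, Optional


def git_toplevel(parent: Path) -> Optional[str]:
    """Hypothetical helper: return the top-level directory of the git
    repository containing `parent`, or None if that cannot be determined."""
    try:
        p = run(
            ["git", "rev-parse", "--show-toplevel"],
            cwd=str(parent),  # str() avoids the WindowsPath TypeError on Python 3.6
            stdout=PIPE,
            stderr=PIPE,
            encoding="utf-8",
            check=True,
        )
    except (OSError, CalledProcessError):
        # OSError: git not installed or `parent` missing;
        # CalledProcessError: `parent` is not inside a git repository.
        return None
    return p.stdout.strip()


def first_version(*sources: Callable[[], Optional[str]], default: str = "0.0.0") -> str:
    """Return the first non-empty result from `sources`, else `default`."""
    for source in sources:
        version = source()
        if version:
            return version
    return default
```

first_version mirrors the order seen in the traceback (directory name, git, package metadata, then "0.0.0"); each source is passed as a zero-argument callable, so later sources are only tried when earlier ones come back empty.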

PattF commented 5 years ago

Yep, I'm using Anaconda on Windows. I ran pip install scvelo, and the error came up when importing at the start of my notebook. I'm using scanpy as well and don't get this error with it. I just uninstalled and reinstalled scvelo and got the same error again. Thoughts?

VolkerBergen commented 5 years ago

Hmm, I could not reproduce that on any of the Windows machines lying around here. Would you try it out in a new, clean Python >= 3.6 environment and see whether you get the same issue?

VolkerBergen commented 5 years ago

@stefanpeidli maybe you could take a look as well?

PattF commented 5 years ago

Unfortunately, I'm still getting the same issue. Any other possible workarounds?

stefanpeidli commented 5 years ago

We have updated that part of the code. Could you please install the new version of scvelo and check whether it works now?

VolkerBergen commented 5 years ago

*from source (not yet on PyPI)

PattF commented 5 years ago

Great, thanks guys. I tried installing from source, but now I get the following error. Thoughts?

pip install git+https://github.com/theislab/scvelo
Collecting git+https://github.com/theislab/scvelo
  Cloning https://github.com/theislab/scvelo to c:\users\patty\appdata\local\temp\pip-req-build-m90fcbqy
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
    Preparing wheel metadata ... error
    Complete output from command c:\users\patty\anaconda3\python.exe c:\users\patty\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py prepare_metadata_for_build_wheel C:\Users\Patty\AppData\Local\Temp\tmpevsyi6yd:
    The file description seems not to be valid rst for PyPI; it will be interpreted as plain text
    <string>:: (WARNING/2) No MathJax URL specified, using local fallback (see config.html)

    Traceback (most recent call last):
      File "c:\users\patty\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py", line 207, in <module>
        main()
      File "c:\users\patty\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py", line 197, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "c:\users\patty\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py", line 69, in prepare_metadata_for_build_wheel
        return hook(metadata_directory, config_settings)
      File "C:\Users\Patty\AppData\Local\Temp\pip-build-env-30tqm2b8\overlay\Lib\site-packages\flit\buildapi.py", line 27, in prepare_metadata_for_build_wheel
        metadata = make_metadata(module, ini_info)
      File "C:\Users\Patty\AppData\Local\Temp\pip-build-env-30tqm2b8\overlay\Lib\site-packages\flit\common.py", line 302, in make_metadata
        md_dict.update(get_info_from_module(module))
      File "C:\Users\Patty\AppData\Local\Temp\pip-build-env-30tqm2b8\overlay\Lib\site-packages\flit\common.py", line 120, in get_info_from_module
        version = check_version(version)
      File "C:\Users\Patty\AppData\Local\Temp\pip-build-env-30tqm2b8\overlay\Lib\site-packages\flit\common.py", line 146, in check_version
        version = normalise_version(version)
      File "C:\Users\Patty\AppData\Local\Temp\pip-build-env-30tqm2b8\overlay\Lib\site-packages\flit\validate.py", line 325, in normalise_version
        .format(orig_version))
    flit.common.InvalidVersion: Version number "Version(release='0.1.16', dev='33', labels=['3f5da63'])" does not match PEP 440 rules

    ----------------------------------------
Command "c:\users\patty\anaconda3\python.exe c:\users\patty\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py prepare_metadata_for_build_wheel C:\Users\Patty\AppData\Local\Temp\tmpevsyi6yd" failed with error code 1 in C:\Users\Patty\AppData\Local\Temp\pip-req-build-m90fcbqy
VolkerBergen commented 5 years ago

Will check on that. For now, you could just

git clone https://github.com/theislab/scvelo.git
cd scvelo
python setup.py install
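The flit error above happens because the version handed to flit was the repr of a Version object, Version(release='0.1.16', dev='33', labels=['3f5da63']), rather than a version string. Written per PEP 440, the same parts would be a release segment, a .devN suffix and a +local label: 0.1.16.dev33+3f5da63. A minimal sketch, assuming a hypothetical Version tuple with the three fields shown in the error message (scvelo's real class may look different); the regex only checks this subset of PEP 440, not the full grammar:

```python
import re
from typing import NamedTuple, Optional, Tuple


class Version(NamedTuple):
    """Hypothetical stand-in with the fields shown in the flit error."""
    release: str
    dev: Optional[str] = None
    labels: Tuple[str, ...] = ()

    def __str__(self) -> str:
        # PEP 440 layout: <release>[.dev<N>][+<local label>]
        text = self.release
        if self.dev is not None:
            text += ".dev" + self.dev
        if self.labels:
            text += "+" + ".".join(self.labels)
        return text


# Release, optional .devN, optional +local segment (a subset of PEP 440).
PEP440_SUBSET = re.compile(r"^\d+(\.\d+)*(\.dev\d+)?(\+[A-Za-z0-9]+(\.[A-Za-z0-9]+)*)?$")

v = Version(release="0.1.16", dev="33", labels=("3f5da63",))
print(repr(v))  # the kind of string flit rejected
print(str(v))   # 0.1.16.dev33+3f5da63
```

Passing str(v) rather than the object itself is what would give flit (and setuptools, which built the egg directory name from the same repr) a valid version.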
PattF commented 5 years ago

Okay, I tried that and unfortunately got the following error:


AttributeError                            Traceback (most recent call last)
in <module>
----> 1 import scvelo as scv
      2 import scanpy as sc
      3 scv.logging.print_versions()

~\Anaconda3\lib\site-packages\scvelo-version_release_0.1.16_.dev_33_.labels_3f5da63_-py3.6.egg\scvelo\__init__.py in <module>
      2 
      3 from .get_version import get_version
----> 4 __version__ = get_version(__file__)
      5 del get_version
      6 

~\Anaconda3\lib\site-packages\scvelo-version_release_0.1.16_.dev_33_.labels_3f5da63_-py3.6.egg\scvelo\get_version.py in get_version(package)
    138         get_version_from_dirname(name, parent)
    139         or get_version_from_git(parent)
--> 140         or get_version_from_metadata(name, parent)
    141         or "0.0.0"
    142     )

~\Anaconda3\lib\site-packages\scvelo-version_release_0.1.16_.dev_33_.labels_3f5da63_-py3.6.egg\scvelo\get_version.py in get_version_from_metadata(name, parent)
    102         return None
    103 
--> 104     return Version.parse(pkg.version)
    105 
    106 

AttributeError: type object 'Version' has no attribute 'parse'
VolkerBergen commented 5 years ago

We'll look into that. In the meantime, have you thought about moving to Linux? We didn't develop the implementation primarily for Windows, so compatibility there is lacking.

PattF commented 5 years ago

Thanks for that, and completely understandable. I sure have, and will try making the switch over as soon as I finish off my thesis and have more time :)

VolkerBergen commented 5 years ago

I just looked into your issue again. I could not find which Python version you're using, and it looks like you are running in the root environment.

Create a new environment, activate it, install pytables with conda and scvelo with pip, and then try again:

conda create -n py36 python=3.6
conda activate py36
conda install pytables
pip install scvelo
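Before retrying, a quick check run inside the new environment can confirm the interpreter version and that the packages are importable. A minimal standard-library sketch; "tables" is the import name of pytables, the 3.6 minimum comes from the earlier suggestion of a Python >= 3.6 environment, and check_environment is a hypothetical helper:

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple


def check_environment(
    modules: Iterable[str] = ("tables", "scvelo"),
    min_python: Tuple[int, int] = (3, 6),
) -> Dict[str, bool]:
    """Report whether the running interpreter meets `min_python` and which
    of `modules` can be found, without actually importing them."""
    report = {"python_ok": sys.version_info[:2] >= min_python}
    for name in modules:
        # find_spec returns None for a top-level module that is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    print(sys.executable)
    for key, ok in check_environment().items():
        print(key, "yes" if ok else "NO")
```

Running it from both the console and the notebook shows quickly whether the two are using the same interpreter.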

PattF commented 5 years ago

Hi Volker,

Sorry for the late reply, I'm away travelling at the moment. Thanks for following up, I'll attempt this over the weekend and let you know how it goes. Thanks!

PattF commented 5 years ago

Hey Volker, creating the new environment and installing pytables and scvelo seemed to work fine, but when I try to run scvelo in my notebook I still run into the same issue. Is there anything different I need to do in my notebooks to use the new environment? Thanks! P.S. Sorry about the late response; I was travelling in the US and only got back a few days ago.

VolkerBergen commented 5 years ago

I'm back from holidays myself now and can get back to your issue.

If it works fine in your console but not in your notebook, your notebook might be running in the wrong environment (perhaps root).

You can check your environment and python version with

import sys, platform
print(sys.executable)
print(platform.python_version())
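The path printed by sys.executable tells you which environment the kernel uses: with Anaconda, the root (base) interpreter sits directly in the Anaconda3 folder, while a named environment sits under Anaconda3\envs\<name>. A small sketch that derives the environment name from such a path, assuming that standard Anaconda layout (env_name_from_executable is a hypothetical helper and reports "base" for any path without an envs folder):

```python
import sys
from pathlib import PureWindowsPath


def env_name_from_executable(executable: str) -> str:
    r"""Return the conda environment name implied by an interpreter path.

    Assumes the standard Anaconda layout: <root>\python.exe for the base
    (root) environment and <root>\envs\<name>\python.exe for named ones.
    """
    parts = PureWindowsPath(executable).parts
    lowered = [part.lower() for part in parts]
    if "envs" in lowered:
        idx = lowered.index("envs")
        if idx + 1 < len(parts):
            return parts[idx + 1]
    return "base"


if __name__ == "__main__":
    # Run this inside the notebook to see which environment the kernel uses.
    print(sys.executable, "->", env_name_from_executable(sys.executable))
```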
PattF commented 5 years ago

Hey Volker, thanks for getting back to me, and sorry for the delayed response. I checked the environment within the notebook as you suggested and got the following:

C:\Users\Patty\Anaconda3\python.exe
3.6.0

How do I ensure I'm not running in root? Or, how do I change from root to the new environment? Thanks!

PattF commented 5 years ago

Alright, I think I've managed to run the notebook in the new environment, as I'm now getting the following output for the environment and Python version:

C:\Users\Patty\Anaconda3\envs\py36\python.exe
3.6.8

scvelo finally seems to import fine now, great! Unfortunately, I'm now running into the following error when trying to merge my preprocessed scanpy AnnData object with my loom file. Thoughts?

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-1689fa8fbbd0> in <module>
      1 ldata = scv.read(path2,
      2                  cache=True)
----> 3 adata = scv.utils.merge(adata, ldata)

~\Anaconda3\envs\py36\lib\site-packages\scvelo\read_load.py in merge(adata, ldata, copy)
    134     same_vars = (len(_adata.var_names) == len(_ldata.var_names) and np.all(_adata.var_names == _ldata.var_names))
    135     if len(common_vars) > 0 and not same_vars:
--> 136         _adata._inplace_subset_var(common_vars)
    137         _ldata._inplace_subset_var(common_vars)
    138 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _inplace_subset_var(self, index)
   1431         Same as ``adata = adata[:, index]``, but inplace.
   1432         """
-> 1433         adata_subset = self[:, index].copy()
   1434         self._init_as_actual(adata_subset, dtype=self._X.dtype)
   1435 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in __getitem__(self, index)
   1297     def __getitem__(self, index: Index) -> 'AnnData':
   1298         """Returns a sliced view of the object."""
-> 1299         return self._getitem_view(index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _getitem_view(self, index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':
-> 1302         oidx, vidx = self._normalize_indices(index)
   1303         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1304 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_indices(self, index)
   1277         obs, var = super()._unpack_index(index)
   1278         obs = _normalize_index(obs, self.obs_names)
-> 1279         var = _normalize_index(var, self.var_names)
   1280         return obs, var
   1281 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_index(index, names)
    264         # incredibly faster one
    265         positions = pd.Series(index=names, data=range(len(names)))
--> 266         positions = positions[index]
    267         if positions.isnull().values.any():
    268             raise KeyError(

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in __getitem__(self, key)
    808             key = check_bool_indexer(self.index, key)
    809 
--> 810         return self._get_with(key)
    811 
    812     def _get_with(self, key):

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in _get_with(self, key)
    851                         return self.loc[key]
    852 
--> 853                     return self.reindex(key)
    854                 except Exception:
    855                     # [slice(0, 5, None)] will break if you convert to ndarray,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   3323     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   3324     def reindex(self, index=None, **kwargs):
-> 3325         return super(Series, self).reindex(index=index, **kwargs)
   3326 
   3327     def drop(self, labels=None, axis=0, index=None, columns=None,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   3687         # perform the reindex on the axes
   3688         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3689                                   fill_value, copy).__finalize__(self)
   3690 
   3691     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3705             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   3706                                              fill_value=fill_value,
-> 3707                                              copy=copy, allow_dups=False)
   3708 
   3709         return obj

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   3808                                                 fill_value=fill_value,
   3809                                                 allow_dups=allow_dups,
-> 3810                                                 copy=copy)
   3811 
   3812         if copy and new_data is self._data:

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   4412         # some axes don't allow reindexing with dups
   4413         if not allow_dups:
-> 4414             self.axes[axis]._can_reindex(indexer)
   4415 
   4416         if axis >= self.ndim:

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3574         # trying to reindex on an axis with duplicates
   3575         if not self.is_unique and len(indexer):
-> 3576             raise ValueError("cannot reindex from a duplicate axis")
   3577 
   3578     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
VolkerBergen commented 5 years ago

Good to hear. Would you please run

print(adata, ldata)
print(adata.obs_names.intersection(ldata.obs_names))
print(adata.var_names.intersection(ldata.var_names))
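These prints answer whether the two objects can be merged at all: merging needs shared cell names (obs_names) as well as shared genes (var_names), and duplicated labels on either side cause trouble when subsetting. The same check can be written against plain pandas Index objects; a minimal sketch, where overlap_report is a hypothetical helper and the four barcodes are copied from outputs later in this thread:

```python
from typing import Dict

import pandas as pd


def overlap_report(a: pd.Index, b: pd.Index) -> Dict[str, int]:
    """Summarise how two label indexes (e.g. obs_names or var_names) overlap."""
    return {
        "n_a": len(a),
        "n_b": len(b),
        "n_shared": len(a.intersection(b)),
        "dup_a": int(a.duplicated().sum()),
        "dup_b": int(b.duplicated().sum()),
    }


# The loom names carry a sample prefix and an "x" suffix,
# so nothing matches the bare AnnData barcodes before cleaning.
adata_obs = pd.Index(["AAACCTGAGATCACGG", "AAACCTGAGATCCGAG"])
ldata_obs = pd.Index(["EU79_d45:AAACCTGAGATCCGAGx", "EU79_d45:AAAGTAGGTGTAACGGx"])
print(overlap_report(adata_obs, ldata_obs))
```

On the real objects the same call would be overlap_report(adata.obs_names, ldata.obs_names), and likewise for var_names.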
PattF commented 5 years ago

Sure thing, here's the output:

AnnData object with n_obs × n_vars = 14731 × 18596 
    obs: 'sample', 'n_counts', 'log_counts', 'n_genes', 'mt_frac', 'size_factors', 'S_score', 'G2M_score', 'phase', 'louvain_r1', 'louvain_r0.5', 'Chor_marker_expr', 'Radg_marker_expr', 'Chr21_marker_expr'
    var: 'gene_id', 'n_cells', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'cluster_cell_type_matching', 'diffmap_evals', 'louvain', 'louvain_r0.5_colors', 'louvain_r0.5_sizes', 'neighbors', 'paga', 'pca', 'phase_colors', 'rank_genes_r0.5', 'sample_colors'
    obsm: 'X_pca', 'X_umap', 'X_diffmap'
    varm: 'PCs'
    layers: 'counts' AnnData object with n_obs × n_vars = 16971 × 58288 
    obs: 'Clusters', 'SampleID', 'SampleRef', '_X', '_Y'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
Index([], dtype='object', name='index')
Index(['FAM87B', 'LINC00115', 'FAM41C', 'SAMD11', 'NOC2L', 'KLHL17', 'PLEKHN1',
       'HES4', 'ISG15', 'AGRN',
       ...
       'MT-CO2', 'MT-ATP8', 'MT-ATP6', 'MT-CO3', 'MT-ND3', 'MT-ND4L', 'MT-ND4',
       'MT-ND5', 'MT-ND6', 'MT-CYB'],
      dtype='object', name='index', length=16229)
VolkerBergen commented 5 years ago

Looks like your observation names don't match. You'd need to examine your adata.obs_names and work out how to make them fit ldata.obs_names.

We have a built-in function that cleans them up, scv.utils.clean_obs_names(adata); maybe that helps make them comparable.

Apart from that everything looks fine.
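Judging from the outputs printed later in this thread, the loom names have the form <sample>:<barcode>x (for example EU79_d45:AAAGTAGGTGTAACGGx), while the AnnData names are bare 16-letter barcodes. A minimal regex sketch of that kind of clean-up, assuming 16-base A/C/G/T barcodes; clean_barcode is a hypothetical helper, not scvelo's implementation of clean_obs_names. It also illustrates one possible pitfall: once the sample prefix is stripped, the same barcode appearing in two samples becomes a duplicate name.

```python
import re
from typing import Iterable, List

import pandas as pd

# Assumption: a 10x-style cell barcode is a run of 16 A/C/G/T characters.
BARCODE = re.compile(r"[ACGT]{16}")


def clean_barcode(name: str) -> str:
    """Return the first 16-base barcode found in `name`, or `name` unchanged."""
    match = BARCODE.search(name)
    return match.group(0) if match else name


def clean_names(names: Iterable[str]) -> List[str]:
    return [clean_barcode(n) for n in names]


loom_names = pd.Index([
    "EU79_d45:AAACCTGAGATCCGAGx",
    "DS18_d140:AAACCTGAGATCCGAGx",  # hypothetical: same barcode in a second sample
])
cleaned = pd.Index(clean_names(loom_names))
print(list(cleaned))
print("unique after cleaning:", cleaned.is_unique)
```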

PattF commented 5 years ago

Hey Volker, thanks for the fast reply. I tried the clean-up function, but still ran into the same issue. Let me know what I can provide to figure this out. Thanks!

VolkerBergen commented 5 years ago

Print these to see whether they are matchable:

print(adata.obs_names, ldata.obs_names)

scv.utils.clean_obs_names(adata)
scv.utils.clean_obs_names(ldata)
print(adata.obs_names, ldata.obs_names)
PattF commented 5 years ago

Here we go:

Index(['AAACCTGAGATCACGG', 'AAACCTGAGATCCGAG', 'AAACCTGAGGAGCGTT',
       'AAACCTGAGGCACATG', 'AAACCTGAGTTTGCGT', 'AAACCTGCAAGCCGCT',
       'AAACCTGCACGAAATA', 'AAACCTGCACGCATCG', 'AAACCTGCACGGCTAC',
       'AAACCTGCACTAAGTC',
       ...
       'TTTGTCAAGATCCTGT', 'TTTGTCAAGCCAACAG', 'TTTGTCACACATTCGA',
       'TTTGTCACAGGTCGTC', 'TTTGTCAGTAAGTGTA', 'TTTGTCAGTAGCGCAA',
       'TTTGTCAGTGTCGCTG', 'TTTGTCATCGTAGATC', 'TTTGTCATCTGGTATG',
       'TTTGTCATCTTGCCGT'],
      dtype='object', name='index', length=14731) Index(['EU79_d45:AAAGTAGGTGTAACGGx', 'EU79_d45:AAACGGGTCTCGAGTAx',
       'EU79_d45:AAAGATGCAGTATAAGx', 'EU79_d45:AAAGCAAGTGCTTCTCx',
       'EU79_d45:AAACCTGAGATCCGAGx', 'EU79_d45:AAACGGGTCAAACGGGx',
       'EU79_d45:AAAGCAAAGTGGGCTAx', 'EU79_d45:AAAGTAGGTGTATGGGx',
       'EU79_d45:AAACCTGCACTAAGTCx', 'EU79_d45:AAAGATGTCGGACAAGx',
       ...
       'DS18_d140:TTTGGTTAGTACGATAx', 'DS18_d140:TTTGCGCCATACTACGx',
       'DS18_d140:TTTGCGCGTATAGTAGx', 'DS18_d140:TTTGGTTGTCTTCTCGx',
       'DS18_d140:TTTGTCATCGTAGATCx', 'DS18_d140:TTTGTCAGTCCAAGTTx',
       'DS18_d140:TTTGGTTCACGAAGCAx', 'DS18_d140:TTTGTCAGTGTCGCTGx',
       'DS18_d140:TTTGCGCGTGCTTCTCx', 'DS18_d140:TTTGTCATCGTCTGCTx'],
      dtype='object', name='index', length=16971)
Index(['AAACCTGAGATCACGG', 'AAACCTGAGATCCGAG', 'AAACCTGAGGAGCGTT',
       'AAACCTGAGGCACATG', 'AAACCTGAGTTTGCGT', 'AAACCTGCAAGCCGCT',
       'AAACCTGCACGAAATA', 'AAACCTGCACGCATCG', 'AAACCTGCACGGCTAC',
       'AAACCTGCACTAAGTC',
       ...
       'TTTGTCAAGATCCTGT', 'TTTGTCAAGCCAACAG', 'TTTGTCACACATTCGA',
       'TTTGTCACAGGTCGTC', 'TTTGTCAGTAAGTGTA', 'TTTGTCAGTAGCGCAA',
       'TTTGTCAGTGTCGCTG', 'TTTGTCATCGTAGATC', 'TTTGTCATCTGGTATG',
       'TTTGTCATCTTGCCGT'],
      dtype='object', length=14731) Index(['AAAGTAGGTGTAACGG', 'AAACGGGTCTCGAGTA', 'AAAGATGCAGTATAAG',
       'AAAGCAAGTGCTTCTC', 'AAACCTGAGATCCGAG', 'AAACGGGTCAAACGGG',
       'AAAGCAAAGTGGGCTA', 'AAAGTAGGTGTATGGG', 'AAACCTGCACTAAGTC',
       'AAAGATGTCGGACAAG',
       ...
       'TTTGGTTAGTACGATA', 'TTTGCGCCATACTACG', 'TTTGCGCGTATAGTAG',
       'TTTGGTTGTCTTCTCG', 'TTTGTCATCGTAGATC', 'TTTGTCAGTCCAAGTT',
       'TTTGGTTCACGAAGCA', 'TTTGTCAGTGTCGCTG', 'TTTGCGCGTGCTTCTC',
       'TTTGTCATCGTCTGCT'],
      dtype='object', length=16971)
VolkerBergen commented 5 years ago

Looks good after cleaning. Now check whether it finds common observation names for matching:

scv.utils.clean_obs_names(adata)
scv.utils.clean_obs_names(ldata)
print(adata.obs_names.intersection(ldata.obs_names))
PattF commented 5 years ago

Here we go:

Index(['AAACCTGAGATCACGG', 'AAACCTGAGATCCGAG', 'AAACCTGAGGAGCGTT',
       'AAACCTGAGGCACATG', 'AAACCTGAGTTTGCGT', 'AAACCTGCAAGCCGCT',
       'AAACCTGCACGAAATA', 'AAACCTGCACGCATCG', 'AAACCTGCACGGCTAC',
       'AAACCTGCACTAAGTC',
       ...
       'TTTGTCAAGATCCTGT', 'TTTGTCAAGCCAACAG', 'TTTGTCACACATTCGA',
       'TTTGTCACAGGTCGTC', 'TTTGTCAGTAAGTGTA', 'TTTGTCAGTAGCGCAA',
       'TTTGTCAGTGTCGCTG', 'TTTGTCATCGTAGATC', 'TTTGTCATCTGGTATG',
       'TTTGTCATCTTGCCGT'],
      dtype='object', length=14731)
VolkerBergen commented 5 years ago

Good, that works.

Now let's go line by line to see what is problematic:

scv.utils.clean_obs_names(adata)
scv.utils.clean_obs_names(ldata)

common_obs = adata.obs_names.intersection(ldata.obs_names)
common_vars = adata.var_names.intersection(ldata.var_names)

_adata = adata[common_obs].copy()
_ldata = ldata[common_obs].copy()

_adata._inplace_subset_var(common_vars)
_ldata._inplace_subset_var(common_vars)
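The tracebacks show what _inplace_subset_var does with the gene list: anndata builds pd.Series(index=names, data=range(len(names))) over the var_names and looks the requested labels up in it, and with these pandas versions that lookup ends in Series.reindex, which refuses to work on an index containing duplicate labels. A pandas-only sketch reproducing the same failure, and the lookup succeeding once the labels are unique (the gene names are illustrative; the exact error message differs across pandas versions):

```python
import pandas as pd

# var_names with a repeated gene symbol
names = pd.Index(["FAM87B", "SAMD11", "SAMD11", "NOC2L"])
positions = pd.Series(index=names, data=range(len(names)))

try:
    positions.reindex(["SAMD11", "NOC2L"])
except ValueError as err:
    # pandas 0.23/0.24 phrase this as "cannot reindex from a duplicate axis"
    print("ValueError:", err)

# Keep only the first row per label; the same lookup then succeeds.
unique_positions = positions[~positions.index.duplicated(keep="first")]
print(unique_positions.reindex(["SAMD11", "NOC2L"]).tolist())
```

Dropping duplicates as above silently keeps only the first row per label; renaming repeats with var_names_make_unique() keeps all of them.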
PattF commented 5 years ago

Alright, this time I'm getting an error:

Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-3fc6f6ea4b7e> in <module>
      9 
     10 _adata._inplace_subset_var(common_vars)
---> 11 _ldata._inplace_subset_var(common_vars)

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _inplace_subset_var(self, index)
   1431         Same as ``adata = adata[:, index]``, but inplace.
   1432         """
-> 1433         adata_subset = self[:, index].copy()
   1434         self._init_as_actual(adata_subset, dtype=self._X.dtype)
   1435 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in __getitem__(self, index)
   1297     def __getitem__(self, index: Index) -> 'AnnData':
   1298         """Returns a sliced view of the object."""
-> 1299         return self._getitem_view(index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _getitem_view(self, index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':
-> 1302         oidx, vidx = self._normalize_indices(index)
   1303         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1304 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_indices(self, index)
   1277         obs, var = super()._unpack_index(index)
   1278         obs = _normalize_index(obs, self.obs_names)
-> 1279         var = _normalize_index(var, self.var_names)
   1280         return obs, var
   1281 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_index(index, names)
    264         # incredibly faster one
    265         positions = pd.Series(index=names, data=range(len(names)))
--> 266         positions = positions[index]
    267         if positions.isnull().values.any():
    268             raise KeyError(

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in __getitem__(self, key)
    808             key = check_bool_indexer(self.index, key)
    809 
--> 810         return self._get_with(key)
    811 
    812     def _get_with(self, key):

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in _get_with(self, key)
    851                         return self.loc[key]
    852 
--> 853                     return self.reindex(key)
    854                 except Exception:
    855                     # [slice(0, 5, None)] will break if you convert to ndarray,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   3323     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   3324     def reindex(self, index=None, **kwargs):
-> 3325         return super(Series, self).reindex(index=index, **kwargs)
   3326 
   3327     def drop(self, labels=None, axis=0, index=None, columns=None,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   3687         # perform the reindex on the axes
   3688         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3689                                   fill_value, copy).__finalize__(self)
   3690 
   3691     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3705             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   3706                                              fill_value=fill_value,
-> 3707                                              copy=copy, allow_dups=False)
   3708 
   3709         return obj

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   3808                                                 fill_value=fill_value,
   3809                                                 allow_dups=allow_dups,
-> 3810                                                 copy=copy)
   3811 
   3812         if copy and new_data is self._data:

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   4412         # some axes don't allow reindexing with dups
   4413         if not allow_dups:
-> 4414             self.axes[axis]._can_reindex(indexer)
   4415 
   4416         if axis >= self.ndim:

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3574         # trying to reindex on an axis with duplicates
   3575         if not self.is_unique and len(indexer):
-> 3576             raise ValueError("cannot reindex from a duplicate axis")
   3577 
   3578     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
VolkerBergen commented 5 years ago

Now change it to:

import pandas as pd

scv.utils.clean_obs_names(adata)
scv.utils.clean_obs_names(ldata)

adata.obs_names_make_unique()
ldata.obs_names_make_unique()

adata.var_names_make_unique()
ldata.var_names_make_unique()

common_obs = adata.obs_names.intersection(ldata.obs_names)
common_vars = pd.unique(adata.var_names.intersection(ldata.var_names))

_adata = adata[common_obs].copy()
_ldata = ldata[common_obs].copy()

_adata._inplace_subset_var(common_vars)
_ldata._inplace_subset_var(common_vars)
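obs_names_make_unique() and var_names_make_unique() keep every entry and rename repeats by appending a numeric suffix. A pandas-only sketch of that idea; make_unique is a hypothetical helper written for illustration, and anndata's own implementation may differ in details, for instance in how it handles a suffixed name that already exists:

```python
from typing import Dict, List

import pandas as pd


def make_unique(index: pd.Index, join: str = "-") -> pd.Index:
    """Rename repeated labels as label, label-1, label-2, ... (first copy kept).

    Does not guard against collisions with labels that already end in a
    suffix such as "-1".
    """
    seen: Dict[str, int] = {}
    out: List[str] = []
    for label in index.astype(str):
        count = seen.get(label, 0)
        out.append(label if count == 0 else f"{label}{join}{count}")
        seen[label] = count + 1
    return pd.Index(out, name=index.name)


print(list(make_unique(pd.Index(["SAMD11", "SAMD11", "NOC2L", "SAMD11"]))))
```

One thing worth checking with this approach: the suffixes are assigned independently in adata and ldata, so a name such as SAMD11-1 in one object is not guaranteed to refer to the same gene row as SAMD11-1 in the other.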
PattF commented 5 years ago

Similar error again:

Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
Observation names are not unique. To make them unique, call `.obs_names_make_unique`.
Variable names are not unique. To make them unique, call `.var_names_make_unique`.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-11-1d4e9e32d746> in <module>
     17 
     18 _adata._inplace_subset_var(common_vars)
---> 19 _ldata._inplace_subset_var(common_vars)

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _inplace_subset_var(self, index)
   1431         Same as ``adata = adata[:, index]``, but inplace.
   1432         """
-> 1433         adata_subset = self[:, index].copy()
   1434         self._init_as_actual(adata_subset, dtype=self._X.dtype)
   1435 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in __getitem__(self, index)
   1297     def __getitem__(self, index: Index) -> 'AnnData':
   1298         """Returns a sliced view of the object."""
-> 1299         return self._getitem_view(index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _getitem_view(self, index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':
-> 1302         oidx, vidx = self._normalize_indices(index)
   1303         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1304 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_indices(self, index)
   1277         obs, var = super()._unpack_index(index)
   1278         obs = _normalize_index(obs, self.obs_names)
-> 1279         var = _normalize_index(var, self.var_names)
   1280         return obs, var
   1281 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_index(index, names)
    264         # incredibly faster one
    265         positions = pd.Series(index=names, data=range(len(names)))
--> 266         positions = positions[index]
    267         if positions.isnull().values.any():
    268             raise KeyError(

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in __getitem__(self, key)
    808             key = check_bool_indexer(self.index, key)
    809 
--> 810         return self._get_with(key)
    811 
    812     def _get_with(self, key):

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in _get_with(self, key)
    851                         return self.loc[key]
    852 
--> 853                     return self.reindex(key)
    854                 except Exception:
    855                     # [slice(0, 5, None)] will break if you convert to ndarray,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   3323     @Appender(generic._shared_docs['reindex'] % _shared_doc_kwargs)
   3324     def reindex(self, index=None, **kwargs):
-> 3325         return super(Series, self).reindex(index=index, **kwargs)
   3326 
   3327     def drop(self, labels=None, axis=0, index=None, columns=None,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   3687         # perform the reindex on the axes
   3688         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 3689                                   fill_value, copy).__finalize__(self)
   3690 
   3691     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   3705             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   3706                                              fill_value=fill_value,
-> 3707                                              copy=copy, allow_dups=False)
   3708 
   3709         return obj

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   3808                                                 fill_value=fill_value,
   3809                                                 allow_dups=allow_dups,
-> 3810                                                 copy=copy)
   3811 
   3812         if copy and new_data is self._data:

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   4412         # some axes don't allow reindexing with dups
   4413         if not allow_dups:
-> 4414             self.axes[axis]._can_reindex(indexer)
   4415 
   4416         if axis >= self.ndim:

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3574         # trying to reindex on an axis with duplicates
   3575         if not self.is_unique and len(indexer):
-> 3576             raise ValueError("cannot reindex from a duplicate axis")
   3577 
   3578     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
VolkerBergen commented 5 years ago

Can you subset _ldata[:, common_vars]?

What's your pandas version (import pandas as pd; print(pd.__version__))? Maybe upgrading to the latest version would help.
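The traceback above ends in pandas refusing to look labels up on a non-unique axis. A useful first check is therefore whether `_ldata.var_names` contains repeated gene names. The minimal sketch below uses pandas alone. It relies on the fact that `var_names` on an AnnData object is a `pandas.Index`. The helper name `report_duplicate_names` and the toy gene names are hypothetical.

```python
import pandas as pd


def report_duplicate_names(names: pd.Index) -> list:
    """Return each label that occurs more than once in `names` (hypothetical helper)."""
    # duplicated() flags every repeat after the first occurrence of a label
    dup_mask = names.duplicated()
    # unique() collapses e.g. three copies of one gene into a single entry
    return list(names[dup_mask].unique())


print("pandas", pd.__version__)

# Toy gene names standing in for _ldata.var_names
var_names = pd.Index(["Actb", "Gapdh", "Actb", "Sox2", "Gapdh", "Gapdh"])
print(var_names.is_unique)                # False
print(report_duplicate_names(var_names))  # ['Actb', 'Gapdh']
```

On the real object this would be `report_duplicate_names(_ldata.var_names)`. If the result is non-empty, AnnData's `var_names_make_unique()` is one way to de-duplicate before subsetting.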

PattF commented 5 years ago

Currently running pandas 0.23.4. As for the subsetting, should I just run the line you wrote?

VolkerBergen commented 5 years ago

Yes. You can also upgrade pandas (latest version: 0.24.2) and check whether subsetting works then.

PattF commented 5 years ago

Alright, I updated pandas to 0.24.2 and restarted the notebook. When I tried to subset, I got the same error as before:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-baca2ec35139> in <module>
----> 1 _ldata[:, common_vars]

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in __getitem__(self, index)
   1297     def __getitem__(self, index: Index) -> 'AnnData':
   1298         """Returns a sliced view of the object."""
-> 1299         return self._getitem_view(index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _getitem_view(self, index)
   1300 
   1301     def _getitem_view(self, index: Index) -> 'AnnData':
-> 1302         oidx, vidx = self._normalize_indices(index)
   1303         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1304 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_indices(self, index)
   1277         obs, var = super()._unpack_index(index)
   1278         obs = _normalize_index(obs, self.obs_names)
-> 1279         var = _normalize_index(var, self.var_names)
   1280         return obs, var
   1281 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_index(index, names)
    264         # incredibly faster one
    265         positions = pd.Series(index=names, data=range(len(names)))
--> 266         positions = positions[index]
    267         if positions.isnull().values.any():
    268             raise KeyError(

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    909             key = check_bool_indexer(self.index, key)
    910 
--> 911         return self._get_with(key)
    912 
    913     def _get_with(self, key):

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in _get_with(self, key)
    951                 return self.loc[key]
    952 
--> 953             return self.reindex(key)
    954         except Exception:
    955             # [slice(0, 5, None)] will break if you convert to ndarray,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   3736     @Appender(generic.NDFrame.reindex.__doc__)
   3737     def reindex(self, index=None, **kwargs):
-> 3738         return super(Series, self).reindex(index=index, **kwargs)
   3739 
   3740     def drop(self, labels=None, axis=0, index=None, columns=None,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   4354         # perform the reindex on the axes
   4355         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 4356                                   fill_value, copy).__finalize__(self)
   4357 
   4358     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4372             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   4373                                              fill_value=fill_value,
-> 4374                                              copy=copy, allow_dups=False)
   4375 
   4376         return obj

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4488                                                 fill_value=fill_value,
   4489                                                 allow_dups=allow_dups,
-> 4490                                                 copy=copy)
   4491 
   4492         if copy and new_data is self._data:

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\internals\managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   1222         # some axes don't allow reindexing with dups
   1223         if not allow_dups:
-> 1224             self.axes[axis]._can_reindex(indexer)
   1225 
   1226         if axis >= self.ndim:

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3085         # trying to reindex on an axis with duplicates
   3086         if not self.is_unique and len(indexer):
-> 3087             raise ValueError("cannot reindex from a duplicate axis")
   3088 
   3089     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
VolkerBergen commented 5 years ago

Strange. Could you also upgrade anndata to 0.6.20?

Does that run through without giving an error? _ldata[:, _ldata.var_names[:, 5]]

PattF commented 5 years ago

Upgraded anndata to 0.6.20 and got the following error after running that line:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-14-dfcfc175223e> in <module>
----> 1 _ldata[:, _ldata.var_names[:, 5]]

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
   3967 
   3968         key = com.values_from_object(key)
-> 3969         result = getitem(key)
   3970         if not is_scalar(result):
   3971             return promote(result)

IndexError: too many indices for array
VolkerBergen commented 5 years ago

Ah, I meant _ldata[:, _ldata.var_names[:5]]

PattF commented 5 years ago

Ha, sadly it's a similar result to the previous errors:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-9f4708acfdc9> in <module>
----> 1 _ldata[:, _ldata.var_names[:5]]

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in __getitem__(self, index)
   1320     def __getitem__(self, index: Index) -> 'AnnData':
   1321         """Returns a sliced view of the object."""
-> 1322         return self._getitem_view(index)
   1323 
   1324     def _getitem_view(self, index: Index) -> 'AnnData':

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _getitem_view(self, index)
   1323 
   1324     def _getitem_view(self, index: Index) -> 'AnnData':
-> 1325         oidx, vidx = self._normalize_indices(index)
   1326         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1327 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_indices(self, index)
   1300         obs, var = unpack_index(index)
   1301         obs = _normalize_index(obs, self.obs_names)
-> 1302         var = _normalize_index(var, self.var_names)
   1303         return obs, var
   1304 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_index(index, names)
    263         # incredibly faster one
    264         positions = pd.Series(index=names, data=range(len(names)))
--> 265         positions = positions[index]
    266         if positions.isnull().values.any():
    267             not_found = positions.index[positions.isnull().values]

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    909             key = check_bool_indexer(self.index, key)
    910 
--> 911         return self._get_with(key)
    912 
    913     def _get_with(self, key):

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in _get_with(self, key)
    951                 return self.loc[key]
    952 
--> 953             return self.reindex(key)
    954         except Exception:
    955             # [slice(0, 5, None)] will break if you convert to ndarray,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   3736     @Appender(generic.NDFrame.reindex.__doc__)
   3737     def reindex(self, index=None, **kwargs):
-> 3738         return super(Series, self).reindex(index=index, **kwargs)
   3739 
   3740     def drop(self, labels=None, axis=0, index=None, columns=None,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   4354         # perform the reindex on the axes
   4355         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 4356                                   fill_value, copy).__finalize__(self)
   4357 
   4358     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4372             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   4373                                              fill_value=fill_value,
-> 4374                                              copy=copy, allow_dups=False)
   4375 
   4376         return obj

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4488                                                 fill_value=fill_value,
   4489                                                 allow_dups=allow_dups,
-> 4490                                                 copy=copy)
   4491 
   4492         if copy and new_data is self._data:

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\internals\managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   1222         # some axes don't allow reindexing with dups
   1223         if not allow_dups:
-> 1224             self.axes[axis]._can_reindex(indexer)
   1225 
   1226         if axis >= self.ndim:

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3085         # trying to reindex on an axis with duplicates
   3086         if not self.is_unique and len(indexer):
-> 3087             raise ValueError("cannot reindex from a duplicate axis")
   3088 
   3089     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
VolkerBergen commented 5 years ago

Does the same apply to ldata[:, ldata.var_names[:5]] and adata[:, adata.var_names[:5]]? If so, this is a very fundamental problem. You're still working on Windows, right?

PattF commented 5 years ago

For ldata I get the same error: ValueError: cannot reindex from a duplicate axis. For adata, however, I get actual output (and yes, still on Windows, though I'm increasingly considering switching to Mac):

View of AnnData object with n_obs × n_vars = 14731 × 5 
    obs: 'sample', 'n_counts', 'log_counts', 'n_genes', 'mt_frac', 'size_factors', 'S_score', 'G2M_score', 'phase', 'louvain_r1', 'louvain_r0.5', 'Chor_marker_expr', 'Radg_marker_expr', 'Chr21_marker_expr'
    var: 'gene_id', 'n_cells', 'means', 'dispersions', 'dispersions_norm', 'highly_variable'
    uns: 'cluster_cell_type_matching', 'diffmap_evals', 'louvain', 'louvain_r0.5_colors', 'louvain_r0.5_sizes', 'neighbors', 'paga', 'pca', 'phase_colors', 'rank_genes_r0.5', 'sample_colors'
    obsm: 'X_pca', 'X_umap', 'X_diffmap'
    varm: 'PCs'
    layers: 'counts'
VolkerBergen commented 5 years ago

What if you first run scv.pp.filter_and_normalize(ldata, min_shared_counts=30) and then try subsetting again?

PattF commented 5 years ago

I tried running that before subsetting but got the following error (should it be min_counts?):

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-a884d6b99cb4> in <module>
----> 1 scv.pp.filter_and_normalize(ldata, min_shared_counts=30)

TypeError: filter_and_normalize() got an unexpected keyword argument 'min_shared_counts'
VolkerBergen commented 5 years ago

The min_shared_counts argument is available as of scvelo v0.1.17. Are you not running the latest version? What does scv.logging.print_versions() show?
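The log line later in the thread reports genes "detected in less than 30 counts (shared)". The NumPy sketch below is a rough illustration of what a shared-counts filter can compute. It assumes that "shared" counts are the spliced plus unspliced counts, summed only over cells in which both layers are non-zero. That definition is an assumption, and `shared_counts_filter` is a hypothetical name, not part of scVelo's API.

```python
import numpy as np


def shared_counts_filter(spliced: np.ndarray, unspliced: np.ndarray,
                         min_shared_counts: int) -> np.ndarray:
    """Boolean gene mask: True where a gene's shared counts reach the threshold.

    Assumption (illustration only, not scVelo's code): shared counts are
    spliced + unspliced counts summed over cells where BOTH are non-zero.
    Inputs are dense cells x genes count matrices.
    """
    # cells x genes mask of entries where both layers detected the gene
    both_detected = (spliced > 0) & (unspliced > 0)
    # keep counts only where both were detected, then sum over cells
    shared = np.where(both_detected, spliced + unspliced, 0).sum(axis=0)
    return shared >= min_shared_counts


# 3 cells x 3 genes of toy counts
spliced = np.array([[10, 0, 5],
                    [12, 3, 0],
                    [8, 0, 1]])
unspliced = np.array([[4, 6, 0],
                      [2, 0, 0],
                      [3, 9, 2]])
mask = shared_counts_filter(spliced, unspliced, min_shared_counts=30)
print(mask)  # [ True False False]
```

Gene 1 has plenty of counts in each layer separately but never in the same cell, so it scores zero and is dropped.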

PattF commented 5 years ago

Sorry, I thought I had updated scvelo; I'm on v0.1.17 now. Versions used are: scvelo==0.1.17 scanpy==1.4 anndata==0.6.20 loompy==2.0.17 numpy==1.16.3 scipy==1.1.0 matplotlib==3.0.3 sklearn==0.20.3 pandas==0.24.2

After running the filter and normalize line, I got the following:

Filtered out 49211 genes that are detected in less than 30 counts (shared).
Normalized count data: X, spliced, unspliced.
Logarithmized X.

But then, after running _ldata[:, common_vars], back to the usual error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-13-baca2ec35139> in <module>
----> 1 _ldata[:, common_vars]

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in __getitem__(self, index)
   1320     def __getitem__(self, index: Index) -> 'AnnData':
   1321         """Returns a sliced view of the object."""
-> 1322         return self._getitem_view(index)
   1323 
   1324     def _getitem_view(self, index: Index) -> 'AnnData':

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _getitem_view(self, index)
   1323 
   1324     def _getitem_view(self, index: Index) -> 'AnnData':
-> 1325         oidx, vidx = self._normalize_indices(index)
   1326         return AnnData(self, oidx=oidx, vidx=vidx, asview=True)
   1327 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_indices(self, index)
   1300         obs, var = unpack_index(index)
   1301         obs = _normalize_index(obs, self.obs_names)
-> 1302         var = _normalize_index(var, self.var_names)
   1303         return obs, var
   1304 

~\Anaconda3\envs\py36\lib\site-packages\anndata\base.py in _normalize_index(index, names)
    263         # incredibly faster one
    264         positions = pd.Series(index=names, data=range(len(names)))
--> 265         positions = positions[index]
    266         if positions.isnull().values.any():
    267             not_found = positions.index[positions.isnull().values]

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    909             key = check_bool_indexer(self.index, key)
    910 
--> 911         return self._get_with(key)
    912 
    913     def _get_with(self, key):

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in _get_with(self, key)
    951                 return self.loc[key]
    952 
--> 953             return self.reindex(key)
    954         except Exception:
    955             # [slice(0, 5, None)] will break if you convert to ndarray,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\series.py in reindex(self, index, **kwargs)
   3736     @Appender(generic.NDFrame.reindex.__doc__)
   3737     def reindex(self, index=None, **kwargs):
-> 3738         return super(Series, self).reindex(index=index, **kwargs)
   3739 
   3740     def drop(self, labels=None, axis=0, index=None, columns=None,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in reindex(self, *args, **kwargs)
   4354         # perform the reindex on the axes
   4355         return self._reindex_axes(axes, level, limit, tolerance, method,
-> 4356                                   fill_value, copy).__finalize__(self)
   4357 
   4358     def _reindex_axes(self, axes, level, limit, tolerance, method, fill_value,

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in _reindex_axes(self, axes, level, limit, tolerance, method, fill_value, copy)
   4372             obj = obj._reindex_with_indexers({axis: [new_index, indexer]},
   4373                                              fill_value=fill_value,
-> 4374                                              copy=copy, allow_dups=False)
   4375 
   4376         return obj

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\generic.py in _reindex_with_indexers(self, reindexers, fill_value, copy, allow_dups)
   4488                                                 fill_value=fill_value,
   4489                                                 allow_dups=allow_dups,
-> 4490                                                 copy=copy)
   4491 
   4492         if copy and new_data is self._data:

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\internals\managers.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
   1222         # some axes don't allow reindexing with dups
   1223         if not allow_dups:
-> 1224             self.axes[axis]._can_reindex(indexer)
   1225 
   1226         if axis >= self.ndim:

~\Anaconda3\envs\py36\lib\site-packages\pandas\core\indexes\base.py in _can_reindex(self, indexer)
   3085         # trying to reindex on an axis with duplicates
   3086         if not self.is_unique and len(indexer):
-> 3087             raise ValueError("cannot reindex from a duplicate axis")
   3088 
   3089     def reindex(self, target, method=None, level=None, limit=None,

ValueError: cannot reindex from a duplicate axis
VolkerBergen commented 5 years ago

And the same with ldata[:, ldata.var_names[:5]]?

@flying-sheep, I would be very grateful if you could take a quick look. What causes subsetting of an AnnData object to raise ValueError: cannot reindex from a duplicate axis?
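The tracebacks show anndata's `_normalize_index` building a Series that maps each name to its integer position. The label lookup on that Series then ends in pandas' `reindex`, which rejects a non-unique axis. The sketch below reproduces this with pandas only, assuming the names contain a duplicate. It calls `reindex` directly because newer pandas releases may route `series[list_of_labels]` through `.loc` instead. The error wording also varies by version: older releases say "cannot reindex from a duplicate axis", newer ones say "cannot reindex on an axis with duplicate labels".

```python
import pandas as pd

# Mimics the lookup seen in the traceback: a Series mapping name -> position
names = pd.Index(["Actb", "Gapdh", "Actb"])  # "Actb" is duplicated
positions = pd.Series(index=names, data=range(len(names)))

try:
    # Label-based reindex refuses to work on a non-unique axis
    positions.reindex(["Gapdh", "Sox2"])
except ValueError as err:
    print("ValueError:", err)

# The same lookup succeeds once the axis is unique
unique_positions = pd.Series(index=pd.Index(["Actb", "Gapdh"]), data=range(2))
print(unique_positions.reindex(["Gapdh"]).tolist())  # [1]
```

This fits the observation above that adata subsets fine while ldata does not.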

VolkerBergen commented 5 years ago

To make sure nothing is wrong with the cached data, reload your ldata with ldata = scv.read(path2, cache=False) and try subsetting again.

PattF commented 5 years ago

After reloading the data, running ldata[:, ldata.var_names[:5]] finally gives me output:

View of AnnData object with n_obs × n_vars = 16971 × 5 
    obs: 'Clusters', 'SampleID', 'SampleRef', '_X', '_Y', 'sample_batch', 'initial_size_spliced', 'initial_size_unspliced', 'initial_size', 'n_counts'
    var: 'Accession', 'Chromosome', 'End', 'Start', 'Strand'
    layers: 'ambiguous', 'matrix', 'spliced', 'unspliced'
VolkerBergen commented 5 years ago

Apparently it was the cache. Now you can redo the merge.

PattF commented 5 years ago

Alright, if I run scv.pp.filter_and_normalize(ldata, min_shared_counts=20) before merging adata and ldata, it works great and I can progress. If I don't run it, however, I hit the same indexing error when attempting to merge.
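One defensive pattern for the merge is to remove repeated names before aligning two objects on their shared labels. The sketch below uses two pandas DataFrames as stand-ins for the `var` tables of adata and ldata. The function name `align_on_common_names` and the keep-first policy are assumptions for illustration, not scVelo's merge implementation.

```python
import pandas as pd


def align_on_common_names(left: pd.DataFrame, right: pd.DataFrame):
    """Subset both tables to their shared row labels (hypothetical helper).

    Repeated labels are dropped from each table (first occurrence kept), so
    the label-based selection below never runs on a non-unique axis.
    """
    left = left[~left.index.duplicated(keep="first")]
    right = right[~right.index.duplicated(keep="first")]
    # intersection without sorting keeps the order of `left`
    common = left.index.intersection(right.index)
    return left.loc[common], right.loc[common]


adata_var = pd.DataFrame({"n_cells": [5, 7, 2]}, index=["Actb", "Gapdh", "Sox2"])
ldata_var = pd.DataFrame({"Chromosome": ["5", "6", "5", "3"]},
                         index=["Actb", "Gapdh", "Actb", "Nes"])
a, l = align_on_common_names(adata_var, ldata_var)
print(list(a.index), list(l.index))  # ['Actb', 'Gapdh'] ['Actb', 'Gapdh']
```

Silently keeping the first duplicate can discard a real gene. Both objects carry stable identifiers (gene_id in adata.var, Accession in ldata.var, per the outputs above). If those use the same ID scheme, aligning on them instead of gene symbols would avoid the ambiguity altogether.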

Following your notebook examples, I've finally been able to generate velocity plots. However, I ran into an error when running rank_velocity_genes, similar to issue #64. I'm running scvelo 0.1.17, though I updated through PyPI. Do I have to install from source to get the fix?

Happy to send my notebook if you want to have a look and check that I didn't stuff anything up along the way. Thanks for all the help so far!

VolkerBergen commented 5 years ago

Not sure why filtering fixes this; probably related to anndata. I just released v0.1.18 (with rank_velocity_genes now stable).