scverse / scanpy

Single-cell analysis in Python. Scales to >1M cells.
https://scanpy.readthedocs.io
BSD 3-Clause "New" or "Revised" License
1.92k stars 602 forks source link

sc.tl.dpt with error: detected group with only [] cells #33

Closed zacharylau10 closed 7 years ago

zacharylau10 commented 7 years ago

Hello, it's me again, really thanks for your kindly reply before. when I analyze my own data using sc.tl.dpt with default n_branches, it worked well, but when I set n_branches more than 0, it occurred an error:


--> To enable computation of pseudotime, pass the index or expression vector
    of a root cell. Either add
        adata.add['iroot'] = root_cell_index
    or (robust to subsampling)
        adata.var['xroot'] = adata.X[root_cell_index, :]
    where "root_cell_index" is the integer index of the root cell, or
        adata.var['xroot'] = adata[root_cell_name, :].X
    where "root_cell_name" is the name (a string) of the root cell.
perform Diffusion Pseudotime analysis
    using "X_pca" for building graph
    using stored data graph with n_neighbors = 30 and spectrum
    [ 1.            0.9944264293  0.9934666753  0.9925051928  0.9899699688
      0.9893597364  0.9855745435  0.9840251803  0.981688261   0.9806631804]
    detect 1 branching
    do not consider groups with less than 2742 points for splitting
    branching 1: split group 0
WARNING: detected group with only [] cells

ValueError                                Traceback (most recent call last)
<ipython-input-3-b1749d943ac4> in <module>()
----> 1 get_ipython().run_cell_magic('time', '', 'sc.tl.dpt(adata_corrected,n_jobs=48,n_pcs=30,allow_kendall_tau_shift=False,n_branchings=1)\nsc.logging.print_memory_usage()')

/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/IPython/core/interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2113             magic_arg_s = self.var_expand(line, stack_depth)
   2114             with self.builtin_trap:
-> 2115                 result = fn(magic_arg_s, cell)
   2116             return result
   2117 

<decorator-gen-59> in time(self, line, cell, local_ns)

/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/IPython/core/magic.py in <lambda>(f, *a, **k)
    186     # but it's overkill for just that one bit of state.
    187     def magic_deco(arg):
--> 188         call = lambda f, *a, **k: f(*a, **k)
    189 
    190         if callable(arg):

/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/IPython/core/magics/execution.py in time(self, line, cell, local_ns)
   1178         else:
   1179             st = clock2()
-> 1180             exec(code, glob, local_ns)
   1181             end = clock2()
   1182             out = None

<timed exec> in <module>()

/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in dpt(adata, n_branchings, n_neighbors, knn, n_pcs, n_dcs, min_group_size, n_jobs, recompute_graph, recompute_pca, allow_kendall_tau_shift, flavor, copy)
    127         adata.smp['dpt_pseudotime'] = dpt.pseudotime
    128     # detect branchings and partition the data into segments
--> 129     dpt.branchings_segments()
    130     # vector of length n_groups
    131     adata.add['dpt_groups_order'] = [str(n) for n in dpt.segs_names_unique]

/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in branchings_segments(self)
    188             for each segment.
    189         """
--> 190         self.detect_branchings()
    191         self.postprocess_segments()
    192         self.set_segs_names()

/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in detect_branchings(self)
    258                                   segs_connects,
    259                                   segs_undecided,
--> 260                                   segs_adjacency, iseg, tips3)
    261         # store as class members
    262         self.segs = segs

/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in detect_branching(self, segs, segs_tips, segs_connects, segs_undecided, segs_adjacency, iseg, tips3)
    464         # branching on the segment, return the list ssegs of segments that
    465         # are defined by splitting this segment
--> 466         result = self._detect_branching(Dseg, tips3, seg)
    467         ssegs, ssegs_tips, ssegs_adjacency, ssegs_connects, trunk = result
    468         # map back to global indices

/public/workspace/jiping/scanpy-master/scanpy/tools/dpt.py in _detect_branching(self, Dseg, tips, seg_reference)
    632             if len(np.flatnonzero(newseg)) <= 1:
    633                 logg.warn('detected group with only {} cells'.format(np.flatnonzero(newseg)))
--> 634             secondtip = newseg[np.argmax(Dseg[tips[inewseg]][newseg])]
    635             ssegs_tips.append([tips[inewseg], secondtip])
    636         undecided_cells = np.arange(Dseg.shape[0], dtype=int)[nonunique]

/public/bioapps/ana/anaconda3/envs/python35/lib/python3.5/site-packages/numpy/core/fromnumeric.py in argmax(a, axis, out)
    971     except AttributeError:
    972         return _wrapit(a, 'argmax', axis, out)
--> 973     return argmax(axis, out)
    974 
    975 

ValueError: attempt to get argmax of an empty sequence```

    Does it mean that this data didn't contain any branches? 
    Howerer, I do see some small tips when I plot using `diffmap` result or `sc.tl.dpt` with default branch. Could you help me to figure it out?

    Thanks,
    jiping
falexwolf commented 7 years ago

Hi Jiping!

I know that sometimes DPT detects groups with no cells in it; you can try setting the obscure option allow_kendall_tau_shift to False; sometimes this helps. But the problem goes deeper [see at the very end here and there how branching groups are sometimes not meaningfully chosen). We're almost done with a method that combines the merits of DPT with conventional clustering that resolves this problem.

No, it doesn't mean that there is no branching signature in your data; but it is certainly not a strong one; in many "easy" cases, DPT works perfectly. You could also try to make the branching more proncounced by changing the preprocessing.

Hope that helps, Alex

Btw: We started to set up a documentation at https://scanpy.readthedocs.io.

zacharylau10 commented 7 years ago

Hi Alex! Before, I filtered gene with min_mean and min_disp, and left about 1300 genes for downstream analysis. Maybe the dataset is highly similar, so I reduce the gene number and choose the top 200 highly variable genes and it run without error. Thanks a lot, Jiping

falexwolf commented 7 years ago

Great to hear! :smile: