mossjacob / pyslingshot

Python implementation of the Slingshot pseudotime algorithm
51 stars 7 forks source link

Performance and reliability #9

Open jeskowagner opened 2 years ago

jeskowagner commented 2 years ago

Hi!

This issue is carried over from the slingshot repo.

My question there had focused on whether pyslingshot was still under active development, thanks @mossjacob for the clarification on that it indeed is. Additionally, I raised two points:


Unfortunately I cannot share the tested data set at this stage. However, here are some bits to give you some intuition of the data.

PCA

The data set was center-scaled and dimensionality reduced with PCA and we selected the first 10 PCs using the "elbow" method. Clustering is done with Leiden on a kNN graph built on those PCs. I am aware that we are not really seeing any clear differentiation trajectory, however, I am developing general methods and am using this (imperfect) dataset as proof of concept. Colors in this plot represent the clusters. image

Running pyslingshot

My call to pyslingshot looks like this:

from slingshot import Slingshot
slingshot = Slingshot(data, cluster_labels_onehot, start_node=start_node)
slingshot.fit(num_epochs=1)

Prints 8 times the following:

UserWarning: 
The maximal number of iterations maxit (set to 20 by the program) allowed for finding a smoothing spline with fp=s has been reached: s too small.
There is an approximation returned but the corresponding weighted sum of squared residuals does not satisfy the condition abs(fp-s)/s < tol.

Before finally erroring out with:

----> 1 slingshot.fit(num_epochs=1)

File ~/.../slingshot/slingshot.py:166, in Slingshot.fit(self, num_epochs, debug_axes)
    163 self.debug_plot_avg = False
    165 if epoch == num_epochs - 1:  # plot curves
--> 166     self.plotter.clusters(self.debug_axes[1, 1], s=2, alpha=0.5)
    167     self.plotter.curves(self.debug_axes[1, 1], self.curves)

TypeError: 'NoneType' object is not subscriptable

Not sure why it's still trying to plot to the debug axes when I am not in debug mode. slingshot.debug_level is 0 in this example.

So I retried with the debug options on, following this tutorial:


fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10, 10))
custom_xlim = (-12, 12)
custom_ylim = (-12, 12)

slingshot = Slingshot(data, cluster_labels_onehot, start_node=start_node, debug_level='verbose')

slingshot.fit(num_epochs=1, debug_axes=axes)
Click for the resulting error message ```python /.../scipy/interpolate/_fitpack2.py:280: UserWarning: The maximal number of iterations maxit (set to 20 by the program) allowed for finding a smoothing spline with fp=s has been reached: s too small. There is an approximation returned but the corresponding weighted sum of squared residuals does not satisfy the condition abs(fp-s)/s < tol. warnings.warn(message) 0%| | 0/1 [00:31 7 slingshot.fit(num_epochs=1, debug_axes=axes) File ~/.../slingshot/slingshot.py:147, in Slingshot.fit(self, num_epochs, debug_axes) 144 self.calculate_cell_weights() 146 # Fit principal curve for all lineages using existing curves --> 147 self.fit_lineage_curves() 149 # Ensure starts at 0 150 for l_idx, lineage in enumerate(self.lineages): File ~/.../slingshot/slingshot.py:258, in Slingshot.fit_lineage_curves(self) 256 path_from = (self.data[i][0], curve.points_interp[i][0]) 257 path_to = (self.data[i][1], curve.points_interp[i][1]) --> 258 self.debug_axes[0, 1].plot(path_from, path_to, c='black', alpha=alphas[i]) 259 self.debug_axes[0, 1].plot(curve.points_interp[curve.order, 0], 260 curve.points_interp[curve.order, 1], label=str(lineage)) 262 d_sq, dist = curve.project_to_curve(self.data, curve.points_interp[curve.order]) File ~/.../matplotlib/axes/_axes.py:1662, in Axes.plot(self, scalex, scaley, data, *args, **kwargs) 1419 """ 1420 Plot y versus x as lines and/or markers. 1421 (...) 1659 (``'green'``) or hex strings (``'#008000'``). 1660 """ 1661 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D) -> 1662 lines = [*self._get_lines(*args, data=data, **kwargs)] 1663 for line in lines: 1664 self.add_line(line) File ~/.../matplotlib/axes/_base.py:311, in _process_plot_var_args.__call__(self, data, *args, **kwargs) 309 this += args[0], 310 args = args[1:] --> 311 yield from self._plot_args( 312 this, kwargs, ambiguous_fmt_datakey=ambiguous_fmt_datakey) File ~/.../matplotlib/axes/_base.py:544, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs, ambiguous_fmt_datakey) 542 return list(result) 543 else: --> 544 return [l[0] for l in result] File ~/.../matplotlib/axes/_base.py:544, in (.0) 542 return list(result) 543 else: --> 544 return [l[0] for l in result] File ~/.../matplotlib/axes/_base.py:537, in (.0) 534 else: 535 labels = [label] * n_datasets --> 537 result = (make_artist(x[:, j % ncx], y[:, j % ncy], kw, 538 {**kwargs, 'label': label}) 539 for j, label in enumerate(labels)) 541 if return_kwargs: 542 return list(result) File ~/.../matplotlib/axes/_base.py:351, in _process_plot_var_args._makeline(self, x, y, kw, kwargs) 349 default_dict = self._getdefaults(set(), kw) 350 self._setdefaults(default_dict, kw) --> 351 seg = mlines.Line2D(x, y, **kw) 352 return seg, kw File ~/.../matplotlib/_api/deprecation.py:454, in make_keyword_only..wrapper(*args, **kwargs) 448 if len(args) > name_idx: 449 warn_deprecated( 450 since, message="Passing the %(name)s %(obj_type)s " 451 "positionally is deprecated since Matplotlib %(since)s; the " 452 "parameter will become keyword-only %(removal)s.", 453 name=name, obj_type=f"parameter of {func.__name__}()") --> 454 return func(*args, **kwargs) File ~/.../matplotlib/lines.py:393, in Line2D.__init__(self, xdata, ydata, linewidth, linestyle, color, gapcolor, marker, markersize, markeredgewidth, markeredgecolor, markerfacecolor, markerfacecoloralt, fillstyle, antialiased, dash_capstyle, solid_capstyle, dash_joinstyle, solid_joinstyle, pickradius, drawstyle, markevery, **kwargs) 389 self.set_markeredgewidth(markeredgewidth) 391 # update kwargs before updating data to give the caller a 392 # chance to init axes (and hence unit support) --> 393 self._internal_update(kwargs) 394 self._pickradius = pickradius 395 self.ind_offset = 0 File ~/.../matplotlib/artist.py:1186, in Artist._internal_update(self, kwargs) 1179 def _internal_update(self, kwargs): 1180 """ 1181 Update artist properties without prenormalizing them, but generating 1182 errors as if calling `set`. 1183 1184 The lack of prenormalization is to maintain backcompatibility. 1185 """ -> 1186 return self._update_props( 1187 kwargs, "{cls.__name__}.set() got an unexpected keyword argument " 1188 "{prop_name!r}") File ~/.../matplotlib/artist.py:1162, in Artist._update_props(self, props, errfmt) 1159 if not callable(func): 1160 raise AttributeError( 1161 errfmt.format(cls=type(self), prop_name=k)) -> 1162 ret.append(func(v)) 1163 if ret: 1164 self.pchanged() File ~/.../matplotlib/artist.py:983, in Artist.set_alpha(self, alpha) 980 raise TypeError( 981 f'alpha must be numeric or None, not {type(alpha)}') 982 if alpha is not None and not (0 <= alpha <= 1): --> 983 raise ValueError(f'alpha ({alpha}) is outside 0-1 range') 984 self._alpha = alpha 985 self.pchanged() ValueError: alpha (nan) is outside 0-1 range ```

Through plt.show() I can see that two of the four debug plots still ran through: image

I can see that this may be caused by an erroneous linage. See the first entry of the following output

[np.isnan(slingshot.curves[i].pseudotimes_interp).any() for i in range(len(slingshot.curves))]
# [True, False, False, False, False, False, False, False]

The corresponding lineage is: Lineage[20, 3, 10, 14, 18]


Long story short: I cannot get pyslingshot to run to completion in my hands, probably caused by this odd lineage there. When I export this exact data, run it through R's slingshot with approx_points set to default it finishes successfuly, and quicker than pyslingshot takes to get to this error.

This account is in no way meant to be discouraging or aggresive, and I am sorry if this wall of text conveys frustration. I am trying to lay out my struggle, but want to emphasise that I appreciate your development of pyslingshot. Also, I realise that I may have a very specific data set and usage here, which may well not fit into the scope of pyslingshot. I am content wrapping R's slingshot if necessary and in fact do so at the moment. But if there was a way to move past these issues and use approx_points, I would definitely move to pyslingshot to save the roundtrip to R.

Thanks! Best, Jesko

mossjacob commented 2 years ago

Dear Jesko,

Thank you so much for going into such detail! It will greatly help me when debugging this. I'll get back to you once I've done some investigating.

Best wishes, Jacob