jorenham / scipy-stubs

Typing Stubs for SciPy
https://pypi.org/project/scipy-stubs/
BSD 3-Clause "New" or "Revised" License
14 stars 4 forks source link

`scipy.spatial` signature of `[c]KDTree.query` does not match docs #158

Closed jenshnielsen closed 3 days ago

jenshnielsen commented 4 days ago

According to the scipy docs query should return float, int or array of float, array of int

https://github.com/scipy/scipy/blob/main/scipy/spatial/_kdtree.py#L395

       Returns
        -------
        d : float or array of floats
            The distances to the nearest neighbors.
            If ``x`` has shape ``tuple+(self.m,)``, then ``d`` has shape
            ``tuple+(k,)``.
            When k == 1, the last dimension of the output is squeezed.
            Missing neighbors are indicated with infinite distances.
            Hits are sorted by distance (nearest first).

            .. versionchanged:: 1.9.0
               Previously if ``k=None``, then `d` was an object array of
               shape ``tuple``, containing lists of distances. This behavior
               has been removed, use `query_ball_point` instead.

        i : integer or array of integers
            The index of each neighbor in ``self.data``.
            ``i`` is the same shape as d.
            Missing neighbors are indicated with ``self.n``.

However, in scipy-stubs these are both typed as returning a tuple of two floats or tuple of arrays of floats

tuple[float, float] | tuple[npt.NDArray[np.float64], npt.NDArray[np.float64]]: ...

https://github.com/jorenham/scipy-stubs/blob/master/scipy-stubs/spatial/_ckdtree.pyi#L88

https://github.com/jorenham/scipy-stubs/blob/master/scipy-stubs/spatial/_kdtree.pyi#L73

jorenham commented 3 days ago

Hm yea, that's clearly wrong 🤔

It should be

tuple[float, np.int_] | tuple[npt.NDArray[np.float64], npt.NDArray[np.int_]]

right?

jenshnielsen commented 3 days ago

Looking at the Cython code in _ckdtree.pyx it seems that it is

tuple[float, int] | tuple[npt.NDArray[np.float64], npt.NDArray[np.intp]]

The return type is defined as

np.intp_t [:, ::1] ii = np.empty((n,len(k)),dtype=np.intp)
...
iiret = np.reshape(ii, retshape + (len(k),))
bool single = (x_arr.ndim == 1)
...
if single:
      ddret = float(ddret)
      iiret = int(iiret)
jorenham commented 3 days ago

Hmmm, when I ran it as tree.query([0, 2.2]) it returned

(0.20000000000000018, np.int64(0))

(I'm on a 64-bit linux platform, so intp is an alias for int64)

So maybe that has to do with the fact that builtins.int isn't fixed-width? Anyway, I think I'll just play it safe, and use int | np.int_ instead 🤷🏻. It shouldn't matter much in practice.


And you're right about the intp dtype: I forgot that int_ isn't the same as intp on numpy<2, which we still support.