numpy / numpy

The fundamental package for scientific computing with Python.
https://numpy.org

Splitting np.cross into np.cross and np.cross2d? #13718

Open seibert opened 5 years ago

seibert commented 5 years ago

A Numba contributor has been working on adding support for np.cross to Numba (numba/numba#4128), and this raised the issue that np.cross has an unusual type signature. It feels like two related, but distinct functions have been glued together:

  1. A gufunc-like function that returns 3D vector cross products of 3D vector inputs, with the convenience feature to assume a zero z-component if the final dimension of (only) one of the input arrays has length 2. The number of dimensions of the output is equal to the number of dimensions on the inputs.

  2. A different gufunc-like function that returns the z component of the 3D vector cross product of 2D vector inputs. This is only selected if both inputs have length two in their last dimension. The number of dimensions on the output is one less than the number of dimensions of the inputs.

Aside from making the documentation of this function confusing (it took me several tries to understand these two modes, assuming I'm not still confused), it also makes it impossible to write a Numba type signature for this function, because the number of returned dimensions depends on the exact length of the last dimension of each input. (Numba considers ndim, layout, and dtype part of the array type for purposes of code generation, but not the actual contents of shape.)
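To make the typing problem concrete, here is a minimal sketch of the shape rule described in the two modes above. The helper name `cross_result_ndim` is hypothetical (it is not part of NumPy or Numba), and it assumes both inputs have the same number of dimensions with the vector axis last:

```python
def cross_result_ndim(a_shape, b_shape):
    """Number of dimensions of an np.cross-style result.

    Hypothetical helper for illustration only. Assumes both inputs have
    the same ndim and that the vectors live on the last axis.
    """
    if len(a_shape) != len(b_shape) or len(a_shape) == 0:
        raise ValueError("this sketch assumes non-scalar inputs with equal ndim")
    last_a, last_b = a_shape[-1], b_shape[-1]
    if last_a not in (2, 3) or last_b not in (2, 3):
        raise ValueError("last dimension must have length 2 or 3")
    if last_a == 2 and last_b == 2:
        # Mode 2: only the z component is returned, so the vector axis vanishes.
        return len(a_shape) - 1
    # Mode 1: a full 3D vector is returned, so ndim is preserved.
    return len(a_shape)


print(cross_result_ndim((5, 3), (5, 3)))  # mode 1
print(cross_result_ndim((5, 2), (5, 3)))  # mode 1, z assumed zero for the first input
print(cross_result_ndim((5, 2), (5, 2)))  # mode 2
```

The result depends on `a_shape[-1]` and `b_shape[-1]`, which are values rather than part of the array type, so a type system that tracks only ndim, layout, and dtype cannot decide the output type.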

One could argue this is a Numba problem rather than a NumPy problem, but we suspect that attempts to construct a type hinting system (like https://github.com/numpy/numpy-stubs) for NumPy functions that describes the relationship between input and output dimensions will also trip over np.cross. Our workaround in Numba will be to support only mode #1 above, create a separate numba.cross2d function that does mode #2, and raise compilation errors to direct users to the right one.

For similar reasons, adding a np.cross2d to NumPy might be a useful way to evolve toward an easier to understand np.cross and also make type hinting possible, although backward compatibility would require supporting both modes in np.cross for some time.

It is also totally fair to mark this as WONTFIX because this ship has already sailed. :)

WarrenWeckesser commented 2 years ago

Related: https://github.com/numpy/numpy/issues/13233

WarrenWeckesser commented 2 years ago

FYI: The awkwardness of the overloaded cross API is why I split these into two functions when I implemented them as gufuncs in ufunclab: cross2 has signature (2),(2)->(), and cross3 has signature (3),(3)->(3).
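As a rough illustration of that split, the two signatures can be sketched in plain NumPy. This is not ufunclab's code: the names `cross2_sketch` and `cross3_sketch` are hypothetical, and the functions are ordinary Python functions rather than true gufuncs.

```python
import numpy as np


def cross2_sketch(a, b):
    """Signature (2),(2)->(): z component of the cross product of 2D vectors."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim == 0 or b.ndim == 0 or a.shape[-1] != 2 or b.shape[-1] != 2:
        raise ValueError("both inputs must have length 2 on the last axis")
    # The core dimension is consumed, so the output has one fewer dimension.
    return a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]


def cross3_sketch(a, b):
    """Signature (3),(3)->(3): cross product of 3D vectors."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.ndim == 0 or b.ndim == 0 or a.shape[-1] != 3 or b.shape[-1] != 3:
        raise ValueError("both inputs must have length 3 on the last axis")
    # With two length-3 inputs, np.cross always returns length-3 vectors.
    return np.cross(a, b)


print(cross2_sketch(np.ones((4, 2)), np.ones((4, 2))).shape)
print(cross3_sketch(np.ones((4, 3)), np.ones((4, 3))).shape)
```

With this split, the output ndim of each function follows from the input ndim alone, which is the property Numba and type stubs need.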

rgommers commented 2 years ago

This cross behavior is clearly a problem: for Numba, for @WarrenWeckesser's ufunc work, and for the array API standardization work, where we also came across it. PyTorch/JAX et al. will have issues for the same reasons. It's probably worth removing the length-2 behavior from np.cross.

Separately, I think there's a bigger thing to learn here. This is probably one of quite a few pain points for Numba. Numba has a hard job keeping up with new NumPy releases (I've been seeing a few complaints/concerns about NumPy 1.22.x not yet being supported), and it's in the interest of both projects to make it easier for Numba to implement support for NumPy functionality.

We could, for example, make a roadmap item to prioritize and work on pain points in NumPy for Numba. The array API standard work helps a bit here (given that one of its design principles is that everything should be JIT-compilable), but it only covers ~140 functions, which is ~10% of NumPy's API surface.

@seibert do you happen to have a wish list? Or is this scattered over many Numba and NumPy issues plus hacks in your code base?

rgommers commented 2 years ago

> @seibert do you happen to have a wish list? Or is this scattered over many Numba and NumPy issues plus hacks in your code base?

xref https://github.com/numba/numba/issues/8008 for a detailed write-up from the Numba team. Also see the related mailing list message I just sent to take the next step here: https://mail.python.org/archives/list/numpy-discussion@python.org/thread/QL6BTNYZC3UXBUAWMCMO7KZJTDWBBPCO/

rmccampbell commented 2 months ago

I noticed that #26640 was closed despite having a complete implementation, because there is a one-line equivalent. But I think this is sufficiently common functionality to still warrant its own function. The biggest benefit of a standalone function is that it's much easier to remember than the expansion (I can remember the right-hand rule, but I can never remember the order of the terms). There's also the matter of custom axes, which is a simple keyword argument in a function but requires complex indexing otherwise.
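For illustration, a standalone function with an axis keyword might look like the sketch below. The name `cross2d` and its `axis` parameter are assumptions for this example, not the implementation from #26640 and not an existing NumPy function.

```python
import numpy as np


def cross2d(a, b, axis=-1):
    """z component of the cross product of 2D vectors stored along `axis`.

    Hypothetical sketch; assumes both inputs use the same vector axis.
    """
    # Move the vector axis to the end so the expansion needs no complex indexing.
    a = np.moveaxis(np.asarray(a), axis, -1)
    b = np.moveaxis(np.asarray(b), axis, -1)
    if a.shape[-1] != 2 or b.shape[-1] != 2:
        raise ValueError("inputs must have length 2 along the vector axis")
    # The expansion that is easy to get backwards: a_x * b_y - a_y * b_x.
    return a[..., 0] * b[..., 1] - a[..., 1] * b[..., 0]


# Three 2D vectors stored column-wise, so the vector axis is axis 0.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[0, 1, 0],
              [1, 0, 1]])
print(cross2d(a, b, axis=0))  # [ 1 -5  3]
```

Without the function, the caller has to write the slicing along the chosen axis by hand and remember the order of the terms.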