numba / numba

NumPy aware dynamic Python compiler using LLVM
https://numba.pydata.org/
BSD 2-Clause "Simplified" License

meta-issue: All NumPy functions with no implementation #4074

Open stuartarchibald opened 5 years ago

stuartarchibald commented 5 years ago

Below are lists of feasible but not yet implemented NumPy functions, ideal for first-time contributors. If you would like to contribute a function, please first check the open "Pull Requests" to make sure someone else isn't already working on the one you chose! If the checkbox next to a function is checked, that function has already been done.

As a new contributor, some instructions for getting set up for development are here and are a great place to start.

It should be possible to write these functions using the numba.extending.overload functionality; they are of varying difficulty. A guide to using @overload is here:

These functions also seem possible to write with numba.extending.overload, but they are harder, or it is less clear how difficult they would be to implement:

These are likely to need lower level work:

godaygo commented 5 years ago

Great that you've prepared this list, though it still seems rather long :) I'm not sure that every function in this list is worth the attention, and I think some of them should be weeded out. For example, all the poly* functions are legacy functions kept only for backward compatibility; for new code, the numpy.polynomial package is encouraged instead (see Polynomials). On the other hand, at present it is much easier to implement support for functions in Numba than for classes from the polynomial package.

I also had a feeling that the majority of numpy core developers would be happy to see financially-related functions outside of numpy. Perhaps this is my very subjective opinion.

seibert commented 5 years ago

Yes, this is a good point. By definition, this list will mostly contain rarely used NumPy functions; otherwise, they would likely have been contributed already. Before picking one up, it is worth doing some web searching to see whether people actually use it.

stuartarchibald commented 5 years ago

@godaygo thanks for your feedback. Indeed, some of these functions are less critical than others. This list is written in part to encourage users to have a go at contributing functions to Numba; these functions are a good gateway to helping out with more complicated issues.

luk-f-a commented 5 years ago

As someone currently working on extending a NumPy function for Numba (my second), I can say that it is a great exercise for learning more about how Numba works. I encourage everyone who wants to understand Numba, or who is looking to become a power user or developer, to try implementing a function or two. On the other hand, when I look at the list above, I wonder about the amount of work required to create a second version of something that NumPy already has. The speed-up provided by Numba is amazing, and if we want these functions in jitted code this work is necessary; however, in the big picture this is still duplicated effort, even before considering the possible maintenance burden. I wonder: how could this effort be connected to, or leveraged from, uarray and XND? Could there one day be a Num(ba)Py, a NumPy version where all the functions are jitted? Or, equivalently, could NumPy accept contributions that replace Python versions of its functions with jitted ones? I am guessing the devs have already thought about this, and it might be part of the long-term roadmap (perhaps the item called "integration of Numba into core PyData packages"?).

seibert commented 5 years ago

Broadly, it is something that would be nice to figure out, but it is a hard problem since NumPy's implementations were never designed to be consumed by something like Numba. They are a mixture of Python code (frequently written in a style that Numba has difficulty typing statically) and private C code (which Numba should not call, since it is not a public, stable API).

Numba has the same problem with CPython, where we have to basically duplicate functionality because there is no C API to the thing we need to access (like Unicode character tables, or random number generators). This is the pragmatic cost of making Numba work today, but it is not ideal.

Understandably, most projects would view providing direct C access to these various algorithms as being out of scope for their project. I'm not sure what can be done here, but it is something we think about. A few years ago at SciPy, there was discussion of whether more low-level NumPy operations could be implemented in a way that could be consumed by JITs (like Numba and PyPy), but no progress has been made since then, AFAIK.

guilhermeleobas commented 5 years ago

It looks like np.bincount was already implemented in a previous PR but it is still in the list.

stuartarchibald commented 5 years ago

@guilhermeleobas thanks, done

guilhermeleobas commented 5 years ago

Hi @stuartarchibald, can you cross np.count_nonzero off the list? It was implemented in this PR.

guilhermeleobas commented 5 years ago

Hi @stuartarchibald,

I do not think one needs to write code for NumPy constants. They work just fine in jitted functions:

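from numba import njit
import numpy as np

@njit
def foo():
    # NumPy constants can be used directly inside jitted code.
    # np.inf is used here because the np.infty alias was removed in NumPy 2.0.
    return (np.pi, np.inf, np.euler_gamma)

foo()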
stuartarchibald commented 5 years ago

Thanks @guilhermeleobas, updated.

esc commented 4 years ago

Adding np.logspace and np.geomspace as per https://github.com/numba/numba/issues/6301

holymonson commented 4 years ago

@esc Since automatic broadcasting hasn't been implemented, could you add numpy.broadcast_to and/or numpy.broadcast_arrays for manual broadcasting? It seems they could be implemented with numpy.lib.stride_tricks.as_strided.
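A minimal pure-NumPy sketch of the as_strided idea: broadcasting one array can be expressed as a view whose stride is 0 along every axis that is new or has length 1, so the same element is re-read instead of copied. `broadcast_to_sketch` is a hypothetical helper name; this is plain NumPy, and whether each call here compiles under @njit has not been checked.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def broadcast_to_sketch(a, shape):
    """Hypothetical helper: return a read-only view of `a` broadcast to `shape`."""
    a = np.asarray(a)
    shape = tuple(shape)
    lead = len(shape) - a.ndim  # number of new leading axes
    if lead < 0:
        raise ValueError("target shape has fewer dimensions than the input")
    strides = []
    for i, n in enumerate(shape):
        j = i - lead  # matching axis of `a`, aligned from the right
        if j < 0:
            strides.append(0)  # new axis: a stride of 0 bytes repeats the data
        elif a.shape[j] == n:
            strides.append(a.strides[j])  # axis kept as-is
        elif a.shape[j] == 1:
            strides.append(0)  # length-1 axis stretched by repetition
        else:
            raise ValueError("input cannot be broadcast to the target shape")
    # writeable=False because many view elements alias the same memory
    return as_strided(a, shape=shape, strides=tuple(strides), writeable=False)


row = np.arange(3)
view = broadcast_to_sketch(row, (2, 3))
```

The read-only flag mirrors what np.broadcast_to returns, since writing through a zero-stride view would modify several logical elements at once.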

esc commented 4 years ago

@holymonson sure, done.

caljrobe commented 3 years ago

@esc I would be interested in implementing support for np.resize. I apologize if I have overlooked it, but I cannot seem to find the "builder" class that is referenced, which I assume contains all the functions that should be used to optimize the performance of np.resize calls. Where might I find the corresponding source code/documentation for the builder? I also welcome any other tips you have before I get started!

esc commented 3 years ago

@caljrobe thanks for asking about this. I am not sure where the "builder" class might be referenced, but this usually refers to:

https://llvmlite.readthedocs.io/en/latest/user-guide/ir/ir-builder.html

Also, for implementing NumPy functions, you'll probably want to look at:

https://numba.pydata.org/numba-doc/dev/extending/overloading-guide.html

Hope it helps.

guoqiangqi commented 3 years ago

@esc It seems like np.iscomplex, np.iscomplexobj, np.isneginf, np.isposinf, np.isreal, np.isrealobj, np.isscalar and np.isnat have already been implemented in #4610 and #5293, but the list has not been updated yet.

esc commented 3 years ago

@guoqiangqi thank you very much for the pointer, it is appreciated! I have updated the list accordingly. (But please do double check that I selected the correct functions).

guoqiangqi commented 3 years ago

I have checked again; it's all good 👍

FirefoxMetzger commented 3 years ago

I think np.squeeze should be moved into the "needs low-level work" category. It is impossible to determine the number of axes being squeezed from the input types alone, and I am not aware of any method that can reshape to an ndim known only at runtime. Theoretically, it is possible to work out ndim at compile time (array shapes should be known at compile time?), but that would need https://github.com/numba/numba/issues/5339 and constexpr support. Maybe there is an unsafe reshape option that I am not aware of?

The other blocker is the changing return type depending on how many axes actually end up being squeezed. To explain this with a small example, dynamically squeezing the first axis of an array:

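import numba
import numpy as np

@numba.jit(nopython=True)
def squeeze(a):
    # The two branches return arrays of different ndim, which Numba cannot unify
    if a.shape[0] == 1:
        return np.asarray(a[0, ...])
    return a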

results in a surprising compile error:

Can't unify return type from the following types: array(int64, 0d, C), array(int64, 1d, C)
Return of: IR name '$52return_value.6', type 'array(int64, 0d, C)', location: 
File "skbot/transform/_utils.py", line 74:
def poor_mans_squeeze(a:np.ndarray) -> np.ndarray:
    <source elided>
    if a.shape[0] == 1:
        return np.asarray(a[0, ...])
        ^
Return of: IR name '$56return_value.1', type 'array(int64, 1d, C)', location: 
File "skbot/transform/_utils.py", line 75:
def poor_mans_squeeze(a:np.ndarray) -> np.ndarray:
    <source elided>
        return np.asarray(a[0, ...])
    return a

I think it's surprising because these are all just views into the same buffer, which shouldn't have different types just because their ndim is different (they're still both views).

FirefoxMetzger commented 3 years ago

@stuartarchibald Could you tick off np.swapaxes? It seems to be part of the current Numba codebase:

https://github.com/numba/numba/blob/7f72315f0bf672729b5a6eaf004aa9061e8bf0f5/numba/np/arrayobj.py#L5503-L5536

esc commented 3 years ago

Done, thank you for the ping.

braniii commented 2 years ago

Hi, I guess that implementing np.newaxis is not just wiring but needs a modification of the getitem function, because the expression np.newaxis is an alias for None. Apologies if this does not belong in this thread. Here is a short code snippet:

>>> import numpy as np
>>> import numba

>>> np.newaxis is None
True

>>> @numba.njit
... def f(x):
...     return x[:, None]
...

>>> a = np.arange(4)
>>> f(a)
No implementation of function Function(<built-in function getitem>) found for signature:

 >>> getitem(array(int64, 1d, C), Tuple(slice<a:b>, none))

There are 22 candidate implementations:
    - Of which 20 did not match due to:
    Overload of function 'getitem': File: <numerous>: Line N/A.
      With argument(s): '(array(int64, 1d, C), Tuple(slice<a:b>, none))':
     No match.
    - Of which 2 did not match due to:
    Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 162.
      With argument(s): '(array(int64, 1d, C), Tuple(slice<a:b>, none))':
     Rejected as the implementation raised a specific error:
       TypeError: unsupported array index type none in Tuple(slice<a:b>, none)
  raised from /home/braniii/anaconda3/lib/python3.8/site-packages/numba/core/typing/arraydecl.py:68

During: typing of intrinsic-call at <stdin> (3)
During: typing of static-get-item at <stdin> (3)
Tobi995 commented 1 year ago

np.allclose, np.argwhere, np.cumprod (I guess that is what np.cumproduct refers to?), np.isclose and np.prod (I guess that is what np.product refers to?) are all implemented in arraymath.py.

guilhermeleobas commented 1 year ago

Thanks @Tobi995, I've updated the list.

zombinator0 commented 1 year ago

np.all has been preferred over np.alltrue as mentioned in https://github.com/numpy/numpy-stubs/pull/73. np.all is also listed under the Calculation section of Supported NumPy features (link).

I think np.alltrue can be ticked off the list.

KrisMinchev commented 1 year ago

np.sometrue is deprecated in NumPy v1.25.0 (link). It is recommended to use np.any instead, which is already implemented.

esc commented 1 year ago

Thank you for the update. I have edited the issue accordingly and np.sometrue has been crossed out.

KrisMinchev commented 1 year ago

np.asscalar has been deprecated since NumPy v1.16.0 (link). It is an alias for the more powerful numpy.ndarray.item, is not tested, and fails for scalars.

np.fastCopyAndTranspose is deprecated in NumPy v1.24.0 (link). It is recommended to use the corresponding copy and transpose methods directly.

np.msort is deprecated in NumPy v1.24.0 (link). It is recommended to use np.sort(a, axis=0) instead, but the current implementation of np.sort() in Numba does not support the axis parameter.

guilhermeleobas commented 1 year ago

Thanks, @KrisMinchev. I have edited the list accordingly

KrisMinchev commented 1 year ago

The functions np.union1d and np.dstack are already implemented in arraymath.py and arrayobj.py respectively (see PRs #5544 and #8234). Also, np.geomspace is implemented in #9068.

redradist commented 1 year ago

Is there an alternative to np.unwrap(...) in Numba? Because of it, I get the error: Use of unsupported NumPy function 'numpy.unwrap' or unsupported use of the function.

KrisMinchev commented 1 year ago

np.diagflat has been implemented in #9113. np.vsplit, np.hsplit and np.dsplit have been implemented in #9082. np.nan_to_num has been implemented in #8623. np.resize has been implemented in #9118. np.row_stack has been implemented in #9085. np.trim_zeros has been implemented in #9074. np.flatnonzero has been implemented in #4157; I assume this is what was meant by np.floatnonzero in the above list. As for the polynomial functions in the list (np.polyadd, np.polysub, etc.), they are part of the old polynomial API, which is only kept for backward compatibility. There is a newer polynomial package, np.polynomial, which is preferred, as explained here. The functions np.polynomial.polynomial.polyadd, np.polynomial.polynomial.polysub and np.polynomial.polynomial.polymul from the new API have been implemented in #9087.
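One detail worth keeping in mind when moving from the legacy poly* functions to np.polynomial: the two APIs order coefficients in opposite directions, so the same polynomial is written differently. A short plain-NumPy illustration (no Numba involved):

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Legacy API: coefficients run from the highest power down, so [1, 2] is x + 2
old = np.polyadd([1, 2], [3])

# np.polynomial API: coefficients run from the lowest power up, so [2, 1] is 2 + x
new = P.polyadd([2, 1], [3])

# Both results describe x + 5, with the coefficients in reversed order
print(old, new)
```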

kmaitreys commented 1 year ago

I hope I haven't missed it, but I think there is no mention of np.isin?

guilhermeleobas commented 1 year ago

@kmaitreys, thanks. I've updated the list to include np.isin. Let us know if you notice any other missing function.

jaredjeya commented 11 months ago

I'd like to add support for np.setxor1d, as it seems fairly easy to do (a slight modification of np.intersect1d). There was a PR, #4677, with a very similar implementation, but it seems to have gone stale and was then closed without merging. Do we reopen that or start from scratch? I'm also considering doing np.setdiff1d.

guilhermeleobas commented 11 months ago

Hi @jaredjeya, feel free to fork #4677 and create a new PR.