Open stuartarchibald opened 5 years ago
Great that you've prepared this list, though it still seems rather long :) I'm not sure that each of the functions in this list is worthy of attention, and I think some of them need to be weeded out. For example, all poly* functions are somewhat deprecated and are kept only for backward compatibility. For future code it is encouraged to use the numpy.polynomial package (Polynomials). On the other hand, at the present time, it is much easier to implement support for functions in Numba than for classes from the Polynomial package.
I also had a feeling that the majority of numpy core developers would be happy to see financially-related functions outside of numpy. Perhaps this is my very subjective opinion.
Yes, this is a good point. By definition, this list will mostly contain rarely used NumPy functions; otherwise they would likely have been contributed already. Before picking one up, it is worth doing some web searching to see if people use it.
@godaygo thanks for your feedback. Indeed some of these functions are less critical than others. This list is written in part to encourage users to have a go at contributing functions to Numba, and these functions are a good gateway to being able to help out with more complicated issues.
As someone currently working on extending a NumPy function for Numba (my second), I can say that it is a great exercise to learn more about how Numba works. I encourage everyone wanting to understand Numba, anyone looking to be a power user or dev, to try to implement a function or two. On the other hand, when I look at the list above I wonder about the amount of work required to create a second version of something that NumPy already has. The speed-up provided by Numba is amazing, and if we want these functions in jitted code this work is necessary. However, in the big picture this is still a duplicated effort, even before considering the possible maintenance burden. I wonder: How could this effort be connected to or leveraged from uarray and XND? Could there be one day a Num(ba)Py? A NumPy version where all the functions are jitted? Or equivalently, could NumPy accept contributions in which people replace Python versions of the functions with jitted ones? I am guessing the devs have thought about this already, and it might be part of the long term roadmap (item called "integration of Numba into core PyData packages" maybe?).
Broadly, it is something that would be nice to figure out, but it is a hard problem since NumPy's implementations were never designed to be consumed by something like Numba. They are a mixture of Python code (frequently written in a style that Numba has difficulty statically typing) and private C code (which Numba should not call since they aren't public, stable APIs).
Numba has the same problem with CPython, where we have to basically duplicate functionality because there is no C API to the thing we need to access (like Unicode character tables, or random number generators). This is the pragmatic cost of making Numba work today, but it is not ideal.
Understandably, most projects would view providing direct C access to these various algorithms as being out of scope for their project. I'm not sure what can be done here, but it is something we think about. A few years ago at SciPy, there was discussion of whether more low-level NumPy operations could be implemented in a way that could be consumed by JITs (like Numba and PyPy), but no progress has been made since then, AFAIK.
It looks like np.bincount was already implemented in a previous PR but it is still in the list.
@guilhermeleobas thanks, done
Hi @stuartarchibald, can you cross np.count_nonzero from the list? It was implemented in this PR.
Hi @stuartarchibald,
I do not think one needs to write code for numpy constants. They work just fine on jitted functions:
from numba import njit
import numpy as np

@njit
def foo():
    return (np.pi, np.infty, np.euler_gamma)

foo()
Thanks @guilhermeleobas, updated.
Adding np.logspace and np.geomspace as per https://github.com/numba/numba/issues/6301
@esc Since automatic broadcasting hasn't been implemented, could you add numpy.broadcast_to and/or numpy.broadcast_arrays for manual broadcasting? It seems they could be implemented with numpy.lib.stride_tricks.as_strided.
@holymonson sure, done.
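To illustrate the as_strided idea raised above, here is a minimal NumPy-only sketch. The helper name manual_broadcast_to is hypothetical, and this is not how Numba implements or would implement the feature; it only shows that broadcasting amounts to giving the repeated axes a stride of zero.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def manual_broadcast_to(arr, shape):
    """Hypothetical helper: read-only broadcast view built from zero strides."""
    arr = np.asarray(arr)
    n_extra = len(shape) - arr.ndim
    if n_extra < 0:
        raise ValueError("target shape has fewer dimensions than the input")
    # Left-pad the input shape with 1s, as NumPy broadcasting does.
    in_shape = (1,) * n_extra + arr.shape
    in_strides = (0,) * n_extra + arr.strides
    strides = []
    for n_in, n_out, stride in zip(in_shape, shape, in_strides):
        if n_in == n_out:
            strides.append(stride)   # axis kept as-is
        elif n_in == 1:
            strides.append(0)        # repeat the single element without copying
        else:
            raise ValueError("shapes are not broadcast-compatible")
    return as_strided(arr, shape=shape, strides=tuple(strides), writeable=False)


a = np.arange(3)
view = manual_broadcast_to(a, (2, 3))
```

The result is a view onto the original buffer, so it is returned read-only to avoid surprising writes through aliased elements.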
@esc I would be interested in implementing support for np.resize. I apologize if I have overlooked it, but I cannot seem to find this "builder" class that is referenced, which I assume contains all the functions that should be used to optimize the performance of np.resize calls. Where might I find the corresponding source code/documentation for the builder? Also I welcome any other tips you have before I get started!
@caljrobe thanks for asking about this. I am not sure where the "builder" class might be referenced, but this usually refers to:
https://llvmlite.readthedocs.io/en/latest/user-guide/ir/ir-builder.html
Also, for implementing NumPy functions, you'll probably want to look at:
https://numba.pydata.org/numba-doc/dev/extending/overloading-guide.html
Hope it helps.
@esc It seems like np.iscomplex, np.iscomplexobj, np.isneginf, np.isposinf, np.isreal, np.isrealobj, np.isscalar and np.isnat have already been implemented in #4610 and #5293, but the list has not been updated yet.
@guoqiangqi thank you very much for the pointer, it is appreciated! I have updated the list accordingly. (But please do double check that I selected the correct functions).
I have checked again, it's all good 👍
I think np.squeeze should be moved into the "needs low-level work" category. It is impossible to determine the number of axes being squeezed from the input types alone and I am not aware of any method that can reshape to a ndim known at runtime only. Theoretically, it is possible to work out ndim at compile-time (array shapes should be known at compile-time?), but that would need https://github.com/numba/numba/issues/5339 and constexpr support. Maybe there is an unsafe reshape option that I am not aware of?
The other blocker is the changing return type depending on how many axes actually end up being squeezed. To explain this with a small example, dynamically squeezing the first axis of an array:
@numba.jit(nopython=True)
def squeeze(a):
    if a.shape[0] == 1:
        return np.asarray(a[0, ...])
    return a
results in a surprising compile error:
Can't unify return type from the following types: array(int64, 0d, C), array(int64, 1d, C)
Return of: IR name '$52return_value.6', type 'array(int64, 0d, C)', location:
File "skbot/transform/_utils.py", line 74:
def poor_mans_squeeze(a:np.ndarray) -> np.ndarray:
    <source elided>
    if a.shape[0] == 1:
        return np.asarray(a[0, ...])
        ^
Return of: IR name '$56return_value.1', type 'array(int64, 1d, C)', location:
File "skbot/transform/_utils.py", line 75:
def poor_mans_squeeze(a:np.ndarray) -> np.ndarray:
    <source elided>
        return np.asarray(a[0, ...])
    return a
I think it's surprising, because it's all just views into the same buffer, which shouldn't be different types just because their ndim is different (they're still both views).
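The underlying difficulty can be seen in plain NumPy. As the array(int64, 1d, C) in the error message suggests, a Numba array type records dtype, number of dimensions, and layout, but not the shape. The small sketch below (variable names are illustrative) shows two inputs that would share one such type yet squeeze to different ndim, so the return type cannot be derived from the input type alone.

```python
import numpy as np

a = np.zeros((1, 3))  # 2-D, float64, C-contiguous
b = np.zeros((2, 3))  # same dtype, ndim and layout as `a`

squeezed_a = np.squeeze(a)  # the length-1 axis is dropped
squeezed_b = np.squeeze(b)  # nothing to drop

print(a.ndim, b.ndim)                    # 2 2
print(squeezed_a.ndim, squeezed_b.ndim)  # 1 2
```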
@stuartarchibald Could you tick off np.swapaxis? It seems to be part of the current Numba codebase:
Done, thank you for the ping.
Hi, I guess that implementing np.newaxis is not just wiring, but needs a modification of the getitem function. The expression np.newaxis is an alias for None. I am sorry if this does not belong in this thread.
Here is a short code snippet:
>>> import numpy as np
>>> import numba
>>> np.newaxis is None
True
>>> @numba.njit
... def f(x):
...     return x[:, None]
...
>>> a = np.arange(4)
>>> f(a)
No implementation of function Function(<built-in function getitem>) found for signature:
>>> getitem(array(int64, 1d, C), Tuple(slice<a:b>, none))
There are 22 candidate implementations:
- Of which 20 did not match due to:
Overload of function 'getitem': File: <numerous>: Line N/A.
With argument(s): '(array(int64, 1d, C), Tuple(slice<a:b>, none))':
No match.
- Of which 2 did not match due to:
Overload in function 'GetItemBuffer.generic': File: numba/core/typing/arraydecl.py: Line 162.
With argument(s): '(array(int64, 1d, C), Tuple(slice<a:b>, none))':
Rejected as the implementation raised a specific error:
TypeError: unsupported array index type none in Tuple(slice<a:b>, none)
raised from /home/braniii/anaconda3/lib/python3.8/site-packages/numba/core/typing/arraydecl.py:68
During: typing of intrinsic-call at <stdin> (3)
During: typing of static-get-item at <stdin> (3)
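Until None is accepted as an index, one possible workaround is an explicit reshape. The sketch below is plain NumPy; the helper name add_trailing_axis is hypothetical, and it assumes that reshape with integer arguments on a contiguous array is available in nopython mode for the Numba version in use, which should be checked before relying on it.

```python
import numpy as np


def add_trailing_axis(x):
    """Hypothetical helper: same result as x[:, None] for a 1-D array."""
    # Assumption: decorating this with @numba.njit works for contiguous input.
    return x.reshape(x.shape[0], 1)


a = np.arange(4)
col = add_trailing_axis(a)
```

Because the target ndim is fixed in the source (two integer arguments), the result type does not depend on runtime values.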
np.allclose, np.argwhere, np.cumprod (I guess that is meant by np.cumproduct?), np.isclose, np.prod (I guess it's meant by np.product?) are all implemented in arraymath.py
Thanks @Tobi995, I've updated the list.
np.all has been preferred over np.alltrue as mentioned in https://github.com/numpy/numpy-stubs/pull/73. np.all is also listed under the Calculation section of Supported NumPy features (link).
I think np.alltrue can be ticked off the list.
np.sometrue is deprecated in NumPy v.1.25.0 (link). It is recommended to use np.any instead, which is already implemented.
Thank you for the update. I have edited the issue accordingly and np.sometrue has been crossed out.
np.asscalar is deprecated with NumPy v1.16.0 (link). It is an alias to the more powerful numpy.ndarray.item, not tested, and fails for scalars.
np.fastCopyAndTranspose is deprecated with NumPy v1.24.0 (link). It is recommended to use the corresponding copy and transpose methods directly.
np.msort is deprecated with NumPy v1.24.0 (link). It is recommended to use np.sort(a, axis=0) instead, but the current implementation of np.sort() in Numba does not support the axis parameter.
Thanks, @KrisMinchev. I have edited the list accordingly
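Since np.sort in Numba is described above as lacking the axis parameter, a column-by-column loop is one way to get the np.sort(a, axis=0) behaviour for 2-D input. This is a plain NumPy sketch with a hypothetical helper name, written only with operations that are commonly usable in jitted code; it has not been benchmarked or tested under Numba here.

```python
import numpy as np


def sort_axis0(a):
    """Hypothetical stand-in for np.sort(a, axis=0) on a 2-D array."""
    out = np.empty_like(a)
    for j in range(a.shape[1]):
        # a[:, j] is a strided view; copy it so the 1-D sort sees contiguous data.
        out[:, j] = np.sort(a[:, j].copy())
    return out


m = np.array([[3, 1, 2],
              [0, 5, 4],
              [9, 2, 2]])
sorted_cols = sort_axis0(m)
```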
The functions np.union1d and np.dstack are already implemented in arraymath.py and arrayobj.py respectively (see PRs #5544 and #8234). Also, np.geomspace is implemented in #9068.
Is there an alternative to np.unwrap(...) in Numba? Without it I get an error: Use of unsupported NumPy function 'numpy.unwrap' or unsupported use of the function.
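One option while np.unwrap is unsupported is a hand-written loop. The following is a minimal sketch for 1-D input with the default discontinuity (pi) and period (2*pi); the name unwrap_1d is hypothetical, and the logic follows the phase-correction rule NumPy documents for np.unwrap. It is written as a plain loop so that it could plausibly be decorated with @numba.njit, though that is an assumption not verified here.

```python
import numpy as np


def unwrap_1d(p):
    """Hypothetical 1-D phase unwrap with discont=pi and period=2*pi."""
    n = p.shape[0]
    out = np.empty(n, dtype=np.float64)
    if n == 0:
        return out
    out[0] = p[0]
    correction = 0.0
    for i in range(1, n):
        d = p[i] - p[i - 1]
        # Map the jump into [-pi, pi).
        dmod = (d + np.pi) % (2.0 * np.pi) - np.pi
        # Keep a jump of exactly +pi as +pi rather than -pi.
        if dmod == -np.pi and d > 0:
            dmod = np.pi
        # Only jumps of at least pi are corrected.
        if abs(d) >= np.pi:
            correction += dmod - d
        out[i] = p[i] + correction
    return out


phase = np.linspace(0.0, 6.0 * np.pi, 50)
wrapped = np.angle(np.exp(1j * phase))
unwrapped = unwrap_1d(wrapped)
```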
np.diagflat has been implemented in #9113.
np.vsplit, np.hsplit and np.dsplit have been implemented in #9082.
np.nan_to_num has been implemented in #8623.
np.resize has been implemented in #9118.
np.row_stack has been implemented in #9085.
np.trim_zeros has been implemented in #9074.
np.flatnonzero has been implemented in #4157. I assume this is what was meant by np.floatnonzero in the above list.
As for the polynomial functions in the list (np.polyadd, np.polysub, etc.), they are part of the old polynomial API which is only kept for backward compatibility. There is a newer polynomial package, np.polynomial, which is preferred, as explained here. The functions np.polynomial.polynomial.polyadd, np.polynomial.polynomial.polysub and np.polynomial.polynomial.polymul from the new API have been implemented in #9087.
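One practical difference between the two APIs is coefficient order: the old functions take the highest degree first, while the functions in np.polynomial.polynomial take the lowest degree first. A short NumPy sketch, adding the polynomials x + 2 and 3 with each API:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Old API: coefficients ordered from highest degree to lowest.
old_sum = np.polyadd([1, 2], [3])   # (x + 2) + 3 = x + 5

# New API: coefficients ordered from lowest degree to highest.
new_sum = P.polyadd([2, 1], [3])    # (2 + x) + 3 = 5 + x

print(old_sum)
print(new_sum)
```

Reversing one result gives the other, which is worth keeping in mind when porting code from the old functions.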
I hope I have not missed it, but I think there is no mention of np.isin?
@kmaitreys, thanks. I've updated the list to include np.isin. Let us know if you notice any other missing function.
I'd like to add support for np.setxor1d as it seems fairly easy to do (a slight modification to np.intersect1d). There was a PR, #4677, with a very similar implementation but it seems to have gone stale and was then closed without merging. Do we reopen that or start anew? I'm also considering doing np.setdiff1d.
Hi @jaredjeya, feel free to fork #4677 and create a new PR.
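As a rough illustration of why this is close to np.intersect1d, here is a NumPy-only sketch of the sort-and-compare approach: after concatenating the unique values of both inputs and sorting, values present in both inputs appear twice in a row, and the symmetric difference keeps the values that differ from both neighbours. The name setxor1d_sketch is hypothetical and this is not the code from #4677.

```python
import numpy as np


def setxor1d_sketch(a, b):
    """Hypothetical sketch: sorted values found in exactly one of a, b."""
    aux = np.concatenate((np.unique(a), np.unique(b)))
    if aux.size == 0:
        return aux
    aux.sort()
    # flag[i] is True where aux[i] differs from aux[i - 1]; both ends count as different.
    flag = np.empty(aux.size + 1, dtype=np.bool_)
    flag[0] = True
    flag[-1] = True
    flag[1:-1] = aux[1:] != aux[:-1]
    # Keep values that differ from both the previous and the next element.
    return aux[flag[1:] & flag[:-1]]


xor = setxor1d_sketch(np.array([1, 2, 3, 2]), np.array([2, 3, 4]))
```

An intersection would instead keep the positions where neighbouring elements are equal.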
Below are lists of feasible but not yet implemented NumPy functions... ideal for first time contributors. If you would like to contribute a function, please first check the open "Pull Requests" to make sure someone else isn't already working on the one you chose! If the check-box next to the function is checked it means the function is already done...
As a new contributor, some instructions for getting set up for development are here and are a great place to start.
It should be possible to write these functions using the numba.extending.overload functionality, they are of varying difficulty. A guide to using @overload is here:
np.alen (deprecated, see https://github.com/numpy/numpy/issues/14155)
np.allclose
np.alltrue
np.append
np.argpartition
np.argwhere
np.around
np.array_equal
np.array_split
np.asanyarray
np.asarray_chkfinite
np.asfarray
np.asscalar (deprecated, see https://github.com/numpy/numpy/pull/20414)
np.atleast_1d
np.atleast_2d
np.atleast_3d
np.bartlett
np.binary_repr
np.bincount
np.blackman
np.compress
np.count_nonzero
np.cross
np.cumproduct (just needs wiring)
np.diag_indices
np.diag_indices_from
np.diagflat
np.euler_gamma (constant)
np.fastCopyAndTranspose (deprecated, see https://github.com/numpy/numpy/pull/22313)
np.floatnonzero
np.fliplr
np.flipud
np.fv (deprecated, see NumPy's NEP-0032)
np.geomspace
np.hamming
np.hanning
np.hsplit
np.i0
np.in1d
np.indices
np.infty (constant)
np.inner
np.intersect1d
np.ipmt (deprecated, see NumPy's NEP-0032)
np.irr (deprecated, see NumPy's NEP-0032)
np.isclose
np.iscomplex
np.iscomplexobj
np.isfortran
np.isin
np.isnat
np.isneginf
np.isposinf
np.isreal
np.isrealobj
np.isscalar
np.kaiser
np.little_endian
np.logspace
np.meshgrid
np.mirr (deprecated, see NumPy's NEP-0032)
np.msort (largely wiring!) (deprecated, see https://github.com/numpy/numpy/pull/22456)
np.nan_to_num
np.nanargmax
np.nanargmin
np.ndim (just needs wiring)
np.newaxis (just needs wiring)
np.nper (deprecated, see NumPy's NEP-0032)
np.npv (deprecated, see NumPy's NEP-0032)
np.place
np.pmt (deprecated, see NumPy's NEP-0032)
np.poly
np.polyadd
np.polyder
np.polydiv
np.polyint
np.polymul
np.polysub
np.polyval
np.ppmt (deprecated, see NumPy's NEP-0032)
np.product (just needs wiring)
np.put
np.putmask
np.pv (deprecated, see NumPy's NEP-0032)
np.rate (deprecated, see NumPy's NEP-0032)
np.resize
np.rot90
np.row_stack
np.select
np.setdiff1d
np.setxor1d
np.size (mostly wiring)
np.sometrue (just needs wiring) (deprecated with 1.25)
np.sortcomplex
np.split
np.tril_indices
np.tril_indices_from
np.triu_indices
np.triu_indices_from
np.trim_zeros
np.union1d
np.unravel_index
np.unwrap
np.vsplit
It is also seemingly possible to write these functions with numba.extending.overload, but they are harder, or it is less easy to determine how difficult they are to implement:
np.choose
np.common_type
np.copyto
np.dsplit
np.dstack
np.fix
np.gradient
np.histogram2d
np.histogramdd
np.insert
np.ix_
np.lexsort
np.mask_indices
np.maximum_sctype
np.nested_iters
np.pad
np.piecewise
np.polyfit
np.put_along_axis
np.squeeze
np.swapaxis
np.tensordot
These are likely to need lower level work:
np.broadcast_to
np.broadcast_arrays
np.byte_bounds
np.packbits
np.unpackbits