Open ashwinvis opened 3 months ago
Hey @ashwinvis: I'd go with implementing np.percentile with the default method, and see if we can match numpy's speed. If we can, implementing the nan version would be the next step. That's the high-level view.
For the details, numpy's algorithms seem to be based on https://github.com/numpy/numpy/blob/3b246c6488cf246d488bbe5726ca58dc26b6ea74/numpy/lib/_function_base_impl.py#L4830
I agree, all functions seem to be based on _quantile. The nan* variants simply require weeding out NaNs before running their regular variants. A close study is needed before I can decide whether I (or someone else) can do this.
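To make the "weed out NaNs, then run the regular variant" idea concrete, here is a minimal pure-Python sketch for the 1-D case. The names nanpercentile_1d and percentile_linear are hypothetical, not numpy or Pythran APIs. It assumes the default linear interpolation between the two nearest order statistics, and it uses a full sort for brevity where a real kernel would more likely use a partial sort. Returning NaN for an all-NaN input is also an assumption made for this sketch.

```python
import math


def percentile_linear(sorted_values, q):
    """Percentile of an already sorted, non-empty, NaN-free sequence.

    Interpolates linearly between the two nearest order statistics.
    """
    n = len(sorted_values)
    pos = (q / 100.0) * (n - 1)   # fractional index into the sorted data
    lo = math.floor(pos)          # index of the lower neighbour
    hi = min(lo + 1, n - 1)       # index of the upper neighbour, clamped
    frac = pos - lo               # interpolation weight in [0, 1)
    return sorted_values[lo] + frac * (sorted_values[hi] - sorted_values[lo])


def nanpercentile_1d(values, q):
    """Hypothetical 1-D nan-aware percentile: drop NaNs, then run the regular variant."""
    if not 0 <= q <= 100:
        raise ValueError("q must be in the range [0, 100]")
    kept = sorted(v for v in values if not math.isnan(v))
    if not kept:
        return math.nan           # assumption: an all-NaN input yields NaN
    return percentile_linear(kept, q)
```

An N-D version would apply the same two steps to each 1-D slice along the requested axis, which is where the cost discussed in this issue is likely to come from.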
@serge-sans-paille I saw that xtensor made a C++ port of quantile and its many variants. Do you think it can be ported using Pythran's pythonic?
I know that you have commented in the past (#1476) that xtensor's approach is incompatible. I wonder if it is true for this function.
The nanpercentile function in numpy can be awfully slow if we use axis=0 when the number of columns is huge, or vice versa. This is noted here:
And I have a use case for this from work.
There are faster JIT versions of this now in numbagg, jax, etc., but it will be easier to ship something statically compiled à la Pythran. Any pointers to get started?
Possible follow-up easy quick-wins
nanquantile(a, q, ...) = nanpercentile(a, q*100, ...)
nanmedian(a, ...) = nanpercentile(a, 50, ...)
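As a rough sketch of how thin these follow-ups could be, the two identities can be written as wrappers around any nanpercentile-like callable. The factory name make_followups is hypothetical and exists only for this illustration. The wrappers assume the callable accepts (a, q, **kwargs) with q on a 0 to 100 scale.

```python
def make_followups(nanpercentile):
    """Build nanquantile and nanmedian on top of a nanpercentile-like callable.

    `nanpercentile` is assumed to take (a, q, **kwargs) with q in [0, 100].
    """

    def nanquantile(a, q, **kwargs):
        # A quantile in [0, 1] is the percentile at q * 100.
        return nanpercentile(a, q * 100, **kwargs)

    def nanmedian(a, **kwargs):
        # The median is the 50th percentile.
        return nanpercentile(a, 50, **kwargs)

    return nanquantile, nanmedian
```

With numpy installed, make_followups(np.nanpercentile) would yield wrappers to compare against np.nanquantile and np.nanmedian. Note that q * 100 is computed in floating point, so results may differ from a native quantile in the last bits for some values of q.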