Add Numba as an optional dependency for rolling.apply for pandas 1.0

mroeschke commented 4 years ago

As mentioned on the pandas dev call last week, I've been working with @jreback and @DiegoAlbertoTorres on a proof of concept (POC) implementing rolling.mean and rolling.apply using Numba instead of our current Cython implementation. As described in this proof of concept document, we worked on:

1) Refactoring window bound calculation and aggregation to use Numba 2) Developing a new API for users to implement their own window bounds calculations (df.rolling(MyWindowerClass()).mean())

The document details performance results, high level implementation details and integration plan into pandas. The fork is up to date with master and is running the same CI checks.

The proposal for pandas 1.0 is:

1) Introduce Numba as a required pandas dependency 2) Have rolling.mean dispatch to the Numba implementation

The proposal for post pandas 1.0 is:

1) Expose the new API for users to implement their own window bounds calculations 2) Implement all rolling aggregations (min, max, count, etc,) in Numba 3) Implement EWM and Expanding in Numba 4) Generalize data grouping (groupby, rolling, resample) and aggregations (mean, max, etc) using Numba jitclasses

As the issue title notes, I'm hoping to keep this discussion focused around any concerns, thoughts, suggestions regarding the POC and how it pertains to the pandas 1.0 proposal, but feel free to ask any questions regarding topics the POC doesn't cover.

cc @pandas-dev/pandas-core

rgommers commented 4 years ago

Hi all. This proposal has a lot of implications that I don't see addressed in the PoC document. I would suggest reviewing this thread about introducing Numba as a SciPy dependency from last year.

Some concrete questions that may be showstoppers are:

what is the status on ARM and other more exotic hardware (e.g. AIX, the Debian-supported architectures)?
what's the impact on deployment size (e.g. this may hurt embedded use cases, and push NumPy+SciPy+Pandas over the size limit for AWS Lambda).

Pydata/Sparse, which is pure Python + Numba, is thinking about incrementally moving parts back to compiled code, due to stability and currently missing features in Numba.

And I think that Scikit-learn experimented with this fairly recently and also came away with "not ready yet" (second-hand info via @amueller, please correct me if I'm wrong @nicolashug).

shoyer commented 4 years ago

I’m sympathetic to the goal here, but my sense is that Numba can still be a challenging project to redistribute, so I’m not sure this is a good idea. Not everyone uses conda or pip. Just look at the building from source instructions for patching LLVM, for example: https://llvmlite.readthedocs.io/en/latest/admin-guide/install.html.

A strength of pandas has been that it’s easy to install with a standard toolchain, and has a minimal set of required dependencies.

In your linked doc, you list three benefits from using Numba:

Performance parity or improvement over Cython

Eliminate shipping C-extensions

Ease of debugging and maintenance as pure Python code

With all due respect, I question benefits (2) and (3).

Removing all C extensions would be to remove the need for a C compiler when building pandas from source, but this seems quite farfetched for pandas. Until that happens, I’m not sure why this would be productive. C extensions are a mature and stable technology.

Calling Numba wrapped code “pure Python code” from a debugging perspective also seems misguided. It’s only pure Python until you compile it with Numba, at which point you can debug Numba’s compiler and LLVM IR. I’m pretty confident that the toolchain for debugging Cython’s generated C code is much more battle tested. For example, Cython is actually capable of embedding arbitrary Python, while Numba requires dropping back into “object” mode, which as far as I understand entails skipping most of the compiler.

toobaz commented 4 years ago

As mentioned in the last dev call, my (very limited) experience with numba suggests that it is very easy (= few lines of decorator) to use it without hard depending on it, and leaving the user with the pure Python implementation if numba is missing. While such pure Python implementation would be very slow, this would affect only rolling functions which are not necessarily used by all users (at that time, we would probably emit a warning). In any case, while I would not impose the burden of always writing a numba and a numpy implementation of rolling functions, this does not mean we have to kill existing numpy implementations (as a fallback) in 1.0.

So if we decide to go ahead introducing numba in 1.0 (and on this, other people have better judgement than me), I still would refrain, if possible, from introducing the hard dependency.

gfyoung commented 4 years ago

I still would refrain, if possible, from introducing the hard dependency.

If you're going to use numba, I think you should either go "all in" or not at all. The toolchain implications of using numba are not trivial, and the fallback is not ideal with pure Python code.

this would affect only rolling functions which are not necessarily used by all users

Sure, but that doesn't mean we should sacrifice performance. Ideally, there should be a non-negative benefit for all users if we made such a switch.

NicolasHug commented 4 years ago

And I think that Scikit-learn experimented with this fairly recently and also came away with "not ready yet" (second-hand info via @amueller, please correct me if I'm wrong @NicolasHug).

Yes, I don't think Numba is mature enough for scikit-learn to switch from Cython.

I've used numba extensively in pygbm and more recently in a toy project (hmmkay). Here are the main arguments I have against it:

error messages aren't helpful (at all)
compilation time is significant for a large code-base (caching makes it less of an issue but it can be surprising for users)
it's fairly unstable compared to cython. Bugs can be extremely hard to catch, and it takes time to realize that the bug comes from numba, and not from your code. I literally wasted days with pygbm on this.
API is not entirely stable yet (e.g. deprecation of reflected lists for typed lists)
A few features are missing, like being able to parallelize methods. It doesn't look like much, but it makes the code look like spaghetti after a while.
It's magic, so it's hard to know whether what you are writing is actually optimal or not. This is also true for Cython but I find numba even more surprising.

With all this I tend to question the "ease of use" argument of numba, I don't find it to be true, for now.

Regarding allowing a soft dependency on Numba: numba code is Python code, but it's not pythonic code. For numba to generate efficient code you need to unfold for loops, etc. Yes this is pure python code, but it will be incredibly slow if numba does not JIT compile it (slower than if you had been writing pythonic code).

WillAyd commented 4 years ago

With one of the major concerns in this thread being the installation dependency, is there a comprehensive listing of what platforms this would impact? I'm not sure we've been explicit ourselves as to what platforms are supported; if it is a complicating factor for niche platforms I don't think that should be a showstopper

FWIW I think it's definitely worth exploring. The performance improvements on lambda expressions are great, and while we already have a lot of Cython architecture in place I would argue that we may be biased in assessing Cython as easier to debug than Numba. I don't find that we get a lot of newcomers that jump into the Cython code, nor is our Cython code always clean or consistent. I could see Numba potentially being easier to digest

jbrockmendel commented 4 years ago

A lot of the conversation has been about numba vs cython, but we also have a soft dependency on bottleneck that is used in core.nanops that we might be able to remove if we were to use numba. IIRC @shoyer did a proof of concept for switching over some of those functions?

bottleneck's last release was May 2017 and there are some bugs e.g. https://github.com/pandas-dev/pandas/issues/22385 that don't seem to have any movement. I would view removing this dependency as an upside to using numba.

On Tue, Oct 15, 2019 at 9:08 AM William Ayd notifications@github.com wrote:

With one of the major concerns in this thread being the installation dependency, is there a comprehensive listing of what platforms this would impact? I'm not sure we've been explicit ourselves as to what platforms are supported; if it is a complicating factor for niche platforms I don't think that should be a showstopper

FWIW I think it's definitely worth exploring. The performance improvements on lambda expressions are great, and while we already have a lot of Cython architecture in place I would argue that we may be biased in assessing Cython as easier to debug than Numba. I don't find that we get a lot of newcomers that jump into the Cython code, nor is our Cython code always clean or consistent. I could see Numba potentially being easier to digest

— You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub https://github.com/pandas-dev/pandas/issues/28987?email_source=notifications&email_token=AB5UM6FA6ASQEGLADCL4D2LQOXTJVA5CNFSM4JAXTJC2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBJKW7Y#issuecomment-542288767, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB5UM6EWB7QB6PMPSCVMAG3QOXTJVANCNFSM4JAXTJCQ .

shoyer commented 4 years ago

A lot of the conversation has been about numba vs cython, but we also have a soft dependency on bottleneck that is used in core.nanops that we might be able to remove if we were to use numba. IIRC @shoyer did a proof of concept for switching over some of those functions?

It is straightforward to rewrite most of the algorithms in bottleneck in Numba (e.g., like I did in http://github.com/shoyer/numbagg), but also in Cython or C++. If somebody cares about removing the bottleneck dependency, there are any number of fine ways to do so.

Actually C++ is really the elephant in the room here. It is definitely the most mature technology for high performance, high level programming.

bottleneck's last release was May 2017 and there are some bugs e.g. #22385 that don't seem to have any movement

Bottleneck has a new maintainer and recently moved over in the pydata GitHub organization.

I don't think it's fair to consider #22385 a bug. Bottleneck doesn't implement pairwise summation, but that doesn't make it wrong, it just a slightly less numerically stable algorithm for computing sums.

jorisvandenbossche commented 4 years ago

There are two goals mentioned in the top post: 1) use numba to implement the built-in rolling operations and 2) provide an API for the user to pass a numba function to be used as rolling operation.

I think the second goal is also very useful (and for the user give more added value, as the first goal is more an internal change). Could this second goal be implemented without having numba as a hard dependency?

stuartarchibald commented 4 years ago

Hi, Numba core developer here,

We agree that there are concerns in using Numba on a project like Pandas, and we would like to prioritise them better. It has been helpful in the past to itemise them and obtain clarity over the issues such that they can be fixed/improved (xref: https://github.com/numba/numba/issues/2888 for the last round of this). I've opened a new issue to start a new discussion over on the Numba issue tracker (xref: https://github.com/numba/numba/issues/4723).

Thanks!

mroeschke commented 4 years ago

Thanks for all the feedback @pandas-dev/pandas-core

I'd like to revise my pandas 1.0 proposal to the following:

1) Introduce Numba as an optional dependency 2) Have rolling.apply dispatch to Numba as opt-in behavior (e.g. rolling(...).apply(..., engine='Numba')) as a potential significant performance boost

The main allure of Numba in its immediate state is being able to write Python and achieve comparable performance with Cython (or C++) thus providing:

1) Lower barrier to entry for new contributors. 2) More debuggable workflow for the maintance team, e.g. @jit(forceobj=True) to drop back to Python. 2a) Debugging the LLVM IR has been more of a matter of debugging performance instead of result correctness in my experience; the later which is more important.

A con of adding Numba as a optional dependency for rolling is having to maintain duplicate implementations in Cython and Numba. The burden should go down in the long term as we re-visit Numba as a core dependency and circle back to a single implementation.

While a lot of the concerns are around Numba's immaturity, it is a consideration for limiting its application in pandas 1.0 (to only rolling.apply). Moreover, I believe adopting Numba as an optional dependency can help drive significant feedback to the project (platform support, feature completeness, etc)

jorisvandenbossche commented 4 years ago

A con of adding Numba as a optional dependency for rolling is having to maintain duplicate implementations in Cython and Numba.

Why is this? Would you still include implementations of all built-in rolling functions in pandas? I would think the .apply(func, engine='numba') is mainly for user-defined-functions?

jreback commented 4 years ago

A con of adding Numba as a optional dependency for rolling is having to maintain duplicate implementations in Cython and Numba.

Why is this? Would you still include implementations of all built-in rolling functions in pandas? I would think the .apply(func, engine='numba') is mainly for user-defined-functions?

no, you wouldn't need to (at this point), duplicate anything for building in rolling function (e.g. mean); the duplication of code is in the computation of the Fixed and Variable window indexers themselves (not a huge duplication).

mroeschke commented 4 years ago

Right, I think we'll need 2 implementations for calculating the window boundaries, a Cython version (for rolling functions that are still in Cython) and a Numba version (for rolling functions that get converted to Numba)

TomAugspurger commented 4 years ago

@mroeschke are you hoping to do this for 1.0? Roughly, how much effort is left on https://github.com/pandas-dev/pandas/pull/29428, and how much additional effort to close this? Trying to get a sense for where we're at.

Edit: and should the title of this post be updated based on https://github.com/pandas-dev/pandas/issues/28987#issuecomment-543853896? The optional part is still the plan, right?

mroeschke commented 4 years ago

@TomAugspurger yes still aiming for this for 1.0.

Hoping to finish #29428 by end of week. After there should be ~2/3 followup PRs (1 more cleanup for the "window indexers", 1 for Numba + rolling apply). These PR's should be a little more straightforward and hoping to finish those in December/end of year.

rgommers commented 4 years ago

@TomAugspurger yes still aiming for this for 1.0.

Just an observation: there's a lot of concerns and potential issues in this issue, with essentially no response, except for "let's make it optional then". If you had discussions in a dev call or something, it would be nice to summarize, because the decision making looks a little odd here.

TomAugspurger commented 4 years ago

No dev call (though this will likely come up during tomorrow's).

My (possibly wrong) understanding of the situation is that

We're not ready for a hard dependency on numba
Being able to do a .rolling(...).apply(numba_jitted_user_function) all in Numba is compelling enough to warrant some duplicate code (though I haven't followed the discussion on how much)

Though that doesn't quite square with @mroeschke comment

and a Numba version (for rolling functions that get converted to Numba)

Are we converting some functions to Numba? Or are we limiting the scope to taking a numba-path for user-defined functions that have been jitted (by the user).

mroeschke commented 4 years ago

The revised proposal https://github.com/pandas-dev/pandas/issues/28987#issuecomment-543853896 was in response to the various concerns around Numba and making its use essential fully optional.

Reiterating the proposal points:

Introduce Numba as an optional dependency
Have rolling.apply dispatch to Numba as opt-in behavior (e.g. rolling(...).apply(user_python_func, engine='Numba')) as a potential significant performance boost

@TomAugspurger so the idea is that we would try to jit the users function (and raise if we fail) instead of the users jitting before passing to apply - though we could accomodate the latter as well.

Added the topic as a discussion point for tomorrow's dev call.

pandas-dev / pandas

Add Numba as an optional dependency for rolling.apply for pandas 1.0 #28987