pydata / numexpr

Fast numerical array expression evaluator for Python, NumPy, Pandas, PyTables and more
https://numexpr.readthedocs.io/en/latest/user_guide.html
MIT License

Support multithreading #496

Closed: emmaai closed this 2 months ago

emmaai commented 3 months ago

As the title says (ref: https://github.com/pydata/numexpr/issues/494), the change makes ne.evaluate safe to call from multiple threads.

Reasoning:

Benchmark case: it's based on my reality that most of the time I have a large amount of data and only chunks of it can fit into memory, so most of my CPUs are idling while I/O (especially input) happens. If the data can fit into memory and the CPUs are fully utilized, I guess there isn't much difference between threading over chunks with NumPy and using numexpr. Certainly I could implement what numexpr achieves by further splitting each chunk, but why? Specifically, I use dask's threaded scheduler to schedule the tasks; all I have to do is pack ne.evaluate nicely into a "task", provided thread safety is taken care of, as sketched below.
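A minimal sketch of that pattern, assuming dask's threaded scheduler; the expression, chunk size, and worker count are illustrative, not the PR's actual benchmark:

```python
import numpy as np
import numexpr as ne
import dask

def task(chunk):
    # Each task simply wraps ne.evaluate(); numexpr supplies the inner
    # parallelism, while the outer threads overlap compute with I/O.
    return ne.evaluate("2 * a**3 + a**2 - 4 * a", local_dict={"a": chunk})

# Stand-ins for chunks that would normally be streamed from disk.
chunks = [np.random.rand(1_000_000) for _ in range(4)]
results = dask.compute(*(dask.delayed(task)(c) for c in chunks),
                       scheduler="threads", num_workers=2)
```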

FrancescAlted commented 3 months ago

Cool. Can you add an example of a benchmark (you can put it in bench/) where the new feature is exercised and where we can see the advantage of the new method? It would be nice if you can add the results of your machine as a docstring at the beginning of the script; alternatively, you can use a jupyter notebook too. Thanks!

gdementen commented 3 months ago

I am sorry if my comment is dumb, and I am not a maintainer of numexpr, so my opinion is not worth much, but I have the impression your benchmark compares apples to oranges: NumPy with 2 threads vs numexpr with 32 (?) threads (2 explicit threads x 16 builtin threads). And how does numexpr's builtin threading interact with manual threading anyway? I would also be interested in a benchmark against "normal"/builtin numexpr threading, which I think is more interesting than one against NumPy. Unless there is something I don't understand (very likely), I don't expect much difference.
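For what it's worth, numexpr's builtin pool can be pinned with ne.set_num_threads(), so a like-for-like thread budget (e.g. 2 outer threads x 8 inner threads vs 16 builtin threads) could be set up along these lines; this is a sketch, not taken from the PR's bench scripts:

```python
import numpy as np
import numexpr as ne

a = np.random.rand(1_000_000)

# Pin the builtin pool so 2 outer threads x 8 inner threads matches a
# 16-thread budget; set_num_threads() returns the previous setting.
prev = ne.set_num_threads(8)
try:
    res = ne.evaluate("a + 2 * a**2")
finally:
    ne.set_num_threads(prev)
```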

emmaai commented 3 months ago

There are many benchmark cases, both against NumPy and with different numbers of threads, under the bench/ folder. The whole point of this PR is to avoid reimplementing numexpr's mechanism on top of NumPy when multithreading: numexpr is not thread safe due to a global dict, as stated clearly in the PR description.

I'm not quite sure I understand the comment "how does the builtin numexpr threading interact with manual threading". Well, better that they don't? If they do, it usually implies a race condition; the change here guarantees that they don't. Oversubscription (I have only 16 cores) is a common technique when CPU utilisation is low due to I/O, because in reality the other thread(s) might be loading data and not using the CPUs that much.

As commented in the benchmark file, it has 2 threads because the presumption is that memory is only enough for 2 chunks' worth of computation. Then there are two choices: one is to thread over smaller chunks with NumPy, the other is to hand the threading over to numexpr. To me the latter is clearly the easier option. With the chunks being smaller and the threads oversubscribed, numexpr does not do as well as in the other conditions (again, they're available under the bench/ folder). However, it's still much better than single-threaded NumPy, and MUCH less work.
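For contrast, a sketch of the first option, splitting each in-memory chunk further and fanning a plain-NumPy kernel over the pieces yourself (the kernel and thread count here are made up for illustration). This is the bookkeeping that handing the inner parallelism to numexpr avoids:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def numpy_kernel(piece):
    return 2 * piece**3 + piece**2 - 4 * piece

def process_chunk(chunk, n_threads=16):
    # Manually split the chunk and thread the NumPy kernel over the pieces.
    pieces = np.array_split(chunk, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return np.concatenate(list(pool.map(numpy_kernel, pieces)))
```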

FrancescAlted commented 3 months ago

Thanks @emmaai for your example. It was more for me (and others!) to understand the way you want to use the feature. Now it is much clearer, and it sounds good to me. The only thing I'd ask is to add a new test exercising the new feature; tests are how we make sure we don't introduce regressions in the future.

emmaai commented 3 months ago

A test has been added to verify thread safety by reliably manifesting the race condition; a sketch of the idea follows.
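A hedged sketch of that kind of test (the expression, sizes, and names here are illustrative, not the PR's actual test): many threads call ne.evaluate() concurrently with per-thread operand values and assert that each thread gets the answer for its own inputs, so any leakage through shared state surfaces as a failed assertion.

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np
import numexpr as ne

def worker(i):
    a = np.full(100_000, float(i))
    res = ne.evaluate("a + 1", local_dict={"a": a})
    # If evaluate() were not thread safe, another thread's "a" could
    # leak in through shared global state and this check would fail.
    assert np.all(res == i + 1)

def test_evaluate_thread_safety():
    with ThreadPoolExecutor(max_workers=16) as pool:
        list(pool.map(worker, range(64)))  # list() propagates exceptions
```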

gdementen commented 3 months ago

Thanks @emmaai for the added explanation.

emmaai commented 2 months ago

I'm following up on this PR, wondering whether there is any concern blocking it from being merged?

FrancescAlted commented 2 months ago

I've just activated the tests in CI, and macOS is reporting a failure. Can you address it?

emmaai commented 2 months ago

Sorry, I forgot to commit the change in another file when pushing the test. It should pass now.

FrancescAlted commented 2 months ago

Thanks @emmaai !