Open chenstarx opened 2 months ago
I encountered the same issue.
It's partly a node issue, have you tried bun or deno? I get much better performance using those 2 vs. node
My project highly depends on Node.js, it's hard to migrate to Bun or Deno...
Any other operation using node takes much longer than using Bun.
For example I just tried: [...Array(1_000_000)].forEach((_, i) => pl.Series("abs", [1, 2, 3]));
It takes many times longer in node than in Bun.
Why do you think re-writing just _Expr will solve your performance issue?
For me Bun is 20% faster than Python when running: [...Array(1_000_000)].forEach((_, i) => pl.Series("abs", [1, 2, 3]));
@maizhichao unfortunately, this is a known issue with nodejs. Their FFI implementation is incredibly slow compared to python or other js engines (deno, bun). The main bottleneck is sending values over n-api, which refactoring Expr will not fix.
Have you tried latest version of polars?
What version of polars are you using?
0.0.15
What operating system are you using polars on?
MacOS 14.4.1 (M3 Pro)
What node version are you using?
v20.12.1
Describe your bug.
I have encountered a significant performance issue when using the nodejs-polars library. Specifically, the time required to create multiple Expr objects is considerably higher compared to the Python version of polars.
What are the steps to reproduce the behavior?
To illustrate the issue, I conducted a performance test by generating one million Expr objects in both nodejs-polars and Python polars. The following code snippets demonstrate the test setup:
Python Code
Node.js Code
What is the actual behavior?
Python polars: Approximately 7 seconds to create 1,000,000 Expr objects.
Node.js polars: Approximately 1,000 seconds to create the same number of Expr objects.
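For anyone who wants to reproduce numbers like these, here is a minimal timing sketch. The helper name `timePerCall` and the stand-in workload are hypothetical and are not the original test code from this issue; the stand-in only exists so the snippet runs without nodejs-polars installed.

```javascript
// Hypothetical helper: runs `fn` exactly `n` times and reports the total
// wall-clock time plus the mean cost per call, both in milliseconds.
function timePerCall(fn, n) {
  const start = performance.now();
  for (let i = 0; i < n; i++) {
    fn(i);
  }
  const totalMs = performance.now() - start;
  return { totalMs, perCallMs: totalMs / n };
}

// Stand-in workload (an assumption, used so the sketch is self-contained).
// To measure polars itself, pass the call under test instead, for example
// the one quoted in this thread: () => pl.Series("abs", [1, 2, 3])
const standIn = (i) => ({ name: "abs", values: [i, i + 1, i + 2] });

const result = timePerCall(standIn, 1_000_000);
console.log(`total: ${result.totalMs.toFixed(1)} ms`);
console.log(`per call: ${(result.perCallMs * 1000).toFixed(3)} µs`);
```

For scale, the figures reported above work out to roughly 1 ms per Expr in Node.js (1,000 s / 1,000,000) versus roughly 7 µs per Expr in Python (7 s / 1,000,000).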
Impact
This performance discrepancy presents a significant bottleneck when performing operations that require frequent creation of Expr objects in nodejs-polars. It substantially limits the library's usability for large-scale data processing tasks in a Node.js environment.
What is the expected behavior?
The performance of creating Expr objects in nodejs-polars should be closer to, or ideally match, the performance in the Python version of polars.
Possible Reason
The issue might be caused by _Expr, which creates a new Expr object every time it is executed. Each execution takes about 0.5 ms on my laptop, which adds up to considerable time when executed millions of times. Moreover, in my test dataset, the actual computing time for millions of rows of data is very short; most of the time was spent creating the Expr objects.
There might be two ways to solve this problem:
Rewrite _Expr as a class and return this in each expression method, avoiding the time-consuming operation of creating a complex new object in JavaScript.
Thanks for reading the issue, I hope my suggestion will be helpful for the nodejs-polars library.
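To make the class suggestion concrete, here is a minimal sketch of the two styles side by side. `makeExpr` and `ExprBuilder` are hypothetical stand-ins written for illustration; they are not the real `_Expr` from nodejs-polars and they do not call into the native layer at all.

```javascript
// Hypothetical factory-style wrapper (not the real _Expr): every call builds
// a new object and allocates a fresh closure for each method on it.
function makeExpr(ops) {
  return {
    ops,
    add: (n) => makeExpr([...ops, `add(${n})`]),
    alias: (name) => makeExpr([...ops, `alias(${name})`]),
  };
}

// Hypothetical class-based variant: methods live once on the prototype, and
// each method mutates the instance and returns `this` instead of allocating.
class ExprBuilder {
  constructor(column) {
    this.ops = [`col(${column})`];
  }
  add(n) {
    this.ops.push(`add(${n})`);
    return this;
  }
  alias(name) {
    this.ops.push(`alias(${name})`);
    return this;
  }
}

const viaFactory = makeExpr(["col(a)"]).add(1).alias("b");
const viaClass = new ExprBuilder("a").add(1).alias("b");

console.log(viaFactory.ops.join(".")); // col(a).add(1).alias(b)
console.log(viaClass.ops.join("."));   // col(a).add(1).alias(b)
```

One trade-off to keep in mind: returning `this` makes an expression mutable, so an expression reused in two different chains would see both sets of changes. And as noted in the comments on this issue, if most of the time goes to crossing the n-api boundary, a change like this on the JavaScript side may not remove the main cost.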