I don't think we're zero-copy for arrays, and I don't know if there's a good way for us to do that right now... I think it needs some work on the pgx side first, but @workingjubilee would know better. It does sound valuable.
@thomcc Thanks for the quick response and for looking into this. Agreed that this is valuable. It definitely became more apparent as I increased array size.
Yes, there is unnecessary overhead: pgx extracts the data from arrays using certain Postgres-provided functions that deconstruct arrays safely but are not particularly speed-optimized. I had a draft patch at some point that addresses this, which I put off after being distracted by the more PL/Rust-specific concerns, amongst other things. I'll see if I can scare that up and rebase it.
@workingjubilee Thanks! Do you want me to open up a corresponding issue in pgx to track it? I don't have a pgx-specific test case.
Sure!
I think the main reason I hadn't opened a PR the first time I drafted things is that I wanted to accompany it with a modest amount of performance testing, to "show my work" (and prevent future regressions).
These SQL queries are pretty bad for determining performance, since the generate_random_float8_array PL/pgSQL function is called for every row in the arr_large table.
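For instance (a hedged sketch, since the original benchmark queries aren't shown above, and the array dimension of 1536 is an assumption), a query shaped like this spends most of its time in the PL/pgSQL generator rather than in the function under test:

SELECT euclidean_distance(generate_random_float8_array(1536),
                          generate_random_float8_array(1536))
FROM generate_series(1, 100000);

Querying pre-populated rows instead, as in the plans below, isolates the function being measured.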
That said, the function-capabilities branch (now PR #298) has zero-copy array support for pass-by-value types such as f64. It'll be merged into the develop branch shortly, and then maybe you can try again.

Against the function-capabilities branch:
[v14.7][1979980] plrust=# explain analyze select euclidean_distance(data, data) from arr_large limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.66 rows=10 width=8) (actual time=0.068..0.393 rows=10 loops=1)
-> Seq Scan on arr_large (cost=0.00..26637.00 rows=100000 width=8) (actual time=0.067..0.390 rows=10 loops=1)
Planning Time: 0.046 ms
Execution Time: 0.406 ms
(4 rows)
[v14.7][1979980] plrust=# explain analyze select euclidean_distance(data, data) from arr_large;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on arr_large (cost=0.00..26637.00 rows=100000 width=8) (actual time=0.115..2346.068 rows=100000 loops=1)
Planning Time: 0.048 ms
Execution Time: 2350.093 ms
(3 rows)
Versus the main branch:
[v14.7][1980289] plrust=# explain analyze select euclidean_distance(data, data) from arr_large limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.66 rows=10 width=8) (actual time=0.088..0.498 rows=10 loops=1)
-> Seq Scan on arr_large (cost=0.00..26637.00 rows=100000 width=8) (actual time=0.087..0.495 rows=10 loops=1)
Planning Time: 0.057 ms
Execution Time: 0.514 ms
(4 rows)
[v14.7][1980289] plrust=# explain analyze select euclidean_distance(data, data) from arr_large;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on arr_large (cost=0.00..26637.00 rows=100000 width=8) (actual time=0.117..2434.598 rows=100000 loops=1)
Planning Time: 0.043 ms
Execution Time: 2438.613 ms
(3 rows)
What is it that you're actually doing to compare against "the C-based functions"?
While we thought this was resolved as of v1.1.0, there was more room for improvement; v1.1.3 will resolve it, thanks to PR #310 and https://github.com/tcdi/pgrx/pull/1145.
I noticed there may be some performance overhead when performing calculations on arrays. I've found it to be more pronounced on data sets that have arrays with higher cardinality. Experiment details below:
Here is a helper function I use in the experiment:
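(A reconstructed sketch; the exact body wasn't captured above, so the signature and the use of random() are assumptions. Only the function name, generate_random_float8_array, and its being PL/pgSQL are confirmed by the thread.)

CREATE OR REPLACE FUNCTION generate_random_float8_array(dim int)
RETURNS float8[]
LANGUAGE plpgsql
AS $$
BEGIN
    -- Build a float8[] of length "dim" with uniform random values in [0, 1).
    RETURN (SELECT array_agg(random()) FROM generate_series(1, dim));
END;
$$;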
For the experiment, I created and populated the following tables:
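(A sketch of the setup. The table name arr_large and its 100000 rows match the EXPLAIN output above; the array dimension and the smaller companion tables implied by "tables" are assumptions and are not sketched here.)

CREATE TABLE arr_large (data float8[]);

INSERT INTO arr_large (data)
SELECT generate_random_float8_array(1536)  -- assumed dimension
FROM generate_series(1, 100000);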
I created the following PL/Rust function and wrote an equivalent C function:
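(A sketch of the function's shape; the exact body and the C equivalent weren't captured above. This assumes the zero-copy Array mapping discussed earlier in the thread, where a STRICT function's float8[] arguments iterate as Option<f64> because SQL arrays may contain NULLs; treating NULL as 0.0 is also an assumption.)

CREATE OR REPLACE FUNCTION euclidean_distance(a float8[], b float8[])
RETURNS float8
IMMUTABLE STRICT
LANGUAGE plrust
AS $$
    // Sum of squared differences over paired elements; zip truncates to
    // the shorter array, and NULL elements are treated as 0.0 in this sketch.
    let sum: f64 = a.iter()
        .zip(b.iter())
        .map(|(x, y)| {
            let d = x.unwrap_or(0.0) - y.unwrap_or(0.0);
            d * d
        })
        .sum();
    Ok(Some(sum.sqrt()))
$$;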
(As I'm new to Rust, I also wrote a version of the function that did not use the iterator helpers but instead stepped through the array with a for loop; the results were comparable.)
I ran a variety of queries, including:
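For example (the first two match the EXPLAIN plans shown earlier in this thread; c_euclidean_distance is a hypothetical name for the C-based equivalent):

SELECT euclidean_distance(data, data) FROM arr_large LIMIT 10;
SELECT euclidean_distance(data, data) FROM arr_large;
SELECT c_euclidean_distance(data, data) FROM arr_large;  -- hypothetical C-based name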
In my tests, the PL/Rust functions were typically 2-3x slower than the C-based functions, though in some variations of the testing they were only marginally slower. By contrast, the PL/Rust functions were on average 5-10x faster than the PL/pgSQL equivalents. I'm curious whether there is any room for improvement in this case.