I don't think we're zero-copy for arrays, and I don't know if there's a good way for us to do that right now... I think it needs some work on the pgx side first, but @workingjubilee would know better. It does sound valuable.
@thomcc Thanks for the quick response and for looking into this. Agreed that this is valuable. It definitely became more apparent as I increased array size.
Yes, there is unnecessary overhead: pgx extracts the data from arrays using certain Postgres-provided functions that deconstruct arrays safely but are not particularly speed-optimized. I had a draft patch at some point that addresses this, which I put off after being distracted by the more PL/Rust-specific concerns, amongst other things. I'll see if I can scare that up and rebase it.
@workingjubilee Thanks! Do you want me to open up a corresponding issue in pgx to track it? I don't have a pgx-specific test case.
Sure!
I think the main reason I hadn't opened a PR the first time I drafted things is that I wanted to accompany it with a modest amount of performance testing, to "show my work" (and prevent future regressions).
These SQL queries are pretty bad for determining performance, since the generate_random_float8_array PL/pgSQL function is called for every row in the arr_large table.
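For instance (a hedged sketch, since the original benchmark queries aren't shown above, and the array dimension of 1536 is an assumption), a query shaped like this spends most of its time in the PL/pgSQL generator rather than in the function under test:

SELECT euclidean_distance(generate_random_float8_array(1536),
                          generate_random_float8_array(1536))
FROM generate_series(1, 100000);

Querying pre-populated rows instead, as in the plans below, isolates the function being measured.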
That said, the function-capabilities branch (now PR #298) has zero-copy array support for pass-by-value types such as f64. It'll be merged into the develop branch shortly, and then maybe you can try again.

Against the function-capabilities branch:
[v14.7][1979980] plrust=# explain analyze select euclidean_distance(data, data) from arr_large limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.66 rows=10 width=8) (actual time=0.068..0.393 rows=10 loops=1)
-> Seq Scan on arr_large (cost=0.00..26637.00 rows=100000 width=8) (actual time=0.067..0.390 rows=10 loops=1)
Planning Time: 0.046 ms
Execution Time: 0.406 ms
(4 rows)
[v14.7][1979980] plrust=# explain analyze select euclidean_distance(data, data) from arr_large;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on arr_large (cost=0.00..26637.00 rows=100000 width=8) (actual time=0.115..2346.068 rows=100000 loops=1)
Planning Time: 0.048 ms
Execution Time: 2350.093 ms
(3 rows)
Versus the main branch:
[v14.7][1980289] plrust=# explain analyze select euclidean_distance(data, data) from arr_large limit 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..2.66 rows=10 width=8) (actual time=0.088..0.498 rows=10 loops=1)
-> Seq Scan on arr_large (cost=0.00..26637.00 rows=100000 width=8) (actual time=0.087..0.495 rows=10 loops=1)
Planning Time: 0.057 ms
Execution Time: 0.514 ms
(4 rows)
[v14.7][1980289] plrust=# explain analyze select euclidean_distance(data, data) from arr_large;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on arr_large (cost=0.00..26637.00 rows=100000 width=8) (actual time=0.117..2434.598 rows=100000 loops=1)
Planning Time: 0.043 ms
Execution Time: 2438.613 ms
(3 rows)
What is it that you're actually doing to compare against "the C-based functions"?
While we thought this was resolved as of v1.1.0, there was more room for improvement; v1.1.3 will resolve it, thanks to PR #310 and https://github.com/tcdi/pgrx/pull/1145.
I noticed there may be some performance overhead when performing calculations on arrays. I've found it to be more pronounced on data sets that have arrays with higher cardinality. Experiment details below:
Here is a helper function I use in the experiment:
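(A reconstructed sketch; the exact body wasn't captured above, so the signature and the use of random() are assumptions. Only the function name, generate_random_float8_array, and its being PL/pgSQL are confirmed by the thread.)

CREATE OR REPLACE FUNCTION generate_random_float8_array(dim int)
RETURNS float8[]
LANGUAGE plpgsql
AS $$
BEGIN
    -- Build a float8[] of length "dim" with uniform random values in [0, 1).
    RETURN (SELECT array_agg(random()) FROM generate_series(1, dim));
END;
$$;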
For the experiment, I created and populated the following tables:
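(A sketch of the setup. The table name arr_large and its 100000 rows match the EXPLAIN output above; the array dimension and the smaller companion tables implied by "tables" are assumptions and are not sketched here.)

CREATE TABLE arr_large (data float8[]);

INSERT INTO arr_large (data)
SELECT generate_random_float8_array(1536)  -- assumed dimension
FROM generate_series(1, 100000);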
I created the following PL/Rust function and wrote an equivalent C function:
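(A sketch of the function's shape; the exact body and the C equivalent weren't captured above. This assumes the zero-copy Array mapping discussed earlier in the thread, where a STRICT function's float8[] arguments iterate as Option<f64> because SQL arrays may contain NULLs; treating NULL as 0.0 is also an assumption.)

CREATE OR REPLACE FUNCTION euclidean_distance(a float8[], b float8[])
RETURNS float8
IMMUTABLE STRICT
LANGUAGE plrust
AS $$
    // Sum of squared differences over paired elements; zip truncates to
    // the shorter array, and NULL elements are treated as 0.0 in this sketch.
    let sum: f64 = a.iter()
        .zip(b.iter())
        .map(|(x, y)| {
            let d = x.unwrap_or(0.0) - y.unwrap_or(0.0);
            d * d
        })
        .sum();
    Ok(Some(sum.sqrt()))
$$;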
(As I'm new to Rust, I also wrote a version of the function that did not use the iterator helpers but instead stepped through the array with a for loop; the results were comparable.)
I ran a variety of queries, including:
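For example (the first two match the EXPLAIN plans shown earlier in this thread; c_euclidean_distance is a hypothetical name for the C-based equivalent):

SELECT euclidean_distance(data, data) FROM arr_large LIMIT 10;
SELECT euclidean_distance(data, data) FROM arr_large;
SELECT c_euclidean_distance(data, data) FROM arr_large;  -- hypothetical C-based name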
In my tests, the PL/Rust functions were typically 2-3x slower than the C-based functions, though in some variations of the testing they were only marginally slower. By contrast, the PL/Rust functions were on average 5-10x faster than the PL/pgSQL equivalents. I'm curious whether there is any room for improvement in this case.