tcdi / plrust

A Rust procedural language handler for PostgreSQL
PostgreSQL License
1.12k stars 32 forks source link

JSonb extract slower in Pl/Rust than in Pl/PgSql #408

Closed ncharlot closed 7 months ago

ncharlot commented 7 months ago

Hi,

We give a try to Pl/Rust to boost a Pl/PgSql function that hashes a partial JSonb doc to a Md5/Uuid. The results are not so good and the first thing we noticed is a longer time to extract a value from a JSonb param.

This is the benchmark.

SQL version:

create or replace function get_jsonb_value_sql(doc jsonb, key text)
returns text strict immutable language sql
as
$$
   select doc->>key;
$$;

Pl/PgSql:

create or replace function get_jsonb_value_plpgsql(doc jsonb, key text)
returns text strict immutable language plpgsql
as
$$
begin
   return doc->>key;
end;
$$;

Pl/Rust:

create or replace function get_jsonb_value_plrust(doc jsonb, key text)
returns text strict parallel safe immutable language plrust
as
$$
   Ok(Some(doc.0[key].to_string()))
$$;

To generate a dataset and run benchmark:

drop table if exists data;
create temporary table data as
select
    jsonb_build_object(
        'foo', 'bar' || s.id,
        'lorem', 'ipsum'
    ) doc
from
    generate_series(1, 10000000) as s(id);

explain analyze select get_jsonb_value_sql(doc, 'foo') from data;
explain analyze select get_jsonb_value_plpgsql(doc, 'foo') from data;
explain analyze select get_jsonb_value_plrust(doc, 'foo') from data;

Results:

lang time
Sql 1863.657 ms
Pl/pgSql 4329.467 ms
Pl/Rust 7693.342 ms

The more the JSonb document is complex, the more the factor increases. Pl/Pgsql is quite stable and Pl/Rust takes more time. Maybe there's something we can do different?

Thanks.

workingjubilee commented 7 months ago

@ncharlot "fast JSONB in PL/Rust" is something that has been requested, but unfortunately, the current convenience wrapper around JsonB in pgrx requires quite a lot of copying and FFI overhead to operate and we don't do anything terribly clever in PL/Rust to avoid or minimize that. We just serialize the entire JsonB.

This, ironically, is about as fast as accessing the JsonBContainer using FFI functions, per https://github.com/tcdi/plrust/issues/396#issuecomment-1809082135

The way that PL/Rust has fast Array handling is via raw pointer access, so that the implementation can stay in Rust and thus benefit from compiler optimizations instead of using FFI functions, which have considerable overhead to call from Rust because Rust does not support setjmp and longjmp and thus we must shield against Postgres jumping over Rust code. It is hard to improve significantly here without doing that, but that requires more maintenance.

ncharlot commented 7 months ago

Thanks for your precise reply @workingjubilee

So for this first case we will start by passing directly the needed values of the Jsonb doc. And we'll follow the evolutions of this part!

eeeebbbbrrrr commented 7 months ago

pgrx' support for the JSONB type is very naive. It actually converts it to a String and then uses serde to deserialize it. So you'd see slightly better performance if you just used the JSON type. JSON already starts as a String (basically), so there's about 50% less overall work. I'm not sure how that equates to runtime performance but it should be faster.

I'd like to see us eventually have first-class support for JSONB. As @workingjubilee mentioned, we're able to do some nice/fast things for Array support b/c we have our own implementation of the binary format. We'd need to do the same for JSONB. It's doable, of course -- it's just a lot of work.