pgcentralfoundation / pgrx

Build Postgres Extensions with Rust!

RFC: Zero copy JSONB support #501

Open Hoverbear opened 2 years ago

Hoverbear commented 2 years ago

We are exploring ways to support JSONB in a zero-copy way. Right now JSONB maps to pgx::Json(serde_json::Value), but we'd like to be able to use this data without necessarily creating a copy.

@licenser's simd-json crate supports a BorrowedValue API which may fit our needs.

serde can also support zero-copy deserialization.

Licenser commented 2 years ago

If I can help with simd-json feel free to ping :)

Hoverbear commented 2 years ago

We were discussing this yesterday, and @eeeebbbbrrrr corrected me: JSONB is not quite "a Vec<u8> of JSON". After reading a bit of https://github.com/postgres/postgres/blob/0bd7af082ace135581bb13a6bd2d88e68c66a3e0/src/include/utils/jsonb.h#L81-L116, I worry we may not be able to do zero-copy with it unless there are some Postgres APIs for it...
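To make the "not a Vec<u8> of JSON" point concrete, here is a rough sketch of how one JEntry word from the on-disk format could be decoded in Rust. The bit masks are transcribed from my reading of jsonb.h and should be checked against the header; the `JEntry` and `JEntryKind` types are hypothetical names for illustration, not anything in pgx or Postgres.

```rust
// Bit masks for a 32-bit JEntry word, transcribed from PostgreSQL's jsonb.h.
// Assumption: verify these against the header before relying on them.
const JENTRY_OFFLENMASK: u32 = 0x0FFF_FFFF; // low 28 bits: offset or length
const JENTRY_TYPEMASK: u32 = 0x7000_0000; // 3 bits: value type
const JENTRY_HAS_OFF: u32 = 0x8000_0000; // high bit: offset stored, not length

/// Hypothetical Rust-side view of the JEntry type bits.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum JEntryKind {
    String,
    Numeric,
    BoolFalse,
    BoolTrue,
    Null,
    Container,
    /// Type bits this sketch does not recognise.
    Unknown(u32),
}

/// Hypothetical newtype over one raw JEntry word.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub struct JEntry(pub u32);

impl JEntry {
    /// Classify the entry by its type bits.
    pub fn kind(self) -> JEntryKind {
        match self.0 & JENTRY_TYPEMASK {
            0x0000_0000 => JEntryKind::String,
            0x1000_0000 => JEntryKind::Numeric,
            0x2000_0000 => JEntryKind::BoolFalse,
            0x3000_0000 => JEntryKind::BoolTrue,
            0x4000_0000 => JEntryKind::Null,
            0x5000_0000 => JEntryKind::Container,
            other => JEntryKind::Unknown(other),
        }
    }

    /// The 28-bit payload: an end offset if `has_offset()`, else a length.
    pub fn off_or_len(self) -> u32 {
        self.0 & JENTRY_OFFLENMASK
    }

    /// Whether the payload is an offset rather than a length.
    pub fn has_offset(self) -> bool {
        self.0 & JENTRY_HAS_OFF != 0
    }
}
```

Even this small piece shows that the values sit behind a header of packed words, so a JSON text parser cannot simply be pointed at the bytes.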

eeeebbbbrrrr commented 2 years ago

If we're to do this, I think we're gonna need to basically reimplement jsonb in rust. Assuming we can use serde to do both serialization and deserialization, perhaps we publish it as a separate crate. It's a decent format that the broader rust community might find useful.

As priorities go for all the things we're doing, this issue is pretty low, so we got time to noodle things more.

bitner commented 1 month ago

https://github.com/datafuselabs/jsonb looks like a pretty complete reimplementation of jsonb in Rust. I'm currently looking at doing some heavier work with JSONB fields in pgrx that I suspect could be greatly enhanced by parsing jsonb directly rather than going jsonb -> json -> Value.

I'm willing to take on some work to try to make this work, but would likely need some guidance. Any thoughts on the jsonb library and whether that would be suitable to work in this context?

eeeebbbbrrrr commented 1 month ago

Neat.

I think we’d entertain using that crate if it actually works.

I suspect the hard part here will be managing lifetimes behind the raw Datum pointer. I think we’d want to take advantage of Cow<'a, str> if we can; otherwise there’s little point.
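As a minimal sketch of the Cow idea, using only std: a view type carries the lifetime of the underlying bytes and hands out a `Cow<'a, str>` that borrows when it can and allocates only when it must. `RawJsonbStr` is a hypothetical name, and the plain `&'a [u8]` stands in for whatever memory sits behind the Datum; how to soundly obtain that slice from Postgres is exactly the open question and is not addressed here.

```rust
use std::borrow::Cow;

/// Hypothetical borrowed view over the bytes of one string value.
/// The lifetime 'a ties every result back to the underlying buffer.
pub struct RawJsonbStr<'a> {
    bytes: &'a [u8],
}

impl<'a> RawJsonbStr<'a> {
    pub fn new(bytes: &'a [u8]) -> Self {
        Self { bytes }
    }

    /// Borrow the bytes as a string when they are valid UTF-8.
    /// Invalid sequences force an owned copy with U+FFFD replacements.
    pub fn as_cow(&self) -> Cow<'a, str> {
        String::from_utf8_lossy(self.bytes)
    }
}
```

The returned `Cow` borrows from the buffer rather than from the view, so the view itself can be a temporary; the compiler only insists that the buffer outlives the string.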