tcdi / plrust

A Rust procedural language handler for PostgreSQL
PostgreSQL License
1.12k stars 33 forks source link

Accessing JsonbContainer #396

Closed jscheid closed 1 year ago

jscheid commented 1 year ago

I'd like to access a JSONB value as a JsonbContainer, not a Jsonb. Is this possible?

jscheid commented 1 year ago

I was hoping I could work around it using anyelement and FromDatum, but sadly this gives me an error (using plrust-trusted-1.2.6_1.72.0-debian-pg16-amd64.deb):

cannot find type `AnyElement` in crate `pgrx`

EDIT: oh, I see, that's excluded from plrust-trusted-pgrx. There isn't currently any way to access the JSONB data directly in trusted mode, is there? For example, a way to access the contents of pg_sys::Datum without unsafe?

eeeebbbbrrrr commented 1 year ago

Correct, there is not. There's also not a way to do so with the "untrusted" variant of PL/Rust either. It still applies the full set of lints, which includes #[deny(unsafe_code)].

If this is the type of thing you find yourself needing to do, it probably makes sense to look at developing an actual extension (and hopefully with pgrx!).

jscheid commented 1 year ago

Thanks for your quick reply. Would it not be conceivable to expose JsonbContainer (and JsonbIteratorNext etc.), or a safe wrapper around these, in plrust?

eeeebbbbrrrr commented 1 year ago

Perhaps some day we'll gain that ability in pgrx and then can safely expose it here in pl/rust land too. I agree it'd be nice. The level of effort is quite high so it's not something I'd expect to come into existence tomorrow.

jscheid commented 1 year ago

Perhaps I could help bring it into existence. May I ask why you estimate the effort to be high, doesn't the JsonIterator family of functions already provide most of what we'd need?

eeeebbbbrrrr commented 1 year ago

I think the effort is high based on the effort required to safely (and correctly!) wrap Postgres' ARRAY type. Plus just years of doing this kind of thing with Postgres.

I'm not intimately familiar with the internals of the JSONB type, but I think it'll require a quite a bit more effort than ARRAY support did. Off the top of my head I don't know how much of the Postgres internal support functions we can lean on v/s writing our own.

@workingjubilee and I have briefly talked about this in the past, so they may have a better feel.

I'd like to encourage you to take a stab at this. We are here to help in any way that we can and would happily merge a working implementation into pgrx. At that point, it'd be easy to expose it in plrust.

jscheid commented 1 year ago

I might give it a try. Would you mind a few words to my question about JsonbIterator -- to me it looks like it would let us sidestep any questions about JSONB internals, but clearly I'm missing something. Is it not implemented, or not efficient?

eeeebbbbrrrr commented 1 year ago

I think anything from https://github.com/postgres/postgres/blob/7606175991f8ed5ae36eecf6951cb89dcfb2156e/src/include/utils/jsonb.h is going to require a lot of work to safely wrap with Rust. JsonbIterator is just one of many things that'll need to be handled. JEntry will require a lot of attention itself.

Like I said, I'm not sure exactly how much of the internal Postgres support functions we can use v/s having to essentially implement our own safe JSONB decoder in Rust.

jscheid commented 1 year ago

OK, makes sense. Thanks so far, I'll poke around a bit.

workingjubilee commented 1 year ago

I might give it a try. Would you mind a few words to my question about JsonbIterator -- to me it looks like it would let us sidestep any questions about JSONB internals, but clearly I'm missing something. Is it not implemented, or not efficient?

In general, it is safe to assume it is not very efficient to call a C API during iteration in Rust. Iteration is always "hot", and modern C compilers often benefit greatly by exploiting inlining. So does Rust, but Rust code struggles to inline C unless certain very specific invariants are met, starting with "The Rust you use was built against the same LLVM as the clang you used to compile Postgres... you did use clang, not gcc, right?" For this and more reasons, historically, PGRX's use of C APIs that iterate series-of-varlenas reprs has not been very fast.

But what is most problematic is that Rust code must become effectively bulletproof in a way that C code helps with less often than I like: virtually no C code is expected to be incapable of memory unsoundness, whereas 100% of safe Rust is expected to be so... assuming the underlying unsafe Rust is correctly written. This affects how the final safe Rust layer that wraps unsafe { .. } should be designed, so what is a sound and good design when wrapping C FFI might become less desirable if the layout is instead handled directly in Rust.

jscheid commented 1 year ago

@workingjubilee thanks for your thoughtful comment. I hear you about performance and safety, but supporting JSONB natively should be a last resort from my perspective. There's a lot of work waiting there, and messages like this one read like the Postgres team (understandably) isn't interested in making any commitments regarding stability of the internal representation across versions, so it would potentially be a moving target. That's why I would like to explore the iteration route first.

Here is a proof of concept. Let me know what you think and don't be too harsh, I've only spent a few hours on it and it's obviously not meant to be polished, much less ready for anything. In fact I haven't spent any time reasoning about safety yet and you'll probably find a gaping memory leak immediately if you care to look, or worse. The API isn't meant to be set in stone either. Still, I hope it can serve as a basis for further discussion.

~This is currently a smidge slower than existing JSONB->JSON->Serde, which is disappointing but I also haven't looked into optimizing it yet. It's entirely possible that the iterator itself is a bottleneck as suggested by you @workingjubilee. However,~ It's a bit faster than existing JSONB->JSON->Serde, I was hoping for more but still a small improvement. Also, it should avoid the extra allocation that the existing conversion incurs and also allow skipping over uninteresting parts of the document. (You'll ~probably~ crash if you try, though, since support for Binary isn't there yet.)~

~(It currently iterates over all the leafs. It can probably be sped up easily by reading whole arrays and objects, but then we're back to a tradeoff with memory overhead. I will have to profile this all more, and read Postgres code, to better understand what the implications are.)~ It doesn't seem that reading whole arrays or objects is possible with the iterator API.

Feel free to move this conversation over to the pgrx repo.

workingjubilee commented 1 year ago

I can't "just" move issues between organizations, unfortunately.

It's usually easier to hold extended conversations with lots of small possible digressions on chat-centric platforms.