scylladb / scylla-rust-driver

Async CQL driver for Rust, optimized for ScyllaDB!
Apache License 2.0
560 stars 97 forks source link

Make paging state strongly typed #987

Closed wprzytula closed 5 days ago

wprzytula commented 4 months ago

Currently, paging state is represented both in the driver's internals and in the public API as Bytes blob. As it's no use allowing the users to construct paging state on their own, I suggest creating an opaque newtype for it, with a private constructor.

Additionally, the core benefit that Bytes bring - being able to subslice a bigger allocation - does not apply in the case of paging state. I believe that paging state could be better represented as Arc<[u8]>.

Lorak-mmk commented 4 months ago

As it's no use allowing the users to construct paging state on their own, I suggest creating an opaque newtype for it, with a private constructor.

They may want to serialize paging state and deserialize it later

wprzytula commented 4 months ago

As it's no use allowing the users to construct paging state on their own, I suggest creating an opaque newtype for it, with a private constructor.

They may want to serialize paging state and deserialize it later

Can't this be said about any struct that we intentionally restrict access to?

Lorak-mmk commented 4 months ago

No, this is actually existing use case, that's how you can implement paging e.g. in your web service. Even Java Driver docs mention this use case: https://java-driver.docs.scylladb.com/scylla-3.11.2.x/manual/paging/index.html It's hardly comparable to serializing any other random struct.

wprzytula commented 4 months ago

Then it would make sense to expose constructor (from_raw) and accessor (into_raw/into_inner).

Lorak-mmk commented 4 months ago

What is the problem with current version (just using Bytes)? Custom struct will force users to create their own newtype wrapper to be able to serialize it, that's 2 more levels of abstraction for which I don't see a compelling benefit

wprzytula commented 4 months ago

What is the problem with current version (just using Bytes)?

The answer is:

Additionally, the core benefit that Bytes bring - being able to subslice a bigger allocation - does not apply in the case of paging state. I believe that paging state could be better represented as Arc<[u8]>.

When I see Bytes, I expect some shared ownership of the whole frame. This is misleading, because paging state's Bytes are currently created from an exclusively owned buffer that is distinct from the frame Bytes.

wprzytula commented 4 months ago

Custom struct will force users to create their own newtype wrapper to be able to serialize it

Not if we provide the two functions I mentioned:

Then it would make sense to expose constructor (from_raw) and accessor (into_raw/into_inner).

Lorak-mmk commented 4 months ago

Custom struct will force users to create their own newtype wrapper to be able to serialize it

Not if we provide the two functions I mentioned:

Yes, even then, because people use frameworks for serialization (serde, rkyv etc)

Then it would make sense to expose constructor (from_raw) and accessor (into_raw/into_inner).

Lorak-mmk commented 4 months ago

What is the problem with current version (just using Bytes)?

The answer is:

Additionally, the core benefit that Bytes bring - being able to subslice a bigger allocation - does not apply in the case of paging state. I believe that paging state could be better represented as Arc<[u8]>.

When I see Bytes, I expect some shared ownership of the whole frame. This is misleading, because paging state's Bytes are currently created from an exclusively owned buffer that is distinct from the frame Bytes.

Now I don't understand at all. You don't want Bytes because it implies shared ownership (it doesn't, can be used here without any problem), and you propose Arc<[u8]> which is definitely a shared-ownership. I'm lost.

wprzytula commented 4 months ago

I expect some shared ownership of the whole frame.

Perhaps I should have made this bold:

I expect some shared ownership of the whole frame.

Bytes enable subslicing while retaining shared ownership; instead, when you have an Arc<[u8]> in hand, you know the exact size of the allocation.

This makes huge difference when you want to have control over memory allocated:

Such precaution (about using Bytes carefully` in deserialization) was included in this the deserialization framework refactor by @piodul.

Lorak-mmk commented 4 months ago

User of the driver doesn't care about the frame and has no reason to think about it in terms of result frame. This is something only maintainer of this driver would do.

wprzytula commented 4 months ago

As discussed face to face, deserialization refactor is going to enable the user deserialize frame subslices into Bytes, so the user will have to care about the result frame. I, as the maintainer, get confused about Bytes being used there.