square / okio

A modern I/O library for Android, Java, and Kotlin Multiplatform.
https://square.github.io/okio/
Apache License 2.0

zero-copy ByteArray to Buffer for reading purpose #1485

Open Chuckame opened 5 months ago

Chuckame commented 5 months ago

Context

I'm currently the maintainer of avro4k, and I'm planning to use okio to replace Java streams, hoping to go multiplatform one day.

A lot of apps, libraries, and frameworks deal only with ByteArray (I'm not saying that's a good idea). On our side, in the Avro world and especially in the messaging world (Kafka, RabbitMQ, ...), everything uses ByteArray, and we have no way to switch to ByteBuffer or even okio's Buffer.

We can easily encode data to a Buffer and then read its content into a ByteArray.

But to decode from a ByteArray with okio, our only choice is to first copy the content into a Buffer and then decode, which is really bad for performance.

By the way, we don't use Buffer directly; we use BufferedSink and BufferedSource for their really great encoding/decoding API, but sadly those interfaces are sealed.

Proposal

A constructor for BufferedSource that takes a ByteArray, to allow reading "complex" values (readLongLe, readUtf8, ...) directly from that ByteArray
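
To illustrate the kind of access being asked for, here is a minimal sketch in Java using only the standard library. ByteArrayReader and its methods are hypothetical names, not okio API; the sketch only shows typed reads going through an index into the caller's array, with no intermediate copy of the input.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical cursor-style reader over a caller-owned array; not part of okio. */
final class ByteArrayReader {
    private final byte[] data; // wrapped as-is, never copied
    private int pos;           // index of the next unread byte

    ByteArrayReader(byte[] data) {
        this.data = data;
    }

    /** Reads 8 bytes as a little-endian long, in the spirit of readLongLe. */
    long readLongLe() {
        if (data.length - pos < 8) {
            throw new IllegalStateException("fewer than 8 bytes remaining");
        }
        long value = 0;
        for (int i = 0; i < 8; i++) {
            // Byte i contributes bits [8*i, 8*i + 7] of the result.
            value |= (data[pos + i] & 0xffL) << (8 * i);
        }
        pos += 8;
        return value;
    }

    /** Decodes the next byteCount bytes as UTF-8, in the spirit of readUtf8(byteCount). */
    String readUtf8(int byteCount) {
        if (byteCount < 0 || data.length - pos < byteCount) {
            throw new IllegalStateException("not enough bytes remaining");
        }
        String text = new String(data, pos, byteCount, StandardCharsets.UTF_8);
        pos += byteCount;
        return text;
    }

    /** Number of bytes not yet consumed. */
    int remaining() {
        return data.length - pos;
    }
}
```

Because the reader keeps a reference to the original array, any later mutation of that array by the caller is visible to subsequent reads.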

Non goal

Backing a Buffer with a ByteArray: https://github.com/square/okio/issues/1360

swankjesse commented 5 months ago

Could you use UnsafeCursor? https://square.github.io/okio/3.x/okio/okio/okio/-buffer/-unsafe-cursor/index.html

Chuckame commented 5 months ago

How do I wrap the array using UnsafeCursor? I only see a transfer method that still writes bytes to the buffer.

Maybe by setting the bytes to data?

swankjesse commented 5 months ago

You’d use this API to get a ByteArray that you can write bytes into. (That is only useful if the APIs you’re interacting with let you provide the target byte arrays.)

Chuckame commented 5 months ago

Ah ok, this is already ok for the encoding part.

I'm mainly talking about decoding data from a ByteArray using BufferedSink without copying the ByteArray

Chuckame commented 5 months ago

I would like something like BufferedSink.wrap(bytes)

Chuckame commented 4 months ago

@swankjesse do you have a solution for reading from a ByteArray without copying?

JakeWharton commented 1 month ago

The problem is that the "Buffered" in BufferedSink comes from its use of Buffer to implement intermediate storage for higher-level APIs than a raw Sink. And that Buffer is exposed in the API, so you can't "just" wrap a ByteArray with an index pointer or something.

Now looking at Buffer, its backing ByteArrays are held in Segments. We could probably implement something like Buffer.unsafeCreateFromByteArray (or maybe on UnsafeCursor so it's not on Buffer) which takes a ByteArray and creates a single Segment from it, with shared set to true to prevent it from going into the pool and owner set to false to prevent writing to it.

So it seems technically possible, and has the same relative guarantees as UnsafeCursor usage. On the other hand, there's not a very high demand for this.
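
The single-Segment idea can be pictured with a toy model in plain Java. This is a sketch under assumptions, not okio's implementation: ToySegment, ToyPool, wrap, and canAppend are hypothetical names, and only the shared and owner flags are taken from the comment above.

```java
import java.util.ArrayDeque;

/** Toy stand-in for a segment; only the shared/owner flags come from the discussion. */
final class ToySegment {
    final byte[] data;
    int pos;              // first readable byte
    int limit;            // one past the last readable byte
    final boolean shared; // true: the array is referenced elsewhere, so never pool it
    final boolean owner;  // false: this segment must not write into the array

    ToySegment(byte[] data, int pos, int limit, boolean shared, boolean owner) {
        this.data = data;
        this.pos = pos;
        this.limit = limit;
        this.shared = shared;
        this.owner = owner;
    }

    /** Wraps caller-owned bytes without copying: readable, unpoolable, read-only. */
    static ToySegment wrap(byte[] bytes) {
        return new ToySegment(bytes, 0, bytes.length, true, false);
    }

    /** Appending is allowed only for an owner with spare room after limit. */
    boolean canAppend(int byteCount) {
        return owner && limit + byteCount <= data.length;
    }
}

/** Toy pool that refuses shared segments, so a wrapped array is never reused. */
final class ToyPool {
    private final ArrayDeque<ToySegment> free = new ArrayDeque<>();

    /** Returns true if the segment was accepted for reuse. */
    boolean recycle(ToySegment segment) {
        if (segment.shared) {
            return false; // someone else still references the array
        }
        free.push(segment);
        return true;
    }
}
```

In this model, a wrapped array can be read in place, while any write has to go to a different segment and the caller's array is never handed out again by the pool.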