kitsonk / kv-toolbox

Utilities for working with Deno KV 🦕🗝️
https://kview.deno.dev/kv-toolbox
MIT License

Possibly halve the number of writes for small blobs #10

Open inverted-capital opened 6 months ago

inverted-capital commented 6 months ago

For small blobs, i.e. those under Deno KV's 64 KiB value size limit, using the blob library results in two writes: a meta write and a blob write.

I wonder whether these could be combined into a single write?

The format of the meta write is roughly:

  // Key and value of the meta entry written alongside the blob parts.
  const key = [...userSuppliedKey, "__kv_toolbox_meta__"];
  const value = { kind: "buffer", size: 92 };

However, if the meta value were always encoded as a Uint8Array, the first blob part could be appended to it (using a length prefix, for example). Small values would then need a single write, and every larger blob would need one fewer write.
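The length-encoding idea can be sketched roughly as follows. This is only an illustration of the proposal, not how kv-toolbox stores blobs: the `BlobMeta` shape, the 4-byte length header, and the `packMetaAndChunk` / `unpackMetaAndChunk` helpers are all hypothetical names made up for this example.

```typescript
// Hypothetical metadata shape, loosely mirroring the meta value quoted above.
interface BlobMeta {
  kind: string;
  size: number;
}

// Assumption: a 4-byte big-endian uint32 prefix holds the metadata length.
const HEADER_BYTES = 4;

/** Packs metadata and the first chunk into one value: [length][meta JSON][chunk]. */
function packMetaAndChunk(meta: BlobMeta, chunk: Uint8Array): Uint8Array {
  const metaBytes = new TextEncoder().encode(JSON.stringify(meta));
  const packed = new Uint8Array(HEADER_BYTES + metaBytes.length + chunk.length);
  // DataView writes big-endian unless told otherwise.
  new DataView(packed.buffer).setUint32(0, metaBytes.length);
  packed.set(metaBytes, HEADER_BYTES);
  packed.set(chunk, HEADER_BYTES + metaBytes.length);
  return packed;
}

/** Reverses packMetaAndChunk; the returned chunk is a view, not a copy. */
function unpackMetaAndChunk(
  packed: Uint8Array,
): { meta: BlobMeta; chunk: Uint8Array } {
  const view = new DataView(packed.buffer, packed.byteOffset, packed.byteLength);
  const metaEnd = HEADER_BYTES + view.getUint32(0);
  const metaJson = new TextDecoder().decode(packed.subarray(HEADER_BYTES, metaEnd));
  return { meta: JSON.parse(metaJson) as BlobMeta, chunk: packed.subarray(metaEnd) };
}
```

With this layout the header and metadata count against the 64 KiB value limit, so the first chunk would have to be slightly smaller than the others, and a reader has to parse the combined value before it knows what kind of blob it is looking at.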

kitsonk commented 6 months ago

That really gets complicated when supporting all of the existing features, as well as managing the logic for detecting and "viewing" the blob without decoding it. Embedding the metadata in the first "chunk" also makes things more complicated. Each "part" of the blob is a binary chunk of the blob itself, which means parts can be read (and streamed) without understanding their contents or even decoding them.

Grabbing just the meta value works really well when you want to represent the value without touching the values of the blob itself, which is exactly what kview does when sending it over the wire. It uses getMeta() to represent the raw binary data, Blob or File without actually having to read any of the blob itself. The writes are part of a single atomic commit internally, so the overhead is actually quite minimal.

I am afraid it isn't worth the added complexity and the downside of having to always fully decode the first chunk to just understand what type of blob it is.

inverted-capital commented 6 months ago

Ok I understand - thanks for the detailed explanation.

My use case is that when I know a file is small I use a non-blob method so I can get a faster read; otherwise my lookup is two round trips instead of one. This is hard to consume though, since when reading I have to know in advance which method was used to store the value. I understand what you mean about the writes being atomic and the overhead therefore being negligible, but reads do not appear to be as immune.

When my isolate is in Australia, the RTT to the database is 400ms each time, so it takes 800ms to get the meta and then get the blob. I was hoping to do that in one shot.

A suggestion: you could perhaps speed up reads by fetching both the meta and the first blob part at once using getMany(), then using list() to fetch parts two and above, if they exist.

https://github.com/kitsonk/kv-toolbox/blob/b18ec53b1a2bd06202be0d994c1c3213a78b7cff/blob.ts#L238-L240

Parallel reads are about the same speed as a single one when your database is half the world away...
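A rough sketch of the read path suggested above might look like this. It is not the library's implementation: `KvReader` is a hand-written structural subset modelled on the `getMany()` and `list()` methods of `Deno.Kv`, `readBlob` is a hypothetical helper, and the `"__kv_toolbox_blob__"` part key with parts numbered from 1 is an assumption about the storage layout. It also assumes the meta value carries a `size`, as in the example quoted earlier in the thread.

```typescript
type Key = (string | number)[];
type Entry = { key: Key; value: unknown };

// Structural subset modelled on Deno.Kv, so the sketch does not need the Deno global.
interface KvReader {
  getMany(keys: Key[]): Promise<Entry[]>;
  list(selector: { prefix: Key; start: Key }): AsyncIterable<Entry>;
}

const META_KEY = "__kv_toolbox_meta__"; // quoted in the issue
const PART_KEY = "__kv_toolbox_blob__"; // assumption: key segment for blob parts

/** Hypothetical helper: reads meta and part 1 together, then any remaining parts. */
async function readBlob(
  kv: KvReader,
  key: Key,
): Promise<{ meta: { kind: string; size?: number }; data: Uint8Array } | null> {
  // Round trip 1: meta and the first part in a single batched read.
  const [metaEntry, firstPart] = await kv.getMany([
    [...key, META_KEY],
    [...key, PART_KEY, 1],
  ]);
  if (!metaEntry || metaEntry.value == null) return null;
  if (!firstPart || !(firstPart.value instanceof Uint8Array)) return null;

  const meta = metaEntry.value as { kind: string; size?: number };
  const parts: Uint8Array[] = [firstPart.value];

  // Round trip 2 only when the first part cannot be the whole blob.
  if (meta.size === undefined || firstPart.value.byteLength < meta.size) {
    const selector = { prefix: [...key, PART_KEY], start: [...key, PART_KEY, 2] };
    for await (const entry of kv.list(selector)) {
      if (entry.value instanceof Uint8Array) parts.push(entry.value);
    }
  }

  // Concatenate the parts into one buffer.
  const data = new Uint8Array(parts.reduce((n, p) => n + p.byteLength, 0));
  let offset = 0;
  for (const part of parts) {
    data.set(part, offset);
    offset += part.byteLength;
  }
  return { meta, data };
}
```

Under those assumptions a blob that fits in one part costs a single round trip, and a larger one costs two, the same as today. The meta entry stays separate, so reading only the meta still works without touching any part of the blob.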

kitsonk commented 6 months ago

I am still thinking about what the right approach for this is. 🤔