Closed gsramdasi closed 2 years ago
This is currently planned for 2.1.0 I believe. We will be adding a cursor create flag for it.
This could also be a cursor read flag so we skip values only for specific reads
Also, this RFC should explore another option with cursors: a way for the application to pass in a buffer for keys and/or values, i.e. the application owns the buffers.
Wait, I didn't even realize this was Gaurav :).
There was a user request for a way to read just the keys and not values using cursors (only-keys cursor reads). A related feature we’ve considered adding is to allow the caller to pass in buffers for keys and/or values (user-buffer cursor reads). The user-buffer cursor reads should also have a way to read just keys and not values.
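So we will have 2 variants: the default read, where HSE owns the buffers, and the copyout read, where the caller owns the buffers.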
Similarity with point gets: The copyout variant is very similar to a hse_kvs_get() call where the caller passes in a buffer for the value. In the get call, if the value buffer is NULL, the get turns into a probe operation, i.e. it doesn't return the value, only whether the value was found and the value length. We could do something similar in the copyout cursor read when we just need to read the keys, i.e. set the value buffer to NULL. This does seem more natural than using a flag like in Option 1. However, we cannot do this for the default read variant, which would still need a flag. The two cursor read variants would then be inconsistent. Probably just something worth considering.
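To make the probe-style idea above concrete, here is a minimal sketch of a copyout read where a NULL value buffer means "report the value length but don't copy it". Everything in it (demo_kv, demo_cursor, demo_cursor_read_copy, the in-memory array standing in for a KVS) is hypothetical and for illustration only. It is not HSE code, and the real API may differ in parameter order and error handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only; none of this is HSE code. */
struct demo_kv {
    const char *key;
    const char *val;
};

struct demo_cursor {
    const struct demo_kv *items; /* in-memory array standing in for a KVS */
    size_t                count;
    size_t                pos;
};

/* Copyout read: the caller owns keybuf and valbuf.
 * If valbuf is NULL the value is not copied and only its length is
 * reported, mirroring the probe behaviour described for point gets.
 * Copies are truncated to the buffer sizes; *key_len and *val_len
 * always report the full lengths. */
static int
demo_cursor_read_copy(
    struct demo_cursor *cur,
    void *keybuf, size_t keybuf_sz, size_t *key_len,
    void *valbuf, size_t valbuf_sz, size_t *val_len,
    bool *eof)
{
    assert(cur && keybuf && key_len && val_len && eof);

    if (cur->pos >= cur->count) {
        *eof = true;
        return 0;
    }

    const struct demo_kv *kv = &cur->items[cur->pos++];
    size_t klen = strlen(kv->key);
    size_t vlen = strlen(kv->val);

    *eof = false;
    *key_len = klen;
    *val_len = vlen;

    memcpy(keybuf, kv->key, klen < keybuf_sz ? klen : keybuf_sz);
    if (valbuf)
        memcpy(valbuf, kv->val, vlen < valbuf_sz ? vlen : valbuf_sz);

    return 0;
}
```

A caller that only wants keys would pass NULL and 0 for the value buffer and its size, and could still inspect the reported value length.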
Since this is an API change, I wanted to get everyone’s opinion on how the APIs should be updated. I talked to @alexttx and here are two options we thought would make sense. Please let me know what you think and feel free to suggest a third option.
Option 1: A new cursor read API is added that accepts the caller's buffers. For both APIs the caller can pass in a flag so we read just the key.
#define HSE_CURSOR_READ_NOVALUE 1 // Flag passed to both variants
// Unchanged
hse_err_t
hse_kvs_cursor_read(
    struct hse_kvs_cursor *cursor,
    unsigned int flags,
    const void ** key,
    size_t * key_len,
    const void ** val,
    size_t * val_len,
    bool * eof);
// The copyout variant
hse_err_t
hse_kvs_cursor_read_copy(
    struct hse_kvs_cursor *cursor,
    unsigned int flags,
    const void * key,
    size_t * key_len,
    size_t key_buf_size,
    const void * val,
    size_t * val_len,
    size_t val_buf_size,
    bool * eof);
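For comparison, a rough sketch of how the flag-based approach might behave on the default variant, where the library owns the buffers and hands back pointers. DEMO_CURSOR_READ_NOVALUE, flag_kv, flag_cursor and flag_cursor_read are made-up names modelled on the proposal. Setting val to NULL and val_len to 0 when the flag is present is my assumption; the proposal doesn't say what the outputs hold in that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, modelled on the proposed HSE_CURSOR_READ_NOVALUE. */
#define DEMO_CURSOR_READ_NOVALUE 1u

/* Hypothetical in-memory stand-ins; not HSE types. */
struct flag_kv {
    const char *key;
    const char *val;
};

struct flag_cursor {
    const struct flag_kv *items;
    size_t                count;
    size_t                pos;
};

/* Default-style read: outputs point into storage owned by the "library".
 * With DEMO_CURSOR_READ_NOVALUE set, the value is skipped and the value
 * outputs are cleared (an assumption made for this sketch). */
static int
flag_cursor_read(
    struct flag_cursor *cur,
    unsigned int        flags,
    const void        **key,
    size_t             *key_len,
    const void        **val,
    size_t             *val_len,
    bool               *eof)
{
    assert(cur && key && key_len && val && val_len && eof);

    if (cur->pos >= cur->count) {
        *eof = true;
        return 0;
    }

    const struct flag_kv *kv = &cur->items[cur->pos++];

    *eof = false;
    *key = kv->key;
    *key_len = strlen(kv->key);

    if (flags & DEMO_CURSOR_READ_NOVALUE) {
        *val = NULL;
        *val_len = 0;
    } else {
        *val = kv->val;
        *val_len = strlen(kv->val);
    }

    return 0;
}
```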
Option 2: A separate entry point for reading just the keys for both variants instead of using a flag.
// Unchanged
hse_err_t
hse_kvs_cursor_read(
    struct hse_kvs_cursor *cursor,
    unsigned int flags,
    const void ** key,
    size_t * key_len,
    const void ** val,
    size_t * val_len,
    bool * eof);
// Read only keys. HSE owns the buffers.
hse_err_t
hse_kvs_cursor_readkey(
    struct hse_kvs_cursor *cursor,
    unsigned int flags,
    const void ** key,
    size_t * key_len,
    bool * eof);
// Copyout variant. Read key and value.
hse_err_t
hse_kvs_cursor_read_copy(
    struct hse_kvs_cursor *cursor,
    unsigned int flags,
    const void * key,
    size_t * key_len,
    size_t key_buf_size,
    const void * val,
    size_t * val_len,
    size_t val_buf_size,
    bool * eof);
// Copyout variant. Read just the key.
hse_err_t
hse_kvs_cursor_readkey_copy(
    struct hse_kvs_cursor *cursor,
    unsigned int flags,
    const void * key,
    size_t * key_len,
    size_t key_buf_size,
    bool * eof);
Personally, I like option one much better. I think I like HSE_CURSOR_READ_KEY_ONLY instead of HSE_CURSOR_READ_NOVALUE.
Just spitballing an idea. Is there a way to make the read_copy() API the go-to API for cursor reads, where we could mark the original read() API as deprecated?
Is there potentially a better name than read_copy(), maybe one people know from other APIs they have used? Actual API content looks fine to me, although I might put key_buf_size before key_len just to tie it to the buffer better. Same for val_buf_size.
Not sure how this would look at hse-python level. I have an idea, but not important for now.
Is there any reason the caller can't just pass in NULL for val in the existing API to effect a key read?
The val in the existing API is a double pointer and it's set by hse to point to the region of memory that holds the value - i.e. it's an output. If we interpret val=NULL as read-only-key, the caller will have to explicitly set it to some dummy non-NULL value when it does want to read the value.
I don't understand. In the latter case it's never NULL and must be the address of a pointer in which hse will stash the ptr to the internal buffer. What am I missing? I'm not saying key on *val being NULL, I'm saying pass in NULL for val. In case I wasn't very clear...
Yes, of course. That makes sense. So for either API, if val is set to NULL, we'd return just the key. No need for the flag.
// unchanged
hse_err_t
hse_kvs_cursor_read(
    struct hse_kvs_cursor *cursor,
    unsigned int flags,
    const void ** key,
    size_t * key_len,
    const void ** val,
    size_t * val_len,
    bool * eof);
// The copyout variant
hse_err_t
hse_kvs_cursor_read_copy(
    struct hse_kvs_cursor *cursor,
    unsigned int flags,
    const void * key,
    size_t * key_len,
    size_t key_buf_size,
    const void * val,
    size_t * val_len,
    size_t val_buf_size,
    bool * eof);
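To spell out the distinction from the exchange above (passing NULL for val itself, as opposed to *val being NULL), here is a small sketch of the default variant treating a NULL val argument as a key-only read. The nv_* names and the in-memory array are hypothetical, not HSE code, and ignoring val_len when val is NULL is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-ins; not HSE types. */
struct nv_kv {
    const char *key;
    const char *val;
};

struct nv_cursor {
    const struct nv_kv *items;
    size_t              count;
    size_t              pos;
};

/* val is an output: the address of a pointer the library fills in.
 * Passing val == NULL means "I don't want the value at all", so no
 * flag is needed. A caller that does want the value passes the address
 * of its own pointer, whatever that pointer currently holds. */
static int
nv_cursor_read(
    struct nv_cursor *cur,
    const void      **key,
    size_t           *key_len,
    const void      **val,
    size_t           *val_len,
    bool             *eof)
{
    assert(cur && key && key_len && eof);

    if (cur->pos >= cur->count) {
        *eof = true;
        return 0;
    }

    const struct nv_kv *kv = &cur->items[cur->pos++];

    *eof = false;
    *key = kv->key;
    *key_len = strlen(kv->key);

    if (val) {
        assert(val_len);
        *val = kv->val;
        *val_len = strlen(kv->val);
    }

    return 0;
}
```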
I think the read and read_copy without flags makes the most sense.
I don't care for the "read_copy" function. I'm thinking something along the lines of hse_kvs_cursor_readv(), but give me some time to flesh it out. Something like:
hse_err_t hse_kvs_cursor_readv(struct hse_kvs_cursor *cursor, uint flags, struct iovec *iov, int iovcnt, bool *eof);
where iov[0] describes a buffer into which to read the key, and iov[1] describes a buffer into which to read the value. If iovcnt is 1 then it only reads the key. Either way, only the length specified by each iovec is filled by the key and/or value.
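A quick sketch of what the readv idea could look like, using struct iovec from POSIX <sys/uio.h>. The vec_* names and the in-memory array are hypothetical. Writing the number of bytes copied back into iov_len is my own assumption for this sketch, since the suggestion above doesn't say how lengths would be reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h> /* struct iovec (POSIX) */

/* Hypothetical in-memory stand-ins; not HSE types. */
struct vec_kv {
    const char *key;
    const char *val;
};

struct vec_cursor {
    const struct vec_kv *items;
    size_t               count;
    size_t               pos;
};

/* Copy at most dst->iov_len bytes of src into dst->iov_base, then
 * record the number of bytes copied in dst->iov_len (an assumption
 * made for this sketch). */
static void
vec_copy(struct iovec *dst, const char *src)
{
    size_t len = strlen(src);
    size_t n = len < dst->iov_len ? len : dst->iov_len;

    memcpy(dst->iov_base, src, n);
    dst->iov_len = n;
}

/* iov[0] receives the key; iov[1], if iovcnt > 1, receives the value.
 * With iovcnt == 1 only the key is read and iov[1] is never touched. */
static int
vec_cursor_readv(struct vec_cursor *cur, struct iovec *iov, int iovcnt, bool *eof)
{
    assert(cur && iov && iovcnt >= 1 && eof);

    if (cur->pos >= cur->count) {
        *eof = true;
        return 0;
    }

    const struct vec_kv *kv = &cur->items[cur->pos++];

    *eof = false;
    vec_copy(&iov[0], kv->key);
    if (iovcnt > 1)
        vec_copy(&iov[1], kv->val);

    return 0;
}
```

One consequence of writing lengths back into the iovec is that the caller has to reset iov_len before every read, which is part of why this shape may feel heavier than separate length parameters.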
What is struct iovec? I see. I like the idea, Greg, but it seems weird to have a Linux-specific public API. Is there an equivalent for other systems?
Do callers really need iovec support? Let's not use iovecs just because you don't like the name read_copy. None of our APIs use iovecs, so this would stand out like a sore thumb.
IMO KISS favors read and read_copy as Gaurav spelled out.
Can we get rid of key_buf_size and val_buf_size and make key_len and val_len value-result parameters? Also, I think we need to lose the const from the read_copy() calls. I really hate read_copy... readx(), read2(), readbuf()? The read_copy() form should also support valbuf=NULL; that way I can easily read all the keys directly out of hse and into my own buffer, thereby avoiding a buffer copy.
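For anyone unfamiliar with the value-result pattern mentioned above, here is a tiny sketch: the length parameter carries the buffer capacity on the way in and the data length on the way out. copy_value_result is a hypothetical helper, not an HSE function. Reporting the full (untruncated) length on return is an assumption; the comment doesn't specify it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an HSE function.
 * On entry *len holds the capacity of buf. On return *len holds the
 * full length of the source data. The return value is the number of
 * bytes actually copied, so the caller can detect truncation by
 * comparing it against *len. */
static size_t
copy_value_result(void *buf, size_t *len, const void *src, size_t src_len)
{
    assert(buf && len && src);

    size_t cap = *len;
    size_t n = src_len < cap ? src_len : cap;

    memcpy(buf, src, n);
    *len = src_len;

    return n;
}
```

The trade-off is one fewer parameter per buffer, at the cost of the caller having to reset the length to the buffer capacity before each call.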
Is there any inspiration from RocksDB which the API could pull from?
"_read_copy" makes sense to me since it does an actual copy vs the "_read" version which does not.
Thanks everyone. I've posted a PR for this change. See https://github.com/hse-project/hse/pull/105
@gsramdasi, do you plan to turn this into an RFC? That's probably overkill, and you've already merged the PR.
Currently an application can use the cursor API to read through keys in a range, but the cursor also reads values. If the application only wants to read keys, it would be useful to have a way to iterate through just the keys, which would be more performant and may also avoid polluting the page cache.