redis / redis-py

Redis Python client
MIT License
12.53k stars 2.5k forks source link

Feature request - conditional decode of response #2164

Open DvirDukhan opened 2 years ago

DvirDukhan commented 2 years ago

Currently, Redis-py allows decoding of the entire response, or not decoding the entire response. In the situation where a hash/stream/document entries are mixed with human-readable and non human-readable data (blobs), the user must avoid using decode_responses, and decode the human-readable data manually. The feature request is to allow specifying field names which are avoided from being decoded when decode_responses is True

github-actions[bot] commented 1 year ago

This issue is marked stale. It will be closed in 30 days if it is not updated.

gerzse commented 3 months ago

@DvirDukhan decode_responses is now set when initializing a client. Deciding which fields to decode feels more context dependent, it might depend on key, maybe in one key the same field is to be decoded and in another key not. Just throwing an idea: would a callback help? Not sure what arguments can the callback receive, I'm assuming we can always know at least the key being accessed. The callback would say if and what to decode from the response.

bsbodden commented 3 months ago

agree with @gerzse to selectively decode fields bound to a search index, knowledge of the search schema will be required which would be very intrusive/tightly-bound.

DvirDukhan commented 3 months ago

I think this request is valid to 3 types of response that returns a map type of response, and the data could be binary:

  1. Reading a hash e.g. HGET, HMGET etc.. (JSON is irrelevant as it is cannot hold binary data)
  2. Reading a stream: XREAD
  3. Reading a search response: FT.SEARCH, FT.AGGREGATE

IMO, having a callback will add complexity as you offload a lot of responsibility to the end user, unless I'm missing something. I think that we can add additional parameter to the command where we can specify the binary fields. I assume the caller is aware on what they will get, so @bsbodden I don't think it is intrusive since it is not a single/nameless value. Thoughts @gerzse @bsbodden ?

gerzse commented 3 months ago

I think I got it. The new parameter would act like an override of the decode_responses that was set up when initializing the client, right? Maybe "override" is not the best word. Let's say it would qualify the decoding with details on what to decode and what not.

It would be more involved in the code, but that's fine. I think right now the decode_responses is stored at objects creation, so it would need to be moved out as parameters to different methods. But that should be perfectly doable.

Since we are discussing decoding, should we factor in charsets, in case someone wants to decode in something different than UTF-8 for example?

bsbodden commented 3 months ago

@DvirDukhan I see your point now. Yeah, that makes sense. The callbacks will be too cumbersome IMO, I like the "fields to not decode param" approach

mortensi commented 3 months ago

Something like the following would work? (decode_field defaults to false, then it defaults to binary for embeddings)

    q = Query("(*)=>[KNN 3 @content_embedding $B AS score]")\
        .return_field("content_embedding", decode_field=False)\
        .return_field("title", decode_field=True, encoding='utf8')\
        .sort_by("score", asc=True)\
        .dialect(2)
bsbodden commented 3 months ago

Something like the following would work? (decode_field defaults to false, then it defaults to binary for embeddings)

    q = Query("(*)=>[KNN 3 @content_embedding $B AS score]")\
        .return_field("content_embedding", decode_field=False)\
        .return_field("title", decode_field=True, encoding='utf8')\
        .sort_by("score", asc=True)\
        .dialect(2)

Usage-wise would defaulting to true make more sense and make most query definitions simpler, e.g. only turn off decoding in special binary data only cases like vectors?

mortensi commented 3 months ago

Makes sense. Apparently, everything is decoded regardless of decode_responses set at the connection level.

DvirDukhan commented 3 months ago

@gerzse You are right on the first part.

For the second part I think this is a different topic, since it can be applicable for every command :) But IMO, in order to be future proof you can pass an object decoding_context that can hold whatever you want

r = Redis()
r.hget(...., decoding_context = {binary_fields = ['a', 'b', 'c'], decode_charset = ....})
uglide commented 1 month ago

I think we're mixing two separate use cases here. Search results decoding at the field level is one issue, and giving control over decoding per command is another.

Here is a fix for Vector fields https://github.com/redis/redis-py/pull/3309