Open DvirDukhan opened 2 years ago
This issue is marked stale. It will be closed in 30 days if it is not updated.
@DvirDukhan decode_responses
is now set when initializing a client. Deciding which fields to decode feels more context dependent, it might depend on key, maybe in one key the same field is to be decoded and in another key not. Just throwing an idea: would a callback help? Not sure what arguments can the callback receive, I'm assuming we can always know at least the key being accessed. The callback would say if and what to decode from the response.
agree with @gerzse to selectively decode fields bound to a search index, knowledge of the search schema will be required which would be very intrusive/tightly-bound.
I think this request is valid to 3 types of response that returns a map type of response, and the data could be binary:
IMO, having a callback will add complexity as you offload a lot of responsibility to the end user, unless I'm missing something. I think that we can add additional parameter to the command where we can specify the binary fields. I assume the caller is aware on what they will get, so @bsbodden I don't think it is intrusive since it is not a single/nameless value. Thoughts @gerzse @bsbodden ?
I think I got it. The new parameter would act like an override of the decode_responses
that was set up when initializing the client, right? Maybe "override" is not the best word. Let's say it would qualify the decoding with details on what to decode and what not.
It would be more involved in the code, but that's fine. I think right now the decode_responses
is stored at objects creation, so it would need to be moved out as parameters to different methods. But that should be perfectly doable.
Since we are discussing decoding, should we factor in charsets, in case someone wants to decode in something different than UTF-8 for example?
@DvirDukhan I see your point now. Yeah, that makes sense. The callbacks will be too cumbersome IMO, I like the "fields to not decode param" approach
Something like the following would work? (decode_field
defaults to false, then it defaults to binary for embeddings)
q = Query("(*)=>[KNN 3 @content_embedding $B AS score]")\
.return_field("content_embedding", decode_field=False)\
.return_field("title", decode_field=True, encoding='utf8')\
.sort_by("score", asc=True)\
.dialect(2)
Something like the following would work? (
decode_field
defaults to false, then it defaults to binary for embeddings)q = Query("(*)=>[KNN 3 @content_embedding $B AS score]")\ .return_field("content_embedding", decode_field=False)\ .return_field("title", decode_field=True, encoding='utf8')\ .sort_by("score", asc=True)\ .dialect(2)
Usage-wise would defaulting to true
make more sense and make most query definitions simpler, e.g. only turn off decoding in special binary data only cases like vectors?
Makes sense. Apparently, everything is decoded regardless of decode_responses set at the connection level.
@gerzse You are right on the first part.
For the second part I think this is a different topic, since it can be applicable for every command :) But IMO, in order to be future proof you can pass an object decoding_context
that can hold whatever you want
r = Redis()
r.hget(...., decoding_context = {binary_fields = ['a', 'b', 'c'], decode_charset = ....})
I think we're mixing two separate use cases here. Search results decoding at the field level is one issue, and giving control over decoding per command is another.
Here is a fix for Vector fields https://github.com/redis/redis-py/pull/3309
Currently, Redis-py allows decoding of the entire response, or not decoding the entire response. In the situation where a hash/stream/document entries are mixed with human-readable and non human-readable data (blobs), the user must avoid using
decode_responses
, and decode the human-readable data manually. The feature request is to allow specifying field names which are avoided from being decoded whendecode_responses
isTrue