redis / hiredis-py

Python wrapper for hiredis
MIT License
497 stars 104 forks source link

PUSH type not distinguishable from ARRAY type in RESP3 #128

Open stereobutter opened 2 years ago

stereobutter commented 2 years ago

RESP3 introduced the PUSH type for transmitting out of band data to a client (such as pubsub messages, client tracking invalidation events etc.). Except for being its own type (identified by the > symbol) PUSH is the same as ARRAY.

Currently using hiredis-py parsing a PUSH message returns a list, same as parsing an ARRAY.

example = (
    ">4\r\n"  # push array with 4 elements
    "+pubsub\r\n" 
    "+message\r\n"
    "+somechannel\r\n"
    "+this is the message\r\n"
)

reader.feed(example)
data = reader.gets()
assert type(data) is list

This makes it impossible for a client to differentiate PUSH and ARRAY messages. As PUSH messages may be interleaved with regular replies (such as when using CLIENT TRACKING on the same connection) it is important for a client to be able and differentiate them.

It appears upstream hiredis already supports the PUSH type

For hiredis-py PUSH messages clould use a subclass of list i.e. something like type('push', (list,), {}).

chayim commented 1 year ago

@dvora-h interesting, for when we work on RESP3

chayim commented 1 year ago

@dvora-h What do your redis-py tests show here? Does the PubSub push test in the 5.0 work in this scenario?

chayim commented 1 year ago

@stereobutter Can you help me understand why you believe that list is more appropriate here? Given what I read in the specification, I'd have yielded a string, to handle:

Out of band data. The format is like the Array type, but the client should just check the first string element, stating the type of the out of band data, a call a callback if there is one registered for this specific type of push information. Push types are not related to replies, since they are information that the server may push at any time in the connection, so the client should keep reading if it is reading the reply of a command.

I read that as ideally it should be an array, but YMMV since you have to hit up a callback anyway. Again a List seems right.

stereobutter commented 1 year ago

@chayim I am not sure I understand your question, so please excuse if my response doesn't make sense.

As I understand it the PUSH type in RESP3 was introduced so that clients may use pub/sub on the same connection as regular commands. In RESP2 pubsub messages are array messages and thus a client could not differentiate it from a regular array reply it may be expecting (e.g. after issuing a command like HMGETALL).

In RESP3 the array and push types only differ by the first character (> for push messages and * for arrays) and as such it would make sense to use a similar (but different) type on the python side. I don't see how str would work here as a push type can have an arbitrary number of items? You are correct of course that a client would use the first item of a push message for figuring out what type of push message it is and what callback to trigger.

The spec says

RESP3 push data is represented from the point of view of the protocol exactly like the Array type. However the first byte is > instead of *, and the first element of the array is always a String item, representing the kind of push data the server is sending to the client. All the other fields in the push array are type dependent, which means that depending on the type string as first argument, the remaining items will be interpreted following different conventions.