dpipemazo opened this issue 6 years ago
We should only use one queue and XREAD from one stream (queue). If a high-priority message is needed, a high-priority stream (queue) is then set up and XREAD more often. See https://youtu.be/F5Ri_HhziI0
Hello @dpipemazo, I think that your issue, after all, is: what is the scheduling when returning data from multiple streams? That's a very good question indeed. I'll check the implementation and report the current behavior, and then we can decide what we want:
For instance, when there are multiple streams, it could make sense to report data from the one with the oldest entries that can be reported, or to make sure to report an even number of items per stream, or something like that. I think that in the current implementation the order in which the streams are listed on the command line can make a big difference, and this is indeed a bit odd. I'll check the implementation in order to be sure to report correct information. Thanks.
@yoonghm your solution is sound, but perhaps we can improve what Redis is doing. I'm not 100% sure this is viable, because we certainly want to reply ASAP once a single stream has more data, so we need to understand whether we can keep this behavior and still improve fairness. I'll report back.
@dpipemazo I think there is a problem indeed, and what you observed is exactly the behavior originating from such a problem. This is what happens (however, note that I'm not able to explain one specific detail):
handleClientsBlockedOnKeys() will iterate the list of ready keys; however, the list will very likely have A before B. So the client is unblocked on "A", and that's it: the ready key "B" gets discarded, and will only be served once it is first in the ready keys list. So... normally, under non-simulated workloads, A could be before B or the reverse, and the function should be fairer. In a loop writing to A and B with pipelining, A will always be before B. In your case, in theory, we should have just A and then just B at every cycle, so this is odd!
However, for the problem I identified there is a very simple solution we could apply here:
In the function signalKeyAsReady() we have:

    listAddNodeTail(server.ready_keys,rl);
What we can do instead is add the item at the head or the tail at random. This is probably not enough when there are multiple keys: with 10 keys, the first key would still never end up first in the list, so we need a slightly more convoluted operation on the list to make sure it ends up shuffled. It should not be hard: probably, after adding the item at a random end, we just need to rotate the list to one side or the other at random, but I have to check the theoretical properties of such an algorithm.
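The random-insert-plus-rotation idea can be sketched outside of Redis. Below, a deque is a stand-in for server.ready_keys; the function name mirrors signalKeyAsReady() but this is a hypothetical fairness heuristic, not Redis code, and whether it yields a uniform shuffle would still need checking, as noted above.

```python
import random
from collections import deque

def signal_key_as_ready(ready_keys: deque, key: str) -> None:
    """Hypothetical fairer variant of signalKeyAsReady():
    add the new ready key at a random end of the list, then
    rotate the whole list by a random offset so that keys
    added earlier also get a chance to end up at the head."""
    if random.random() < 0.5:
        ready_keys.appendleft(key)
    else:
        ready_keys.append(key)
    if len(ready_keys) > 1:
        # random rotation in either direction
        n = len(ready_keys) - 1
        ready_keys.rotate(random.randint(-n, n))

# Quick check: with a fixed arrival order (the pipelined-writer case),
# every key should still reach the head of the list occasionally.
random.seed(42)
head_counts = {k: 0 for k in "ABC"}
for _ in range(3000):
    ready = deque()
    for k in "ABC":  # writer always signals A, then B, then C
        signal_key_as_ready(ready, k)
    head_counts[ready[0]] += 1
print(head_counts)
```

With plain listAddNodeTail(), "A" would be the head in all 3000 trials; here each key gets served first some of the time.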
The big question is: why are you observing that behavior even without pipelining? The only way I can answer that is to reproduce the same behavior locally, I guess.
IMPORTANT DETAIL: all this also applies to blocking lists AFAIK, a feature we have had for years, yet apparently nobody noticed, because, I guess, under normal workloads keys get added in random order.
I'm pinging @oranagra, @yossigo, @itamarhaber and @soloestoy here in case they can report other information or experiences from customers of managed Redis instances that ran into such problems.
Just as a general FYI: in Go, when using the select statement over multiple channels, if more than one channel is ready to be consumed, the statement chooses at random which channel to read from. This is how Go achieves some degree of fairness in a situation that seems related to the present issue. Relevant documentation: https://golang.org/ref/spec#Select_statements
@kristoff-it yes, the way to fix it is simple; the problem of this issue is that my hypothesis of what happens, which is easily fixed with randomization, does not explain very well what the OP stated, since the OP went into great detail to explain in the first instance that no pipelining is used. I'm trying to reproduce.
Btw I can reproduce the issue with @dpipemazo's program (after a small change, since he used a very early version of the Stream code, and the test program is now incompatible with the format generated by Redis). The favored stream is consistently the first one that gets the XADD.
Ok, I finally figured everything out. It was quite convoluted and non-obvious, so it took some time, and the OP @dpipemazo was right that this is specific to using $.
This is what happens: in a tight loop like the one in the benchmark, which is also written in C so has no obvious delays, and which does not use any pipelining, the two threads sometimes start to synchronize like this:
READING CLIENT: XREAD ... (blocks since there is no data to consume)
WRITING CLIENT: XADD stream1 ...
READING CLIENT: (unblocks since now there is data in stream1)
--- next event loop ---
READING CLIENT: (receives data that was sent in the buffers in the previous event loop)
WRITING CLIENT: XADD stream2 ... (but nobody is blocked now, so XREAD will not receive that)
--- next event loop (again as the first) ---
READING CLIENT: XREAD ... (blocks since there is no data to consume)
WRITING CLIENT: XADD stream1 ...
READING CLIENT: (unblocks since now there is data in stream1)
And so forth. Normally this kind of synchronization does not happen with, for instance, blocking lists, because once one of the lists has some data, it will be served synchronously, without blocking for the other list. So if list2 accumulated data that went unnoticed, at the next call of BRPOP or whatever, it will be served immediately out of list2.
But with $ this does not happen, because the first time we use $ there is never data to return, so when the two clients synchronize as shown above, the commands look like this:
XREAD BLOCK 10000 STREAMS STREAM1 stream2 1567704834178-3 $
XREAD BLOCK 10000 STREAMS STREAM1 stream2 1567704834178-4 $
XREAD BLOCK 10000 STREAMS STREAM1 stream2 1567704834179-0 $
XREAD BLOCK 10000 STREAMS STREAM1 stream2 1567704834179-1 $
XREAD BLOCK 10000 STREAMS STREAM1 stream2 1567704834179-2 $
... and so forth ...
At some point the two clients go out of sync, the second stream gets an ID as well instead of $, and finally things work as expected from then on.
I'm not sure what I should do about that: it's kind of part of the semantics, and non-trivial to fix AFAIK. It is unlikely that this happens in real-world conditions; however, it's probably a good idea to at least advise in the documentation that starting to wait for multiple streams with $ may create an uneven condition, and that it may be better to fetch the last ID of each stream and start with the IDs directly.
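The "fetch the last IDs and start with them" workaround can be expressed as a small helper. The function below only transforms an XINFO STREAM-style reply into a cursor map; the commented-out redis-py calls are an assumption about the client API and need a live server, so they are not executed here.

```python
def start_ids_from_xinfo(info_by_stream: dict) -> dict:
    """Build the initial XREAD cursor for each stream from the
    'last-generated-id' field of its XINFO STREAM reply, so the
    first XREAD uses concrete IDs instead of repeated $.
    Streams with no metadata fall back to 0-0 (read everything),
    an arbitrary choice for this sketch."""
    return {
        name: info.get('last-generated-id', '0-0')
        for name, info in info_by_stream.items()
    }

# Hypothetical usage with redis-py (assumed API, requires a server):
#   info = {name: r.xinfo_stream(name) for name in ('stream1', 'stream2')}
#   r.xread(start_ids_from_xinfo(info), block=10000)

cursors = start_ids_from_xinfo({
    'stream1': {'last-generated-id': '1567704834178-3'},
    'stream2': {},  # e.g. metadata unavailable
})
print(cursors)
```

Starting from concrete IDs this way avoids the repeated-$ synchronization entirely, at the cost of one extra round trip per stream before the first XREAD.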
However I'll try to find a solution: tomorrow I'll work a bit more on this issue. Thanks. Unfortunately I guess at this point @dpipemazo is no longer interested, since the issue is one year old. Fortunately @yoonghm commented on the issue, bringing it up again.
P.S. the Right Thing to do would be to change the internals in order to serve the same blocked client, blocked for multiple keys, with all the keys it is blocking for. Not sure if I'll be brave enough to do that in order to fix this, but it is important to note what the best solution would be. However, in order to do that, the code should be super clean and wonderful, because we are already at the limits of complexity with the current code.
Hi, I just wanted to comment on this. I think it's probably a feature, not an issue. If I understood correctly how we should use XREAD with $, after the first call, it's better to send the same returned ID for both streams as it's documented: "It is very important to understand that you should use the $ ID only for the first call to XREAD. Later the ID should be the one of the last reported item in the stream, otherwise you could miss all the entries that are added in between." Of course, this is only true if XADD key * is used.
@tuxArg yes, but that's not the point of this issue. The problem is whether, when you wait for multiple streams, you are able to get (following the pattern you just cited) data from both streams in a fair way.
If I understand correctly, asking for `$' more than once can make you miss some entries when XREADing more than one stream. Even if it's not the focus of this issue, I think it should be addressed in the docs at the very least; I don't think this is expected or desired behavior.
If the problem lies partially with the "ambiguous" meaning of asking for '$' more than once, how hard would it be to let clients avoid that ambiguity by explicitly giving them, in the reply, what the next set of IDs should be?
> XREAD BLOCK 10000 STREAMS key1 key2 $ $
1) 1) 1567704834179-1
2) 123-123
2) <item from key1>
Once this is in place, then yes, it's a matter of making the delivery more fair, but I think it would be weird to try making it more fair without first giving a way to clients to be precise about what they want.
@antirez are you thinking of making it perfectly fair (or close to)? I think an argument could be made in favor of just ensuring that no stream gets completely choked, especially if that's simpler to implement.
@kristoff-it yes, even something that is able to mitigate it is enough; after all, with real-world traffic this is very unlikely to happen, and after the very first calls things usually desynchronize and everything works as expected. About changing the protocol now: not a good idea, we need to be backward compatible, and anyway, if you are willing to modify clients, it is simpler to just fetch the IDs with XINFO and directly start with the IDs instead of $, I guess. However, one thing that I'm exploring is to modify the code (I already refactored it in a092f20d) in order to serve the blocked streams in parallel. This is an improvement, but I'm not sure it fixes every problem.
There is another thing to note: there are use cases in which you really do want to always send $, since you want to sample new items for monitoring processes and so forth. The problem with the pattern above is that, actually, repeating $ is not intentional at all... but a side-product of the interaction of a few things.
Btw about serving multiple keys to the same blocked client in parallel: as long as both (or N) streams received data, all will be populated with their IDs, and things will continue to work as expected. However, if the XADDs are sent in an unfair way, like writes going to the first key only and the app later starting to write to all the other keys, we get trapped in the same problem again. So yesterday I was wrong: this is an improvement, but not the ultimate solution for this problem.
> The problem with the pattern above is that, actually, repeating $ is not intentional at all... but a side-product of the interaction of a few things.
Agreed, and I think that this "unintentionality", while generally not a problem (as you said, precise conditions are required for this problem to surface), is still going to be the most common scenario, where the developer just wants to tail a bunch of streams and does it in the most straightforward way:
    # redis-py example
    streams = {
        'key1': '$',
        'key2': '$',
        'key3': '$',
    }
    while True:
        for stream_id, items in redis.xread(streams, block=0):
            # update the cursor
            streams[stream_id] = items[-1][0]
            # actual processing
            for item in items:
                process(item)
Being able to "hot-plug" into a stream using '$' is great, but this pattern also causes problems when, unintentionally, a '$' survives a delivery loop.
> About changing the protocol now, not a good idea, we need to be backward compatible, and anyway if you want to modify clients, it is simpler to just fetch the IDs with XINFO and directly start with the IDs instead of $ I guess.
Sure changing the protocol is a big deal, but my argument is about preserving as much as possible the nice pattern that '$' enables, while limiting the problem at the core of this issue. Doing XINFO first would work, but people are lazy, expect the most straightforward approach to work, and Redis builds a lot of expectation in that regard :)
If XREAD were to also return IDs for all streams (regardless of which results get returned), then the pattern would look almost the same:
    # redis-py example
    streams = {
        'key1': '$',
        'key2': '$',
        'key3': '$',
    }
    while True:
        next_ids, results = redis.xread(streams, block=0)
        # update the cursor (python3.7+ dicts are ordered, older pythons have OrderedDict)
        streams = {k: i for k, i in zip(streams.keys(), next_ids)}
        # actual processing
        for stream_id, items in results:
            for item in items:
                process(item)
Anyway, I don't want to sound too single-minded, but I do have the impression that the expressiveness is the bigger problem, also because Redis can't ultimately distinguish between "normal" hot-plug tailing and sampling.
Complete delivery fairness would probably make this problem small enough to become almost theoretical, but I feel it's a 90% solution that takes effort and doesn't address the root problem. Maybe an addition to XREAD (e.g. a WITHCURSORS option) would also be good, or a small breaking change for RESP3, since a protocol change has to happen anyway.
To conclude: when I started writing Go, I eventually started wondering what would happen if a channel were to choke all others, and I was very happy to learn that forced uniform selection prevented such cases completely. After reading your breakdown of the problem I now can't unsee how repeated '$' are never what I want when doing simple "hot-plug" tailing, and it takes more mental effort to realize that it's mostly fine, in practice.
I hope I made my argument clear; I'm looking forward to seeing how this gets resolved.
@kristoff-it what you say makes sense to me in general. Yet I want to find, if possible, some solution that does not involve changing the protocol. Consider that, to observe this problem, you likely have to write a synthetic benchmark! This is a big thing to keep in mind. In the real world it's very hard to get such synchronization between the writer and the reader; I bet it will be very hard to reproduce this even just by avoiding localhost as the network interface.
However, there are a few things that we could do, I think; I'm studying a few funny solutions. Two of the most interesting I found so far are the following:

If a stream for which we ask to read with $ received some data recently (a few milliseconds ago) and we are about to block, delay the command execution a bit. This is simple to do: it's just a matter of flagging the client in a given way so that the ready keys will not affect such a client for a few iterations of the event loop.

Maybe there are better options, but for now the above look potentially ok. Solution "3" has the interesting advantage of being totally transparent for the user.
Uhmm interesting ideas.
I think the first one is less preferable than the other two because it introduces the possibility of processing the same entry multiple times. Also in this case I'm referring to simple, not rigorous, usage of XREAD. If the user has a script that starts from '$' and does some processing on 2 streams, they could get into a situation where:
At that point, if there is no activity on STREAM-2, when a new item gets added to STREAM-1 the user will see the last entry in STREAM-2 processed twice, which is unexpected given the general meaning of '$'.
Option [2] seems less clever in many ways, but it shouldn't add any gotchas (assuming we return the last_id in the stream + 1, or 0 for an empty stream). As you mentioned, [2] is close to what I already had in mind, but I also like it for its explicitness.
[3] seems also to solve the problem completely, and I can't think of a clear use case that would cause problems. It does introduce an arbitrary threshold though.
Yep, however solution "3" has a fundamental problem that I'll document better on Monday :-(
Hey @antirez, this issue reminds me of the AE_BARRIER issue in a way, as it too involves side effects of low-level stuff on user-level guarantees.
Please correct me if I'm wrong, but I think the real issue here is the protocol so there might not be a solution that is transparent to the user.
The documentation specifies that XREAD $ should block and only return events that have been written after the XREAD command was issued. Ordering of commands received from different connections in the same event loop is totally arbitrary, so technically the XREAD guarantee is maintained even with the current behavior.
In theory it may be possible to add more fairness by trying to order commands inside the same event loop, but that's just one particular case, and the same unfair behavior can still occur if events fire in a different combination.
In my opinion the correct fix is for XREAD to guarantee that an ID is always returned for every $ stream, as it has several benefits:
Because the current implementation is not technically broken, I guess it's also possible to add this as another command option to maintain backward compatibility.
I agree with @yossigo: command order is arbitrary, so which key unblocks XREAD is random, and there is nothing to change, I think.
BTW, I like the refactor in commit a092f20d87f6e435c5b68b8e217ae0d54dd12715; it's clearer to read.
Thanks for the additional background to both @yossigo and @soloestoy. I also agree that semantically everything looks ok; however, to see the problem here, and why I think we need a new option that changes the behavior, consider the following pattern. We have two clients, C1 and C2. They never send commands at the same time: C1 sends commands in a given second, and C2 in the next second, always.
C1 does: XADD A ..., XADD B ...
C2 does: XREAD A $ B $
Forever. Once C2 gets some ID, it substitutes such ID for the original $ character.
Now, if the very first time C1 just did XADD A, without adding anything to B, this will enter the following infinite loop:
FOREVER
C1: XADD A ...
C1: XADD B ...
C2: XREAD A 1234... B $ => We'll get data only from stream A
END
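This loop can be reproduced with a tiny pure-Python model of the XREAD semantics (no Redis involved). Streams are plain lists of (id, value) pairs, and a cursor of '$' can only be resolved by blocking, which never happens while another stream already has data to serve.

```python
def xread(streams: dict, cursors: dict) -> dict:
    """Minimal model of a non-blocking XREAD: for every stream whose
    cursor is a concrete ID, return the entries newer than that cursor.
    Streams still at '$' are skipped, because resolving '$' would
    require actually blocking, and we never block if any stream
    already has data to serve."""
    out = {}
    for name, cur in cursors.items():
        if cur == '$':
            continue
        newer = [(i, v) for i, v in streams[name] if i > cur]
        if newer:
            out[name] = newer
    return out

streams = {'A': [], 'B': []}
cursors = {'A': 0, 'B': '$'}  # A already got its first XADD, B did not
consumed = {'A': 0, 'B': 0}

next_id = 0
for _ in range(1000):
    # C1's turn: XADD A ..., XADD B ...
    next_id += 1
    streams['A'].append((next_id, 'payload'))
    streams['B'].append((next_id, 'payload'))
    # C2's turn: XREAD A <id> B $
    reply = xread(streams, cursors)
    for name, entries in reply.items():
        consumed[name] += len(entries)
        cursors[name] = entries[-1][0]  # only concrete cursors advance

print(consumed)  # -> {'A': 1000, 'B': 0}
```

B starves forever even though it receives an XADD on every single cycle, because A always has data and therefore B's $ is never resolved into a concrete ID.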
So in this example we'll consume only stream A forever, regardless of the fact that B has data. This is a pattern that could definitely happen in practice. Sometimes it could be what the user wants, but I bet that most of the time it is not. So I believe we should add an option that, in case we do not block because there is already data in some of the streams, returns the IDs of all the other non-empty streams, using (null) as the item, and document why the option is there.
However, if we do that, the option should also work in case we actually block... once the call returns. This requires enough refactoring/changes that at that point it becomes easy to also return multiple streams if they are all ready, a feature that is now missing.
So to recap the changes to do would be:
What do you think?
I think it makes perfect sense. The user needs all IDs, and there should be a guarantee that streams always have a non-zero chance to get consumed if data is available, in order to prevent degenerate cases.
I'll leave one last consideration: it's not immediate to explain what GETIDS protects you from exactly. People will skip the whole discussion and just say "it's better if you add it, but you'll probably be fine even if you don't", leaving others even more confused and increasing the impression that things are overwhelmingly inscrutable and complex.
I would advocate making the most widely expected behavior the default one and reducing the amount of insider knowledge required to use Streams correctly, rather than asking users to add superstitious incantations to their code.
@antirez this makes sense to me. If we assume clients can deal with an ID+null response, for me it would even make sense to always return ID+null for any XREAD with multiple $, to avoid the confusion. @kristoff-it is that what you were referring to as well?
@yossigo yes!
I've been playing around with the new XREAD feature and wrote a basic C producer/consumer test program atop the hiredis C client and a 5.0-rc5 redis server. Overall the feature is amazing! I wrote up a demo that reads from two streams, using the $ operator to begin with.

At a naive first look one would assume that this could go a few ways: it could block until data is available on both streams, or it could block until data is available on a single stream. What was seen in experimentation was that this would block until data was observed on a single stream, almost always STREAM1 first. I'd then get a reply with some data on STREAM1 and issue a new command, and so on, until I observed 10 pieces of data on both streams.

What was observed in experimentation is that the behavior was pretty undefined (and likely highly system, client and code dependent), and that it would take anywhere between 100 and 50000 messages received on STREAM1 to receive 10 messages on STREAM2.

Overall this seems like it's working as designed, though I think it might not be clear to everyone how to start an observer that wants to listen for the most recent data from a bunch of streams. What I'm currently doing is sending a TIME command to the redis server before starting my observer and not using the $ in the first XREAD, but rather the timestamp returned from the TIME call. This is working as expected.

I've attached a simple test program in C here that demonstrates the behavior. There's a #define at the top of the file that switches between using $ for the first XREAD call and using the result of a TIME command to the redis server.

Happy to have this resolved as "not an issue", as it's likely working as designed, but I figured I'd note the behavior here and have it documented somewhere in case someone else is in my shoes and is a bit confused.
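The TIME-based workaround described above boils down to converting the TIME reply (seconds, microseconds) into a stream ID, so the first XREAD starts from "now" without using $. A minimal sketch of that conversion, assuming the standard two-integer TIME reply; the concrete values in the example are made up.

```python
def start_id_from_time(seconds: int, microseconds: int) -> str:
    """Turn a Redis TIME reply into a stream ID usable as the first
    XREAD cursor: auto-generated stream IDs are <milliseconds>-<seq>,
    so entries added after this moment compare greater than it."""
    millis = seconds * 1000 + microseconds // 1000
    return f"{millis}-0"

print(start_id_from_time(1567704834, 178456))  # -> 1567704834178-0
```

Unlike $, this cursor survives reconnects and repeated calls; the small caveat is that entries XADDed within the same millisecond just before the XREAD (with a sequence number above 0) would still be returned.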