Closed AbstractiveNord closed 7 months ago
The core problem is useless traffic. Instead of sending in-progress acks for thousands of messages, it would be much better to treat a heartbeat as sufficient confirmation.
You can do this using a stream and no-ack consumers, I think; a KV one for that.
That's very close behavior, but the problem is there's no guarantee the task was even delivered to a consumer. Once a consumer gets a task, there should be an explicit in-progress acknowledgement. This issue is about making that acknowledgement a single one, not cyclic.
Client heartbeats - assuming that's what you mean - really are not enough, since there is no association between them and whatever JetStream is doing.
So you need to signal JetStream, and that's things like AckAll.
With ordered consumers, Republish, no-ack consumers etc. it's achievable, as the client can know when it didn't get a message - it all has sequences and you can detect gaps. And we have ways to fill those gaps with single-call APIs. Those are appropriate for huge-scale or super-low-traffic scenarios, but how you use them ends up being tied to your use case and a little thing designed for the need.
Consumers have to be appropriate for a vast amount of use cases, so of course for some they are not perfect, but we do have building blocks you can use to achieve these specific needs.
Regardless it’s not a KV thing.
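The gap-detection idea mentioned above can be sketched in a few lines of plain Python (no NATS client involved; this is just an illustration of the principle, with invented names): every delivered message carries a monotonically increasing stream sequence, so the client itself can spot deliveries it never received.

```python
# Illustrative sketch of client-side gap detection using stream sequences,
# as with ordered consumers: any skipped sequence number is a lost message
# the client can then re-fetch with a single-call API.

def find_gaps(received_seqs):
    """Return the stream sequences that were never delivered."""
    gaps = []
    expected = received_seqs[0] if received_seqs else 0
    for seq in received_seqs:
        while expected < seq:        # every skipped number is a missed delivery
            gaps.append(expected)
            expected += 1
        expected = seq + 1
    return gaps

# Sequences as seen by the client; 3 and 6 were dropped in transit.
print(find_gaps([1, 2, 4, 5, 7]))  # -> [3, 6]
```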
AckAll can't be used, because tasks - which are messages in KV - should be able to be redelivered to other instances of that consumer. Let's say you have 3 workers, each an instance of a common durable consumer. Every worker has a different max limit, and tasks should be bound to them. Any worker should be able to NAK any message, just to put that load on another instance. Ordering of tasks is not required, because it's an additional guarantee for JetStream to provide. I see the workflow as
Yes, it's not a KV thing, just use cases related to KV.
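The workflow described above can be simulated in plain Python (all names here are hypothetical, not NATS API): three workers share one durable consumer, each with its own max limit; an instance at its limit NAKs the task, and the "server" simply redelivers it to another instance.

```python
# Hypothetical simulation of the NAK-to-rebalance workflow: a worker over
# its own limit rejects (NAKs) a task, so the dispatcher offers it to the
# next instance of the same durable consumer.
from collections import deque

class Worker:
    def __init__(self, name, max_tasks):
        self.name, self.max_tasks, self.tasks = name, max_tasks, []

    def offer(self, task):
        if len(self.tasks) >= self.max_tasks:
            return False          # NAK: over this instance's limit
        self.tasks.append(task)   # ACK: accepted for execution
        return True

def dispatch(tasks, workers):
    pending = deque(tasks)
    while pending:
        task = pending.popleft()
        for w in workers:
            if w.offer(task):
                break
        else:
            raise RuntimeError("all instances NAKed " + task)

workers = [Worker("a", 1), Worker("b", 2), Worker("c", 1)]
dispatch(["t1", "t2", "t3", "t4"], workers)
print({w.name: w.tasks for w in workers})
# -> {'a': ['t1'], 'b': ['t2', 't3'], 'c': ['t4']}
```

Note that task order across instances is not preserved, matching the point above that ordering is not a required guarantee here.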
We have had this conversation many times on slack also.
you have to signal JetStream, it's unavoidable in our design. That's just how it is. Heartbeats aren't a signal to JetStream.
Kinda sad, if I got this correctly, that there's no way for the internals of the consumer on the server to send acks or something like that.
In a distributed system the consumer can't know what your client got. The message is given to the kernel to deliver - then it's an undetermined outcome.
Your client could be sending heartbeats while also not having received the message the consumer sent.
So it just doesn’t know. And it can’t know. So your client needs to tell it (and that’s also not a reliable message)
You also have to consider what an ack means and what a heartbeat means.
Heartbeat is presence. I am here to potentially process messages. 100s of thousands of messages can be delivered between heartbeats.
Ack means success. I may be there and getting messages successfully delivered, and let's say the consumer could know that a message got to me. What if my disk is full and I can't save a message? I am still sending heartbeats - the CONNECTION is healthy - but my ability to handle a particular message is compromised.
Further, I may be handling a kind of message that I write to disk and a kind of message that I write to another database on another system. The messages to disk are failing but the ones to the database are working - my disk is not involved.
You can't know from heartbeats that this is happening. There is no way for a consumer to know without being told. Heartbeats are participation trophies; acks are first prizes.
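The presence-vs-success distinction can be made concrete with a toy sketch (pure Python, invented names): the worker below stays alive and "heartbeating" the whole time, while one class of message consistently fails. Only the explicit per-message ack/nak reveals which is which.

```python
# Toy illustration of why presence is not success: the connection is healthy
# (heartbeats keep flowing), yet the disk-write path is broken while the
# database path works. Heartbeats alone cannot distinguish these outcomes.

def handle(msg):
    if msg["kind"] == "disk":
        raise IOError("disk full")   # this path is broken...
    return "stored in db"            # ...this one is fine

alive = True                          # heartbeats keep flowing either way
results = {}
for m in [{"id": 1, "kind": "disk"}, {"id": 2, "kind": "db"}]:
    try:
        results[m["id"]] = ("ack", handle(m))
    except IOError:
        results[m["id"]] = ("nak", None)   # only an explicit nak reveals this

print(alive, results)
# -> True {1: ('nak', None), 2: ('ack', 'stored in db')}
```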
It was the initial idea. The consumer subscribes to a stream and acks incoming messages as being in progress, and then the server considers "presence" as enough confirmation. When the consumer gets messages, it acks every message as successfully accepted for execution. If the consumer can't handle some messages, it sends a NAK or other acknowledgements for them.
Also, to clarify, I mean the "Idle Heartbeat" from the consumer.
Negative acks can go missing; there is no guaranteed delivery. So you don't need it to be reliable? Is it OK to not retry some messages that failed?
I could also crash before sending the NAK, or the computer be turned off, the network unplugged, etc.
So you don't actually need to know about failures - is it that you need a maybe / perhaps-it-failed, but assume success?
In case of a worker crash, since no new heartbeats are coming, the server should treat that as a lost consumer and redeliver all messages on that instance. Same as now, if you didn't send acks.
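The redelivery rule proposed above can be sketched as a small simulation (hypothetical server model, not NATS code): messages delivered to an instance stay in progress while its heartbeats arrive; once a heartbeat deadline passes, everything unacked by that instance is freed for redelivery.

```python
# Sketch of the proposed server-side rule: an instance whose heartbeats
# stop past the timeout is treated as a lost consumer, and all messages
# assigned to it are redelivered. All names here are hypothetical.

HEARTBEAT_TIMEOUT = 5.0  # seconds

def reassign(assignments, last_heartbeat, now):
    """Return messages to redeliver from instances whose heartbeats stopped."""
    redeliver = []
    for instance, msgs in assignments.items():
        if now - last_heartbeat[instance] > HEARTBEAT_TIMEOUT:
            redeliver.extend(msgs)        # lost consumer: free its messages
            assignments[instance] = []
    return redeliver

assignments = {"w1": ["m1", "m2"], "w2": ["m3"]}
last_heartbeat = {"w1": 0.0, "w2": 9.0}   # w1 went silent, w2 is alive
print(reassign(assignments, last_heartbeat, now=10.0))  # -> ['m1', 'm2']
```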
But what about the messages sent between heartbeats? Heartbeats are long apart - it could be 100s or thousands of messages. Did you fail right after a heartbeat, or right before the next one? It can make a million-message difference.
I use 5s heartbeats. Once a message is delivered to the consumer, and the consumer has acknowledged the stream about that message, like sending InProgress, only then should a heartbeat be treated as an InProgress ack for that message.
Reliability is a key feature. Losing a task is not an option. It's okay to lose some execution time when the first instance of a consumer goes down and another instance of the same consumer gets that message to work with.
OK, so you still can not know about thousands of messages.
So let’s look at how heartbeats work:
* the purpose of them is so the client can tell the difference between an idle consumer and a broken subscription
* the server does not learn anything from consumer heartbeats; it also does not know if they get delivered or even made it to the same side of the world as the client
You want free reliability without doing the work to make it reliable, yet say reliability is a "key feature". Sorry, that is not how computers work.
I think we can stop discussing this every month :)
* They are sent only when the consumer is idle, not during normal use
I missed that. Gonna rethink the idea. Thanks for the discussion.
Proposed change
A new type of KV bucket, based not on processing messages one at a time, but on long-running execution, like an assignment system. In fact, the only change needed is to consider a successful consumer heartbeat as an "In Progress" ack for all messages delivered to that consumer but not ACK'ed in another way.
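The proposed semantics can be sketched under a hypothetical server model (names are invented, not NATS API): a single consumer heartbeat acts as an "In Progress" ack for every message delivered to that consumer but not yet terminally acked, pushing each pending message's redelivery deadline forward in one step instead of one InProgress call per message.

```python
# Minimal sketch of the proposal: one heartbeat refreshes the ack deadline
# of ALL pending (delivered-but-unacked) messages for that consumer,
# replacing thousands of per-message in-progress acks.

ACK_WAIT = 30.0  # seconds until an unacked message is redelivered

def on_heartbeat(pending, now):
    """One heartbeat extends the deadline of every pending message."""
    for msg_id in pending:
        pending[msg_id] = now + ACK_WAIT   # instead of per-message InProgress

pending = {"m1": 10.0, "m2": 12.0}   # msg id -> current redelivery deadline
on_heartbeat(pending, now=25.0)
print(pending)  # -> {'m1': 55.0, 'm2': 55.0}
```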
Use case
Distributed batch reservation systems, schedulers, batch locking systems, control planes for big groups of workers.
Contribution
Yes