Open maksimushka32 opened 5 years ago
I'm also seeing the same problem. It seems to be more common when consuming from a topic with more partitions (~100 in my case) than from one with fewer (~30).
The main problem is that I have ~15 topics that are more or less the same: all with 50 partitions and practically the same amount of incoming and outgoing data, yet the problem persists in only one of them.
Hi,
Looking at your logs, it seems your consumer was considered dead (most likely because it did not send any heartbeats in time), and a rebalance occurred and finished before the consumer noticed it. This left the consumer unable to commit its offset and stuck in the wait_empty: Waiting for... state.
You mentioned having 15 topics with 50 partitions each; is the load distributed across several Faust workers? Is the size of the messages in the topic that triggers this error the same as in the other topics?
In this kind of situation, it is sometimes recommended to decrease the number of messages fetched at each poll(). In Faust, you can do that by adjusting the broker_max_poll_records parameter (see https://faust.readthedocs.io/en/latest/userguide/settings.html#broker-max-poll-records).
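For example, a minimal sketch of adjusting that setting; the app name, broker URL, and the value of 500 here are placeholders, not recommendations:

```python
import faust

app = faust.App(
    'my-app',                          # placeholder app id
    broker='kafka://localhost:9092',   # placeholder broker URL
    # fetch fewer records per poll() so each batch can be processed
    # before the consumer is considered dead
    broker_max_poll_records=500,
)
```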
any update on this?
I can confirm that setting broker_max_poll_records=500 solved my problem. Thanks @StephenSorriaux
I have the same issue.
APP config:
'broker_max_poll_records': 20,
'stream_buffer_maxsize': 10000,
'broker_commit_every': 5000,
'topic_partitions': 16,
'broker_heartbeat_interval': 10,
'broker_request_timeout': 160.0,
'broker_session_timeout': 120.0,
I have three agents per worker with concurrency set to 1. There are 16 partitions, 2 workers.
The problem occurs every time we rescale from 2 to 3 workers.
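For context, a minimal sketch of how a setup like the one described above could look; the app id, broker URL, and topic name are placeholders, and the settings are the ones quoted in the config above:

```python
import faust

app = faust.App(
    'my-app',                          # placeholder app id
    broker='kafka://localhost:9092',   # placeholder broker URL
    broker_max_poll_records=20,
    stream_buffer_maxsize=10000,
    broker_commit_every=5000,
    topic_partitions=16,
    broker_heartbeat_interval=10,
    broker_request_timeout=160.0,
    broker_session_timeout=120.0,
)

source = app.topic('events_in')  # placeholder topic

# one of the agents, with concurrency left at 1
@app.agent(source, concurrency=1)
async def handle(stream):
    async for event in stream:
        ...
```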
I'm pretty sure setting broker_max_poll_records is not enough.
To run synchronous code I use run_in_executor.
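A sketch of what that can look like inside an agent; the app id, broker URL, topic name, and the blocking_work function are all placeholders, and run_in_executor is the standard asyncio call:

```python
import asyncio
import faust

app = faust.App('my-app', broker='kafka://localhost:9092')  # placeholders
source = app.topic('events_in')                             # placeholder topic

def blocking_work(value):
    # placeholder for synchronous / blocking processing
    return value

@app.agent(source)
async def process(stream):
    loop = asyncio.get_running_loop()
    async for event in stream:
        # offload the blocking call to the default thread pool so the event
        # loop keeps servicing heartbeats and fetches while it runs
        result = await loop.run_in_executor(None, blocking_work, event)
```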
For me, setting stream_wait_empty=False solved this issue.
https://faust.readthedocs.io/en/latest/userguide/settings.html#stream-wait-empty
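For reference, a minimal sketch of applying that setting (app id and broker URL are placeholders):

```python
import faust

app = faust.App(
    'my-app',
    broker='kafka://localhost:9092',
    # don't wait for streams to finish processing buffered events on rebalance
    stream_wait_empty=False,
)
```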
I could do that as my streams are idempotent, although I'm pretty sure it's only a workaround.
@StephenSorriaux Thank you for providing the solution. Is it possible to lose any record by decreasing the "broker_max_poll_records" parameter?
@Li-Yun No, changing the broker_max_poll_records parameter will not affect your risk of losing any records. The only risks I see are:
@StephenSorriaux Thank you for your response.
@StephenSorriaux In my Faust application I got the message "[INFO]: Timer commit woke up too late". Is this considered an error or a warning? Thanks for reading.
Checklist
I have verified that the issue persists when using the master branch of Faust.
Steps to reproduce
Faust is set up as a service and does some log filtering from an _in topic to an _out topic, with code like:
I've got several of these "filters", but the issue persists in only one of them.
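As an illustration only (the original code is not reproduced here), a hypothetical sketch of such a filter; the topic names, partition count, and filtering condition are all placeholders:

```python
import faust

app = faust.App('log-filter', broker='kafka://localhost:9092', topic_partitions=50)

source = app.topic('events_in', value_type=dict)   # placeholder "_in" topic
sink = app.topic('events_out', value_type=dict)    # placeholder "_out" topic

@app.agent(source)
async def filter_events(stream):
    async for record in stream:
        # forward only the records that match some condition
        if record.get('level') == 'ERROR':
            await sink.send(value=record)
```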
Expected behavior
Working
Actual behavior
Right after starting, it begins logging info messages:
Then, after days of working, it suddenly fails to commit the offset and re-joins the group several times; after roughly an hour of rejoining it just hangs with messages
Full traceback
Versions