Open rgraber opened 7 months ago
The real error was `Topic authorization failed`, which was buried in the error. We couldn't get to any of the messages.

`Error consuming event from Kafka: UnusableMessageError('Missing ce_type header on message, cannot determine signal')` is true, but very misleading.

Ideally, if the entire topic is not reachable and we can't get to any messages:
We believe this was caused by a misconfigured ACL, which has now been corrected. We should have better reporting on when this sort of thing happens so we can fix it.
@timmc-edx will look into rewriting this ticket, potentially splitting into two parts (error message and alerting).
I've updated this ticket, and there are already a couple of tickets to cover the alerting side of things:
After investigating this issue on DataDog, it seems like the consumer lag metric wasn't being recorded for this topic at all before we fixed the ACL. We will probably need to make alerts for this sort of thing based on logs (once we get logs in DataDog, probably).
When we adjusted ACLs for some Kafka topics, a consumer started failing with a misleading error message (`Missing ce_type header on message, cannot determine signal`) that caused us to think there was a malformed message at the start of the topic that was blocking consumption.

The real error (either `Broker: Topic authorization failed` or `Group authorization failed`) was buried in the context data; we should figure out how to surface that error instead. This might involve checking for a `None` offset or other error indicators before we try inspecting the message headers.

Original description
An error in the discovery consumer:
2024-01-26 14:00:27,100 ERROR 1 [edx_event_bus_kafka.internal.consumer] consumer.py:555 - Error consuming event from Kafka: UnusableMessageError('Missing ce_type header on message, cannot determine signal') in context full_topic='prod-course-authoring-xblock-lifecycle', consumer_group='course_discovery_prod' -- event details: {'partition': 0, 'offset': None, 'headers': None, 'key': None, 'value': b'Subscribed topic not available: prod-course-authoring-xblock-lifecycle: Broker: Topic authorization failed'}
Traceback (most recent call last):
  File "/edx/app/discovery/venvs/discovery/lib/python3.8/site-packages/edx_event_bus_kafka/internal/consumer.py", line 312, in _consume_indefinitely
    signal = self.determine_signal(msg)
  File "/edx/app/discovery/venvs/discovery/lib/python3.8/site-packages/edx_event_bus_kafka/internal/consumer.py", line 405, in determine_signal
    event_type = self._get_event_type_from_message(msg)
  File "/edx/app/discovery/venvs/discovery/lib/python3.8/site-packages/edx_event_bus_kafka/internal/consumer.py", line 426, in _get_event_type_from_message
    raise UnusableMessageError(
edx_event_bus_kafka.internal.consumer.UnusableMessageError: Missing ce_type header on message, cannot determine signal
It's unclear why the consumer is not able to move past this error.
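
As a rough illustration of the idea in the updated description (checking for a `None` offset or other error indicators before inspecting headers), here is a minimal sketch. It is not the actual `edx_event_bus_kafka` code. `BrokerReportedError`, `raise_if_broker_error`, and `FakeMessage` are hypothetical names, and the sketch assumes the message object exposes `error()`, `offset()`, `headers()`, and `value()` accessors in the style of `confluent_kafka.Message`. The "no offset and no headers" fallback is a heuristic inferred only from the event details in the log above.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


class BrokerReportedError(Exception):
    """Hypothetical exception: the 'message' is really an error event, not a record."""


@dataclass
class FakeMessage:
    """Stand-in for illustration; exposes only the accessors the check needs."""
    error_obj: Any = None
    offset_val: Optional[int] = None
    header_list: Optional[List[Tuple[str, bytes]]] = None
    payload: Optional[bytes] = None

    def error(self) -> Any:
        return self.error_obj

    def offset(self) -> Optional[int]:
        return self.offset_val

    def headers(self) -> Optional[List[Tuple[str, bytes]]]:
        return self.header_list

    def value(self) -> Optional[bytes]:
        return self.payload


def raise_if_broker_error(msg: Any) -> None:
    """Raise BrokerReportedError before any header inspection is attempted."""
    # Preferred indicator: the client attached an explicit error object.
    err = msg.error()
    if err is not None:
        raise BrokerReportedError(f"Kafka reported an error: {err}")

    # Fallback heuristic (assumption): an event with neither offset nor headers
    # is not a real record, so surface its payload as the error text.
    if msg.offset() is None and msg.headers() is None:
        raw = msg.value()
        detail = raw.decode("utf-8", errors="replace") if isinstance(raw, bytes) else repr(raw)
        raise BrokerReportedError(f"Message has no offset or headers: {detail}")


# Example: an event shaped like the one in the log above.
auth_failure = FakeMessage(
    payload=b"Subscribed topic not available: "
            b"prod-course-authoring-xblock-lifecycle: Broker: Topic authorization failed"
)
try:
    raise_if_broker_error(auth_failure)
except BrokerReportedError as exc:
    surfaced = str(exc)  # contains the authorization failure text
```

Calling a check like this before `determine_signal` would let the log line lead with the authorization failure rather than the missing `ce_type` header. Whether it should raise, log, or back off and retry is a separate design question.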