Open hernandezc1 opened 1 year ago
> Even if one of the (multiple) topics is not live, the consumer will ingest alerts from the topics that are live
- I think perhaps you meant that the Kafka -> Pub/Sub Connector will behave this way if given a list of topics?
Yes, that's correct. I'll update the wording of the main comment.
`list.topics`. If the topic was present in the list, then the Kafka -> Pub/Sub connector would start. To accommodate multiple topics, we now parse the string containing the comma-separated Kafka topics and append each topic name to an array called `KAFKA_TOPIC_ARRAY`. Using a `for` loop, we check whether each element of `KAFKA_TOPIC_ARRAY` is present in `list.topics`. If a topic is present, the value `true` is appended to a separate array, `kafka_topic_present`.
If all of the topics are present in `list.topics`, then `KAFKA_TOPIC_ARRAY` and `kafka_topic_present` will have the same length. This requirement must now be satisfied in order to start the Kafka -> Pub/Sub connector.
The ELAsTiCC2 Challenge restarted last Friday, November 10th. The streaming and database statistics were just updated on the TOM Toolkit, and I wanted to give a brief update.
We began this challenge by listening to the following topics: `elasticc2-2-wfd` and `elasticc2-2-ddf-full`. I've continued to pay close attention to the metrics on the Google Cloud console.
Below is a screenshot of the metrics provided by the TOM Toolkit:
I was surprised to see that we were only processing about 50% of the alerts streamed every night. From this, I inferred that we were likely consuming alerts from only one of the two topics specified above. To learn more about this issue, I ssh'd into our consumer's VM and ran the startup script manually to see if there were any issues with subscribing to the topics. As you can see in the screenshot below, there seem to be no issues subscribing to the two topics.
As of right now, I've updated the consumer's metadata to subscribe to all three Kafka topics (`elasticc2-2-wfd`, `elasticc2-2-ddf-full`, `elasticc2-2-ddf-limited`). I'd like to see if the number of processed alerts increases for tonight's stream. If it does not increase, I will reach out to the elasticc-comms Slack channel to get some more clarity on this issue.
A quick update: I ran a query to count the number of alerts that our broker processed last night to compare it to the number of alerts that were streamed.
Thankfully, our numbers are consistent. See the screenshots below:
I've messaged Rob asking if the metrics on the TOM Toolkit are up to date.
Are the "WARN" messages in the log expected? If so, would it be appropriate to suppress them, so that someone reading the log doesn't have to re-check why those configs aren't known configs?
At least some of those "WARN" messages about unknown configs are normal in the sense that they've always occurred for us but don't seem to cause problems. It would be great to suppress them, but I don't know how offhand. The logs are generated by the Kafka -> Pub/Sub Connector, which is a Java application and I'm not very familiar with that language. (We don't ever touch Java explicitly ourselves; we just start up the application and pass in the configs).
A little history: back when I set up this consumer, I tried to track these down and "fix" them, but I wasn't successful. Strangely, for at least one of those configs, removing it actually broke the consumer -- yet the application complains about not knowing what it is. It's possible that I just didn't know enough about what I was doing; I wish I had documented my trials better. We've made config changes since then (perhaps especially in #205), so some of these warnings may also be new and potentially causing problems. It's worth putting more work into, though I don't know if/when it'll be high enough on the priority list.
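If we ever do revisit this, one possible (untested) avenue would be to lower the log level for the loggers that emit the "not a known config" warnings. This assumes the connector uses a standard log4j configuration, as Kafka tooling typically does, and that the warnings come from the usual Kafka client config classes:

```properties
# Hypothetical log4j overrides -- logger names are assumptions; verify them
# against the class names that appear in the actual log output.
log4j.logger.org.apache.kafka.clients.consumer.ConsumerConfig=ERROR
log4j.logger.org.apache.kafka.clients.producer.ProducerConfig=ERROR
```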
Ah, thanks for the details and history. Agreed that it doesn't look like it's going to rise in the priority list to fix.
In order to ingest alerts, we (or a user) must specify the `KAFKA_TOPIC` that a consumer VM in our pipeline must listen to. PR #205 notes that this may be defined as a comma-separated list of topics, which allows a single consumer VM to listen to multiple topics simultaneously.

Currently, this is relevant in the context of the ELAsTiCC2 Challenge. The size of the alerts has increased significantly since the original ELAsTiCC Challenge, and as a result, DESC has decided to create three independent alert streams. For the recent test streams, these include:
- `elasticc2-stN-wfd` -- WFD objects, including up to 365 days of previous sources and forced sources
- `elasticc2-stN-ddf-full` -- DDF objects, including up to 365 days of previous sources and forced sources
- `elasticc2-stN-ddf-limited` -- DDF objects, including up to 30 days of previous sources and forced sources

(Replace `N` with the numbers 1, 2, 3, ..., etc. These numbers represent the `N`th test stream before the re-start of the ELAsTiCC2 Challenge.)

This issue serves to document my experience and findings associated with having our consumer VM listen to multiple Kafka topics.
So far, I would like to note: