mwvgroup / Pitt-Google-Broker

A Google Cloud-based alert broker for LSST and ZTF
https://pitt-broker.readthedocs.io/en/latest/index.html

Defining multiple Kafka topics that our consumer VM will listen to #212

Open hernandezc1 opened 1 year ago

hernandezc1 commented 1 year ago

In order to ingest alerts, we (or a user) must specify the KAFKA_TOPIC that a consumer VM in our pipeline listens to. PR #205 notes that this may be defined as a comma-separated list of topics, which allows a single consumer VM to listen to multiple topics simultaneously.
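For illustration, such a list could be set as instance metadata on the consumer VM along these lines (the instance name, zone, and topic names below are placeholders, not our actual values):

```bash
# Hypothetical example: store a comma-separated topic list in the
# consumer VM's metadata so the startup script can read it.
gcloud compute instances add-metadata consumer-vm \
    --zone us-central1-a \
    --metadata KAFKA_TOPIC="topic-one,topic-two,topic-three"
```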

Currently, this is relevant in the context of the ELAsTiCC2 Challenge. The size of the alerts has increased significantly since the original ELAsTiCC Challenge, and as a result, DESC has decided to create three independent alert streams. For the recent test streams, these include:

- elasticc2-N-wfd
- elasticc2-N-ddf-full
- elasticc2-N-ddf-limited

(Replace N with the numbers 1, 2, 3, etc. These numbers represent the Nth test stream before the re-start of the ELAsTiCC2 Challenge.)

This issue serves to document my experience and findings associated with having our consumer VM listen to multiple Kafka topics.

So far, I would like to note:

- Even if one of the (multiple) topics is not live, the consumer will ingest alerts from the topics that are live

troyraen commented 1 year ago

> Even if one of the (multiple) topics is not live, the consumer will ingest alerts from the topics that are live

  1. I think perhaps you meant that the Kafka -> Pub/Sub Connector will behave this way if given a list of topics? There are certainly some consumer startup scripts in this repo that will not behave this way (in fact, some will just crash if given a list of topics).
  2. What happens if the consumer starts ingesting the one live topic and then some time later the other topic goes live? Will it automatically start ingesting the new one as well? This is probably not high on our priority list, but will be important to learn at some point.
hernandezc1 commented 1 year ago
> I think perhaps you meant that the Kafka -> Pub/Sub Connector will behave this way if given a list of topics?

Yes, that's correct. I'll update the wording of the main comment.

hernandezc1 commented 1 year ago

#213 updates the requirements needed to start the Kafka -> Pub/Sub connector for our consumer VM. Previously, when a single topic was defined, the consumer's startup script would search for the topic in a file called list.topics. If the topic was present in the list, then the Kafka -> Pub/Sub connector would start.

To accommodate multiple topics, we now parse the string containing the comma-separated Kafka topics and append each topic name to an array called KAFKA_TOPIC_ARRAY. Using a for loop, we check whether each element of KAFKA_TOPIC_ARRAY is present in list.topics. If a topic is present, the value true is appended to a separate array, kafka_topic_present.

If all of the topics are present in list.topics, then KAFKA_TOPIC_ARRAY and kafka_topic_present will have the same length. This requirement must now be satisfied in order to start the Kafka -> Pub/Sub connector.
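A minimal sketch of that check might look like the following (topic names are illustrative; the actual startup script in #213 may differ in detail):

```bash
#!/bin/bash
# Sketch of the topic check described above; topic names are illustrative.
KAFKA_TOPIC="elasticc2-2-wfd,elasticc2-2-ddf-full"

# Split the comma-separated string into KAFKA_TOPIC_ARRAY.
IFS=',' read -r -a KAFKA_TOPIC_ARRAY <<< "${KAFKA_TOPIC}"

# Append "true" to kafka_topic_present for every topic found in list.topics.
kafka_topic_present=()
for topic in "${KAFKA_TOPIC_ARRAY[@]}"; do
    if grep -Fxq "${topic}" list.topics; then
        kafka_topic_present+=("true")
    fi
done

# Start the connector only if every requested topic was found,
# i.e. the two arrays have the same length.
if [[ "${#KAFKA_TOPIC_ARRAY[@]}" -eq "${#kafka_topic_present[@]}" ]]; then
    echo "All topics present in list.topics; starting the Kafka -> Pub/Sub connector."
else
    echo "One or more topics missing from list.topics; connector not started." >&2
fi
```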

hernandezc1 commented 1 year ago

The ELAsTiCC2 Challenge restarted last Friday, November 10th. The streaming and database statistics were just updated on the TOM Toolkit, and I wanted to give a brief update.

We began this challenge by listening to the following topics: elasticc2-2-wfd, elasticc2-2-ddf-full. I've continued to pay close attention to the metrics on the Google Cloud console.

Below are screenshots of the metrics provided by the TOM Toolkit:

[Screenshots: TOM Toolkit streaming and database statistics, 2023-11-14]

I was surprised to see that we were only processing about 50% of the alerts streamed every night. From this, I inferred that we were likely only consuming alerts from one of the two topics specified above. To learn more about this issue, I ssh'd into our consumer VM and ran the startup script manually to see if there were any issues with subscribing to the topics. As you can see in the screenshot below, there seem to be no issues subscribing to the two topics.

[Screenshot: terminal output showing successful subscription to both topics, 2023-11-14]

As of right now, I've updated the consumer's metadata to subscribe to all three Kafka topics (elasticc2-2-wfd, elasticc2-2-ddf-full, elasticc2-2-ddf-limited). I'd like to see if the number of processed alerts increases for tonight's stream. If it does not increase, I will reach out to the elasticc-comms Slack channel to get some more clarity on this issue.

hernandezc1 commented 1 year ago

A quick update: I ran a query to count the number of alerts that our broker processed last night to compare it to the number of alerts that were streamed.
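(For reference, a count along these lines could be produced with the bq CLI; the project, dataset, table, and column names here are placeholders, not necessarily what our broker uses:)

```bash
# Hypothetical sketch: count one night's alerts in a BigQuery table
# using the bq CLI (table and column names are placeholders).
bq query --use_legacy_sql=false '
  SELECT COUNT(*) AS n_alerts
  FROM `my-project.elasticc2.alerts`
  WHERE DATE(kafka_timestamp) = "2023-11-13"
'
```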

Thankfully, our numbers are consistent. See the screenshots below:

[Screenshots: query results comparing the number of alerts processed with the number streamed, 2023-11-14]

I've messaged Rob asking if the metrics on the TOM Toolkit are up to date.

wmwv commented 1 year ago

Are the "WARN" messages in the log expected? If so, would it be appropriate to suppress them so that someone reading the log doesn't have to re-check why those configs aren't "known config"s?

troyraen commented 1 year ago

At least some of those "WARN" messages about unknown configs are normal in the sense that they've always occurred for us but don't seem to cause problems. It would be great to suppress them, but I don't know how offhand. The logs are generated by the Kafka -> Pub/Sub Connector, which is a Java application, and I'm not very familiar with that language. (We don't ever touch Java explicitly ourselves; we just start up the application and pass in the configs.)

A little history: back when I set up this consumer, I tried to track these down and "fix" them, but I wasn't successful. Strangely, for at least one of those configs, removing it actually broke the consumer -- yet the connector complains about not knowing what it is. It's possible that I just didn't know enough about what I was doing. I wish I had documented my trials better. We've made config changes since then (perhaps especially in #205), so some of these warnings may also be new and potentially causing problems. It's worth putting more work into, though I don't know if/when it'll be high enough on the priority list.

wmwv commented 1 year ago

Ah, thanks for the details and history. Agreed that it doesn't look like it's going to rise in the priority list to fix.