kaskada-ai / kaskada

Modern, open-source event-processing
https://kaskada.io/
Apache License 2.0

bug: multiple queries producing to same pulsar topic produces errors #67

Open jordanrfrazier opened 1 year ago

jordanrfrazier commented 1 year ago

Description If multiple files are loaded in quick succession, wren will kick off queries for the same materialization multiple times. There are a couple of implications of this:

  1. The data token each materialization runs on is the same, because we only update the data version id for a materialization once the query completes successfully. This means we'll have "duplicate" results published to the destination.
  2. Multiple producers cannot publish to Pulsar topics at the same time. This needs more investigation, but the current behavior appeared to spam the logs with errors saying that another producer is already publishing to the topic: the second producer fails, backs off, restarts, and tries again in a loop. Once the first producer is done, the next one is able to claim ownership and produce messages.
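To make point 1 concrete, here is a simplified model in Go. It is only a sketch under stated assumptions: the `Materialization` and `Query` types and the `StartQuery`, `CompleteQuery`, and `Overlaps` functions are hypothetical and do not mirror wren's actual code. The only behavior taken from the description is that the data version id advances when a query completes, not when it starts.

```go
package main

import "fmt"

// Materialization is a hypothetical stand-in for wren's record of a
// materialization. DataVersionID is the last version already published.
type Materialization struct {
	Name          string
	DataVersionID int
}

// Query is the range of data versions a run publishes results for:
// everything after FromVersion, up to and including ToVersion.
type Query struct {
	FromVersion int
	ToVersion   int
}

// StartQuery snapshots the materialization's version when the run starts.
func StartQuery(m *Materialization, latest int) Query {
	return Query{FromVersion: m.DataVersionID, ToVersion: latest}
}

// CompleteQuery advances the version only after the run has finished.
func CompleteQuery(m *Materialization, q Query) {
	if q.ToVersion > m.DataVersionID {
		m.DataVersionID = q.ToVersion
	}
}

// Overlaps reports whether two runs cover any of the same data versions.
func Overlaps(a, b Query) bool {
	return a.FromVersion < b.ToVersion && b.FromVersion < a.ToVersion
}

func main() {
	m := &Materialization{Name: "example"}

	first := StartQuery(m, 1)  // file 1 loaded, run starts
	second := StartQuery(m, 2) // file 2 loaded before the first run completes

	// Both runs start from the same version, so their output overlaps.
	fmt.Println("overlap:", Overlaps(first, second))

	CompleteQuery(m, first)
	CompleteQuery(m, second)
	fmt.Println("final version:", m.DataVersionID)
}
```

In this model both runs start from version 0, so the results for the first file are published twice, which matches the "duplicate" results described above.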

According to the Pulsar documentation, the default access mode allows multiple producers to publish to a topic at the same time, but perhaps the Rust client behaves differently. This needs investigation: https://pulsar.apache.org/docs/2.11.x/concepts-messaging/#access-mode. However, even if concurrent producers are allowed, point 1 should still be addressed.

To Reproduce Steps to reproduce the behavior:

  1. Create a materialization to a pulsar topic.
  2. Load two files in quick succession. If wren receives a query response and updates the materialization before you can load the second file, add an arbitrary sleep in wren after the first loaddata call.
  3. Check logs for errors.

Actual Behavior As stated in description

Expected Behavior Unclear. Ideally, the materialization runs only once, with both new files. Perhaps we could leave a buffer time and deduplicate the requests to process a materialization. A long-term goal is to reduce the responsibilities of the manager, so adding a queue/buffer to manage materializations in wren may be too heavyweight to introduce as a solution.
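One lightweight way to deduplicate requests without a full queue is to allow at most one in-flight run per materialization and collapse any requests that arrive during that run into a single follow-up run. The Go sketch below illustrates the idea. The `Coalescer` type and its `Request` and `Complete` methods are hypothetical, not part of wren, and it assumes that a follow-up run picks up every file loaded since the previous run started.

```go
package main

import (
	"fmt"
	"sync"
)

// Coalescer is a hypothetical helper, not part of wren. It tracks, per
// materialization id, whether a run is in flight and whether another
// request arrived while that run was in progress.
type Coalescer struct {
	mu      sync.Mutex
	running map[string]bool
	pending map[string]bool
}

func NewCoalescer() *Coalescer {
	return &Coalescer{running: map[string]bool{}, pending: map[string]bool{}}
}

// Request reports whether the caller should start a run now. If a run is
// already in flight, the request is only recorded as pending.
func (c *Coalescer) Request(id string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.running[id] {
		c.pending[id] = true
		return false
	}
	c.running[id] = true
	return true
}

// Complete marks the current run as finished and reports whether the
// caller should immediately start one follow-up run.
func (c *Coalescer) Complete(id string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.pending[id] {
		delete(c.pending, id)
		return true // the id stays marked as running for the follow-up
	}
	delete(c.running, id)
	return false
}

func main() {
	c := NewCoalescer()
	fmt.Println(c.Request("m1"))  // first file: start a run
	fmt.Println(c.Request("m1"))  // second file: coalesced
	fmt.Println(c.Request("m1"))  // third file: coalesced
	fmt.Println(c.Complete("m1")) // one follow-up run is needed
	fmt.Println(c.Complete("m1")) // nothing pending, idle again
}
```

This keeps only two in-memory flags per materialization, so it adds less responsibility to the manager than a queue would. It does not implement the buffer time mentioned above, and the state is lost if the process restarts.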

Relevant Logs / Links https://pulsar.apache.org/docs/2.11.x/concepts-messaging/#access-mode

epinzur commented 1 year ago

related to #80