streamnative / pulsar-spark

Spark Connector to read and write with Pulsar
Apache License 2.0
113 stars 50 forks source link

Implement Admission Control for the Pulsar-Spark connector #150

Closed ericm-db closed 1 year ago

ericm-db commented 1 year ago

Is your feature request related to a problem? Please describe.

For the Pulsar-Spark connector, we want to implement Admission Control for Pulsar as a streaming source. The user should be able to specify the maximum number of bytes that can be processed per microbatch. If we are under the threshold for the number of bytes for this microbatch, we will process the next ledger in this partition. Describe the solution you'd like Design Doc Essentially, for each microbatch, we check the average number of bytes per ledger. If we have capacity left in this microbatch (ie. are under the threshold), then we will admit the next ledger Describe alternatives you've considered In the driver, we prefetch the ledgers and cache them, but this leads to both latency and memory overhead. Correctness issues can also arise with this solution

ericm-db commented 1 year ago

Implemented in https://github.com/streamnative/pulsar-spark/pull/151