streamnative / pulsar-spark

Spark Connector to read and write with Pulsar
Apache License 2.0

Adding maxBytesPerTrigger tag for Pulsar Admission Control #151

Closed ericm-db closed 1 year ago

ericm-db commented 1 year ago

Motivation

Some users of the Pulsar Spark connector have asked for rate-limiting functionality in the Pulsar source. They want to control how fast streaming queries that use the Pulsar source process data and how many resources those queries consume. This can be achieved by implementing admission control in the Pulsar source.

Modifications

Added a config option, maxBytesPerTrigger, that lets users limit how many bytes are consumed in each microbatch. The limit is shared among all topic-partitions.
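
As a rough illustration of how one per-microbatch byte budget could be shared among topic-partitions, here is a minimal Python sketch. It assumes (this is not taken from the PR) that the budget is split evenly, that a partition with less unread data than its share takes only what it has, and that the leftover is redistributed to partitions that still have data. The function name plan_microbatch and the per-partition backlog input are hypothetical. A real source would also have to stop at message boundaries, so the cap would likely be approximate.

```python
from typing import Dict


def plan_microbatch(max_bytes_per_trigger: int, backlog: Dict[str, int]) -> Dict[str, int]:
    """Split one microbatch byte budget across topic-partitions.

    Hypothetical sketch: even split, with any unused share redistributed to
    partitions that still have unread bytes. `backlog` maps a partition name
    to its number of unread bytes.
    """
    if max_bytes_per_trigger <= 0:
        raise ValueError("max_bytes_per_trigger must be positive")
    allotted = {p: 0 for p in backlog}
    remaining_budget = max_bytes_per_trigger
    # Partitions that still have unread data beyond what was allotted so far.
    active = [p for p, b in backlog.items() if b > 0]
    while remaining_budget > 0 and active:
        share = remaining_budget // len(active)
        if share == 0:
            # Budget is smaller than the number of active partitions:
            # hand out one byte each, in a stable order, until it runs out.
            for p in sorted(active)[:remaining_budget]:
                allotted[p] += 1
            remaining_budget = 0
            break
        for p in active:
            # Never allot more than the partition actually has left.
            take = min(share, backlog[p] - allotted[p])
            allotted[p] += take
            remaining_budget -= take
        active = [p for p in active if allotted[p] < backlog[p]]
    return allotted


print(plan_microbatch(1000, {"t-0": 100, "t-1": 5000, "t-2": 5000}))
# {'t-0': 100, 't-1': 450, 't-2': 450}
```

Here t-0 only has 100 bytes, so the 233 bytes it leaves unused from its first share of 333 are passed on to the two partitions that still have backlog.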

ericm-db commented 1 year ago

@atezs82 We are currently trying to implement functionality similar to your PR here: https://github.com/streamnative/pulsar-spark/pull/63

atezs82 commented 1 year ago

@ericm-db Thanks for picking this idea up! Since we no longer use Pulsar, my work on the other PR was heavily deprioritized. I'm glad, though, that this might land in the connector in some form, since I personally think it is very useful for, e.g., some CDC use cases.

chaoqin-li1123 commented 1 year ago

Please also update the documentation for the admin URL and maxBytesPerTrigger in README.md.

ericm-db commented 1 year ago

Previously, we put effort into removing the pulsarAdmin calls from the connector in #118 due to security concerns.

This is a feature required by many enterprise customers. We probably need to find a workaround, or at least let users choose whether to provide it.

@nlu90 The pulsarAdmin is only used if the client needs admission control. I'm guessing this isn't enough?

chaoqin-li1123 commented 1 year ago

Can you also update the doc for adminUrl to mark it as optional and only needed when a read limit is specified? @ericm-db
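
The behavior requested here (adminUrl is optional and required only once a read limit is configured) can be sketched as a small validation step. This is a hypothetical Python helper, not the connector's code: the function name validate_source_options is made up, and the option keys are spelled as in this thread, so the connector's real keys (documented in README.md) may differ.

```python
from typing import Mapping, Optional


def validate_source_options(options: Mapping[str, str]) -> Optional[int]:
    """Hypothetical check: adminUrl is optional unless a read limit is set.

    Returns the parsed byte limit, or None when no read limit was requested.
    """
    raw_limit = options.get("maxBytesPerTrigger")
    if raw_limit is None:
        # No admission control requested, so no admin client is needed.
        return None
    try:
        limit = int(raw_limit)
    except ValueError:
        raise ValueError(
            f"maxBytesPerTrigger must be an integer, got {raw_limit!r}"
        ) from None
    if limit <= 0:
        raise ValueError("maxBytesPerTrigger must be positive")
    if not options.get("adminUrl"):
        # Only in this case does the source need admin access to Pulsar.
        raise ValueError("adminUrl must be set when maxBytesPerTrigger is specified")
    return limit
```

Keeping the admin dependency behind the read limit means that users who do not want to grant admin access (the security concern that led to #118) are unaffected unless they opt into maxBytesPerTrigger.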

ericm-db commented 1 year ago

Done

chaoqin-li1123 commented 1 year ago

Can you rebase with the latest master and fix the build failure? @ericm-db