streamnative / pulsar-spark

Spark Connector to read and write with Pulsar
Apache License 2.0

Fix data loss of initial batch and add available now trigger test #156

Closed chaoqin-li1123 closed 1 year ago

chaoqin-li1123 commented 1 year ago

Motivation

When the start offset is serialized and deserialized, the UserProvidedMessageId wrapper is dropped, which causes the first message to be skipped. As we migrate to Spark 3.4, we also want to add test coverage for the available-now trigger.

Modifications

Fix the data loss in the initial batch by re-wrapping the start message ID in UserProvidedMessageId when the offset is deserialized. Add a test for the available-now trigger.
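
To illustrate the failure mode and the fix, here is a minimal, self-contained Python model. It is a sketch and not the connector's code: `MessageId`, `serialize`, `deserialize_lossy`, `deserialize_start_offset`, and `read_from` are hypothetical names, and it assumes that a start position wrapped in `UserProvidedMessageId` is read inclusively while a bare message ID is treated as already consumed.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class MessageId:
    """Simplified stand-in for a Pulsar message ID (hypothetical model)."""
    ledger_id: int
    entry_id: int


@dataclass(frozen=True)
class UserProvidedMessageId:
    """Marks a start position supplied by the user (assumed inclusive)."""
    message_id: MessageId


def serialize(offset):
    # Only the raw ID is written; the wrapper is not part of the payload.
    mid = offset.message_id if isinstance(offset, UserProvidedMessageId) else offset
    return json.dumps({"ledger_id": mid.ledger_id, "entry_id": mid.entry_id})


def deserialize_lossy(payload):
    # Models the behavior described in Motivation: a bare ID comes back.
    fields = json.loads(payload)
    return MessageId(fields["ledger_id"], fields["entry_id"])


def deserialize_start_offset(payload):
    # Models the fix: re-wrap the start offset after deserializing it.
    return UserProvidedMessageId(deserialize_lossy(payload))


def read_from(topic, start):
    # Assumption: wrapped IDs are inclusive, bare IDs are exclusive.
    if isinstance(start, UserProvidedMessageId):
        return [m for m in topic if m >= start.message_id]
    return [m for m in topic if m > start]


topic = [MessageId(1, entry) for entry in range(3)]
payload = serialize(UserProvidedMessageId(topic[0]))

# Without the wrapper the first message is skipped; with it, nothing is lost.
skipped = len(topic) - len(read_from(topic, deserialize_lossy(payload)))
recovered = len(read_from(topic, deserialize_start_offset(payload)))
print(f"lossy round trip skipped {skipped} message(s); fixed path read {recovered}")
```

In this model the round trip itself is unchanged. Only the deserialization step for the start offset restores the wrapper, which mirrors the scope of the change described above.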
