streamnative / pulsar-io-cloud-storage

Cloud Storage Connector integrates Apache Pulsar with cloud storage.
Apache License 2.0

feat: Read JSON directly from the original data when formatType=json #966

Closed shibd closed 6 months ago

shibd commented 6 months ago

Motivation

With the Pulsar JSON schema, the validity of the data is not verified when it is serialized. This can lead to data within a topic that is incompatible with the topic's schema.

As a result, when this connector processes such messages, the value returned by Object value = record.getField(field); may be null.

https://github.com/streamnative/pulsar-io-cloud-storage/blob/b2b28ddc60c83b69421bd8e03fe9524c61deb2b4/src/main/java/org/apache/pulsar/io/jcloud/format/JsonFormat.java#L134-L148

In fact, for the JSON format of cloud storage (formatType=json), schema compatibility is not required, so we can send the original JSON data to cloud storage directly.
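A minimal sketch of the idea, assuming nothing about the connector's real classes: the Message type, rebuildFromSchema, and passThrough below are hypothetical stand-ins written only for illustration, not the Pulsar or connector API. It contrasts rebuilding a row field by field from the schema (where a field absent from the payload comes back as null) with forwarding the original JSON bytes unchanged.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RawJsonPassthrough {

    // Hypothetical stand-in for a consumed message: the original payload bytes
    // plus the fields that a schema-driven decoder managed to extract.
    public static final class Message {
        final byte[] payload;
        final Map<String, Object> decodedFields;

        public Message(byte[] payload, Map<String, Object> decodedFields) {
            this.payload = payload;
            this.decodedFields = decodedFields;
        }
    }

    // Schema-driven path: rebuild the row from the schema's field list.
    // A field that the payload does not contain ends up as null.
    public static Map<String, Object> rebuildFromSchema(Message msg, List<String> schemaFields) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String field : schemaFields) {
            row.put(field, msg.decodedFields.get(field)); // may be null
        }
        return row;
    }

    // Passthrough path: forward the original JSON text untouched,
    // so no schema lookup (and no null field) is involved.
    public static String passThrough(Message msg) {
        return new String(msg.payload, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Payload that does not match the schema: "name" is missing, "extra" is unexpected.
        String original = "{\"id\":1,\"extra\":\"x\"}";
        Message msg = new Message(original.getBytes(StandardCharsets.UTF_8), Map.of("id", 1));

        System.out.println(rebuildFromSchema(msg, List.of("id", "name")));
        System.out.println(passThrough(msg));
    }
}
```

In this sketch the schema-driven path loses the unexpected "extra" field and yields a null for "name", while the passthrough path writes exactly what the producer sent.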

Modifications

Verifying this change

Documentation

Check the box below.

Need to update docs?