streamnative / pulsar-spark

Spark Connector to read and write with Pulsar
Apache License 2.0
113 stars 50 forks source link

[FEATURE] Add support for `KEY_VALUE` Schema #164

Open oneebhkhan opened 1 year ago

oneebhkhan commented 1 year ago

I am streaming CDC messages from PostGRES using the PostGRES Debezium Connector.

The topic has the following schema

"type": "KEY_VALUE",
    "timestamp": 1695369844027,
    "properties": {
      "key.schema.name": "KafKaJson",
      "key.schema.properties": "{\"__AVRO_READ_OFFSET__\":\"0\"}",
      "key.schema.type": "JSON",
      "kv.encoding.type": "SEPARATED",
      "value.schema.name": "KafKaJson",
      "value.schema.properties": "{\"__AVRO_READ_OFFSET__\":\"0\"}",
      "value.schema.type": "JSON"

I am successfully able to consume the messages on the Pulsar topic, however when I try to consume it using the Spark connector, I get the following error: We do not support KEY_VALUE currently

I would like to request for this connector to support the KEY_VALUE schema so that Im properly correctly able to consume the messages.

As a stop-gap solution, I have enabled the allowDifferentTopicSchemas flag - which allows me to consume the messages since the Topic schema are not taken into account.