Closed Romsick closed 3 months ago
Hi @Romsick. Thank you for flagging it. Unfortunately, the Kafka Connect Neo4j Source Connector doesn't support the Avro enum type. I would recommend converting enum fields to strings and implementing validation on the consumer side.
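For example, the enum field from this issue could be relaxed to a plain string in the registered schema (a sketch; the field name and symbols are taken from the stack traces below, the `doc` text is illustrative):

```json
{
  "name": "entityType",
  "type": "string",
  "doc": "Was an Avro enum with symbols Company, Subcompany; allowed values are now validated on the consumer side"
}
```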
Regarding nullable properties and the error you receive on a null company property: Neo4j is a schema-less database. The schema is derived from individual messages and can therefore conflict if nodes or relationships within the same label or type have different sets of properties. I would recommend ensuring all nodes carry the same set of properties. In your example, you could treat null properties as empty strings and filter them out on the consumer side. You could also leverage property existence and type constraints, although some of these require a recent Neo4j Enterprise version. Alternatively, you can try a schema-less format such as JSON, which does not require a Schema Registry.
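As a concrete sketch of the two suggestions above (the `Entity` label and `company` property are illustrative; existence constraints need Neo4j Enterprise, and the `IS :: STRING` type constraint needs Neo4j 5.9+ Enterprise):

```cypher
// Option 1: coalesce a possibly-null property to an empty string
// directly in the source query, so every record has the same shape.
MATCH (n:Entity)
RETURN n.id AS id, coalesce(n.company, '') AS company;

// Option 2: enforce the property's presence and type up front.
CREATE CONSTRAINT entity_company_exists IF NOT EXISTS
FOR (n:Entity) REQUIRE n.company IS NOT NULL;

CREATE CONSTRAINT entity_company_type IF NOT EXISTS
FOR (n:Entity) REQUIRE n.company IS :: STRING;
```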
Edit: The main reason the Kafka Connect Neo4j Source Connector doesn't support the Avro enum type is that the Kafka Connect schema model has no enum type.
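Putting the consumer-side suggestions together, a minimal validation sketch might look like this (Python; the `entityType` field and the `Company`/`Subcompany` values come from this issue, the rest of the record shape is assumed):

```python
# Minimal consumer-side validation sketch, assuming the Avro enum has been
# relaxed to a plain string and nulls were replaced with empty strings
# upstream. Field names and allowed values are taken from this issue.
ALLOWED_ENTITY_TYPES = {"Company", "Subcompany"}

def validate_record(record: dict) -> dict:
    """Reject unexpected entityType values and map empty-string
    placeholders back to None."""
    entity_type = record.get("entityType")
    if entity_type not in ALLOWED_ENTITY_TYPES:
        raise ValueError(f"Unexpected entityType: {entity_type!r}")
    # Treat empty strings (used in place of nulls by the producer) as missing.
    return {k: (None if v == "" else v) for k, v in record.items()}
```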
Description
When attempting to use the Neo4j Kafka Connect Source connector configured with an AVRO value converter and corresponding schema in Schema Registry, the connector is unable to correctly produce records due to serialization errors.
Expected Behavior (Mandatory)
When the connector is configured as Source in QUERY mode with a valid AVRO schema, the messages should be correctly serialized and produced to the configured topic.
Actual Behavior (Mandatory)
The connector throws an error because it is unable to parse messages properly with the configured AVRO schema. While testing different configurations, a few issues surfaced:

- When the AVRO schema has fields configured as `enum`, and an incoming message has a valid value for the field, the connector fails with `org.apache.avro.AvroTypeException: Not an enum: {EXPECTED_VALUE} for schema`.
- When disabling the `neo4j.enforce.schema` option, and having a valid incoming message, the connector fails with `java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.avro.generic.IndexedRecord`.
- When one of the fields returned by the Cypher query is `null`, the connector fails with `org.apache.kafka.connect.errors.SchemaBuilderException: fieldSchema for field {FIELD_NAME} cannot be null`.

How to Reproduce the Problem
Dataset and configurations
Connector config
AVRO schema
Sample test Cypher to produce records for the connector
Steps (Mandatory)
`uuid`
`logicalType`

Test cases and stacktraces
These are the results when:

- I create a node with all properties correctly populated (as in the provided sample): the Neo4j connector fails with a stacktrace indicating that the value `Company` it found is not part of the expected enum symbols `Company`, `Subcompany` for the `entityType` field.
- With the same configuration and the same node update, but with `neo4j.enforce.schema` set to `false`, it fails with a different stack trace.
- When I update the `company` property of the node to `null`, and `neo4j.enforce.schema` is `true`, the connector fails and produces:

```
Caused by: org.apache.kafka.connect.errors.DataException: Failed to serialize Avro data from topic neo4j.test.avro :
	at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:93)
	at org.apache.kafka.connect.storage.Converter.fromConnectData(Converter.java:64)
	at org.apache.kafka.connect.runtime.AbstractWorkerSourceTask.lambda$convertTransformedRecord$9(AbstractWorkerSourceTask.java:495)
	at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:183)
	at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:217)
	... 12 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error serializing Avro message
	at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:166)
	at io.confluent.connect.avro.AvroConverter$Serializer.serialize(AvroConverter.java:153)
	at io.confluent.connect.avro.AvroConverter.fromConnectData(AvroConverter.java:86)
	... 16 more
Caused by: java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.avro.generic.IndexedRecord (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.avro.generic.IndexedRecord is in unnamed module of loader 'app')
	at org.apache.avro.generic.GenericData.getField(GenericData.java:846)
	at org.apache.avro.generic.GenericData.getField(GenericData.java:865)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:219)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:210)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:131)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:83)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:73)
	at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.writeDatum(AbstractKafkaAvroSerializer.java:180)
	at io.confluent.kafka.serializers.AbstractKafkaAvroSerializer.serializeImpl(AbstractKafkaAvroSerializer.java:156)
	... 18 more
```