jcustenborder / kafka-connect-solr

Kafka Connect connector for writing to Solr.
Apache License 2.0

Destination Solr Document fails to preserve field order #31

Open cwsusa opened 4 years ago

cwsusa commented 4 years ago

Examining a topic in Kafka shows the fields in the order desired for the Solr update.

Yet this connector generates a SolrInputDocument whose field list is in random order (neither front to back nor back to front).

User Impact: SolrDocuments in the destination cloud are inconsistent with design specs.

Performance Impact: Solr Cloud will suffer performance impacts from misordered document fields. High-performance inverted indexes are frequently designed to minimize the time needed to find a field occurrence of a term when running field-constrained queries.

Suggested Remedy: I'm not sure about the code involved, but the SinkRecord(?) needs to read the topic fields, in order, into an ordered map. JSON objects are notorious for unpredictable ordering. A LinkedHashMap is one pattern that can be used to carry the original topic field ordering into a SolrInputDocument.
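A minimal sketch of the ordered-map pattern suggested above, using only the JDK. The class name `OrderedFieldsSketch` and the method `toOrderedFields` are hypothetical and are not part of this connector; the sketch assumes the field names and values are already available in topic order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the connector's code.
final class OrderedFieldsSketch {

    // Copies parallel lists of field names and values into a LinkedHashMap.
    // LinkedHashMap iterates in insertion order, so the returned map yields
    // the fields in the same order as the input lists.
    static Map<String, Object> toOrderedFields(List<String> names, List<Object> values) {
        if (names.size() != values.size()) {
            throw new IllegalArgumentException("names and values must have the same length");
        }
        Map<String, Object> ordered = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            ordered.put(names.get(i), values.get(i));
        }
        return ordered;
    }
}
```

Iterating over the returned map and adding each entry to the destination document would then visit the fields in the original order, whereas a plain HashMap makes no ordering guarantee.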

jcustenborder commented 4 years ago

That's an interesting issue, @cwsusa. Other than reordering the map at the last minute, I'm not sure how we would go about fixing it. When you use JSON, the JSON converter provides a map to the connector. This is done by Jackson at the lowest level, and the map is most likely a HashMap. I would need to confirm that, but based on the behavior it would make sense. Have you considered using Avro? Schema Registry supports Avro, and the field order would be preserved end to end. This might work better for your use case.
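For reference, "reordering the map at the last minute" could look roughly like the sketch below. It assumes the desired field order is supplied separately (for example from a configuration setting, which this connector is not known to offer); `FieldReorderSketch` and `reorder` are hypothetical names, not existing connector code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the connector's code.
final class FieldReorderSketch {

    // Returns a copy of 'fields' whose iteration order follows 'desiredOrder'.
    // Fields not listed in 'desiredOrder' are appended afterwards, in whatever
    // order the input map happens to iterate them.
    static Map<String, Object> reorder(Map<String, Object> fields, List<String> desiredOrder) {
        Map<String, Object> ordered = new LinkedHashMap<>();
        for (String name : desiredOrder) {
            if (fields.containsKey(name)) {
                ordered.put(name, fields.get(name));
            }
        }
        for (Map.Entry<String, Object> entry : fields.entrySet()) {
            if (!ordered.containsKey(entry.getKey())) {
                ordered.put(entry.getKey(), entry.getValue());
            }
        }
        return ordered;
    }
}
```

This only restores an order that is known in advance; it cannot recover the original order of a schemaless JSON message once the converter has handed over an unordered map.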