jcustenborder / kafka-connect-solr

Kafka Connect connector for writing to Solr.
Apache License 2.0

Run update to solr without overwriting existing data #46

Open: Chrissyoung1223 opened this issue 3 years ago

Chrissyoung1223 commented 3 years ago

I've already indexed one topic into Solr. I'm now trying to index a separate but related topic into the same collection, but because of Solr's default behaviour this overwrites the existing data. Is there a way to tell the connector that the topic being indexed is an update, so that its fields are appended to the existing documents rather than replacing them? I've tried looking at Kafka transforms to get around this.

My topics:

topic 1:
{
  "foo": "example",
  "bar": 1,
  "id": 10
}

topic 2:
{
  "baz": 2,
  "id": 10
}

Indexing in this order leaves only the topic 2 data in Solr, because the second document replaces the stored document with the same id.
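The overwrite comes from Solr's default add semantics: a plain add replaces the whole stored document that has the same uniqueKey (`id` here). The toy model below is an in-memory stand-in, not Solr or the connector; `plain_add` and `atomic_add` are hypothetical names, and only the `add` modifier is modelled. It contrasts that replace behaviour with an atomic update that touches only the named field.

```python
"""Toy in-memory model (not Solr) contrasting a plain add with an atomic update."""
from typing import Any, Dict, List

Doc = Dict[str, Any]
Index = Dict[Any, Doc]


def plain_add(index: Index, doc: Doc) -> None:
    # Default behaviour: the stored document with the same id is replaced wholesale.
    index[doc["id"]] = dict(doc)


def atomic_add(index: Index, update: Doc) -> None:
    # Atomic update: only the fields named in `update` change; the rest are kept.
    stored = dict(index.get(update["id"], {"id": update["id"]}))
    for field, modifier in update.items():
        if field == "id":
            continue
        values: List[Any] = modifier["add"]
        existing = stored.get(field, [])
        if not isinstance(existing, list):
            existing = [existing]
        # "add" appends values, as on a multiValued field.
        stored[field] = existing + list(values)
    index[update["id"]] = stored


topic1 = {"foo": "example", "bar": 1, "id": 10}
topic2 = {"baz": 2, "id": 10}

overwritten: Index = {}
plain_add(overwritten, topic1)
plain_add(overwritten, topic2)
print(overwritten[10])  # {'baz': 2, 'id': 10}

appended: Index = {}
plain_add(appended, topic1)
atomic_add(appended, {"id": 10, "baz": {"add": [2]}})
print(appended[10])  # {'foo': 'example', 'bar': 1, 'id': 10, 'baz': [2]}
```

The first index loses `foo` and `bar`, which matches the behaviour described above; the second keeps them because the update names only `baz`.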

What Solr requires (an atomic update, which must include the uniqueKey):

{
  "id": 10,
  "baz": {
    "add": [2]
  }
}

Is it possible to tell the connector to wrap fields in the "add" modifier?
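Producing that shape from a partial record amounts to keeping the key field and wrapping every other field in a modifier. A minimal sketch, assuming `id` is the collection's uniqueKey; `to_atomic_update` is a hypothetical helper, not a connector option. The result is serialized as a JSON array of documents, the form Solr's JSON `/update` handler accepts.

```python
import json
from typing import Any, Dict


def to_atomic_update(record: Dict[str, Any], key: str = "id", op: str = "add") -> Dict[str, Any]:
    """Hypothetical helper: wrap every non-key field in a Solr atomic-update modifier."""
    if key not in record:
        # Atomic updates are matched to stored documents by the uniqueKey.
        raise ValueError(f"atomic updates need the uniqueKey field {key!r}")
    doc: Dict[str, Any] = {key: record[key]}
    for field, value in record.items():
        if field == key:
            continue
        # Values are wrapped in a list to match the {"add": [2]} example in the thread.
        values = value if isinstance(value, list) else [value]
        doc[field] = {op: values}
    return doc


update = to_atomic_update({"baz": 2, "id": 10})
print(json.dumps([update]))  # [{"id": 10, "baz": {"add": [2]}}]
```

Whether this wrapping happens in a stream processor or inside the connector, the sink still has to pass the nested map through as an atomic-update modifier rather than as a child document.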

jcustenborder commented 3 years ago

As the connector is currently written, no. It might be possible to change this, but I haven't looked into the effort yet. Alternatively, I would point you towards using something like ksqlDB or Kafka Streams to combine the two topics into a final compacted topic, then sending that to Solr. Think of your topics as an immutable log or a data product.
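The join-then-index approach can be approximated in plain Python to show what the final topic would hold: one merged record per key, with later fields winning. This is an in-memory sketch of the idea, not ksqlDB or Kafka Streams code, and `merge_by_key` is a hypothetical name. In a real pipeline the equivalent would be a stream-side join or aggregation keyed on `id`, written to an output topic configured with `cleanup.policy=compact`.

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def merge_by_key(records: Iterable[Record], key: str = "id") -> Dict[Any, Record]:
    """Fold records from several topics into one merged document per key.

    Fields from later records overwrite earlier fields with the same name.
    The resulting one-document-per-key view is what a compacted output
    topic would hold after the join.
    """
    table: Dict[Any, Record] = {}
    for record in records:
        row = table.setdefault(record[key], {})
        row.update(record)
    return table


topic1 = [{"foo": "example", "bar": 1, "id": 10}]
topic2 = [{"baz": 2, "id": 10}]

# Records are processed in arrival order; here topic 1 arrives first.
combined = merge_by_key(topic1 + topic2)
print(combined[10])  # {'foo': 'example', 'bar': 1, 'id': 10, 'baz': 2}
```

Because each merged record already carries every field, the connector's plain add no longer loses data: it simply replaces the old document with a complete one.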

Chrissyoung1223 commented 3 years ago

@jcustenborder thanks for the reply. I was hoping to avoid joining the topics, but given what I've found I think it may be the correct approach.

JaniceWheeler commented 3 years ago

@jcustenborder in an attempt to avoid joining streams, I used ksqlDB to rewrite the topic that contains the update so that each record includes the "add" element, e.g. {"id":10, "baz": {"add": [2]}}. This is correct syntax and works as a Solr update.

When using the Solr connector, however, this throws an error because the connector treats the "baz" field as a nested (child) document:

org.apache.solr.common.SolrException: Unable to index docs with children: the schema must include definitions for both a uniqueKey field and the '_root_' field, using the exact same fieldType

I'm guessing this has something to do with the fact that the value of the baz field is a map data type? (I work alongside Chrissyoung1223.)