hpgrahsl / kafka-connect-mongodb

**Unofficial / Community** Kafka Connect MongoDB Sink Connector -> integrated 2019 into the official MongoDB Kafka Connector here: https://www.mongodb.com/kafka-connector
Apache License 2.0
153 stars 60 forks source link

add support for maps with list values #86

Closed sfmontyo closed 5 years ago

sfmontyo commented 5 years ago

Avro has support for maps with lists of strings. I've modified the AvroJsonSchemafulRecordConverter.java to handle this case.

hpgrahsl commented 5 years ago

First of all THX for you contribution. Can you maybe add to this and provide the following?

Many THX again for your efforts and looking into this @sfmontyo

sfmontyo commented 5 years ago

So I've modified the RecordConverterTest to test for maps with lists of ints or strings in addition to the other data types it was checking and the tests pass. I wasn't certain what to do about the "examples". Do you mean for me to edit the project Readme to have an example of a map field with list of values?

hpgrahsl commented 5 years ago

Ok I see. Honestly by reading the AVRO spec back then when I implemented this I wasn't aware that this is supported. Because the docs actually seem to not mention what you suggest here.

Irrespective of your current implementation, have you tried to write an AVRO schema file containing a map which in turn has an array as values? Also what about the other complex types of AVRO, can they also be used as value types in maps? Happy to hear your thoughts!

sfmontyo commented 5 years ago

Ah yes. We have data being shipped across an avro topic and consumed by the standard elasticsearch connector which blew up when we tried to use the mongo connector. The avro schema was handled fine by the standard elasticsearch connector just not by the mongo one. As far as I know, the only requirement for maps is that their key is a string.

hpgrahsl commented 5 years ago

Great that you found this! Coming back to my other question: Also what about the other complex types of AVRO, can they also be used as value types in maps?

And it would be nice - like you assumed anyway - if you can adapt the README in the section near the top with the sample AVRO schema to reflect this change e.g. by adding a map with list of string/int respectively.

If I understood your PR correctly the current behaviour is not affected and still works e.g. maps with string keys and any primitive type, right? Looking forward to get this merged after making some final checks and maybe small adaptions in the tests.

THX again for your contribution!

sfmontyo commented 5 years ago

Here is our avro spec for the class. { "type": "record", "name": "mystruct", "namespace": "com.xxxx.avro_generated", "fields": [ { "name": "p", "type": [ "null", "int" ], "default": null }, { "name": "at", "type": [ "null", { "type" : "map", "values" : { "type": "array", "items": "string" } } ], "default": null }, { "name": "pu", "type": [ "null", "string" ], "default": null } ] }

sfmontyo commented 5 years ago

So I've modified the example to have an optional map field that if present, will be a Map<String,List> . Also, you are correct that my changes do not affect the existing behavior - they merely support allowing map values to be arrays in addtion to the the primitives and structs.

Oh and btw, thank you for maintaining this connector!

sfmontyo commented 5 years ago

btw - my reading of the spec is that the values can be any valid schema including primitives, arrays, maps and structs. My changes support arrays, thus allowing map values to be primitives, structs or arrays. Looking at the code base, it doesn't look like it would handle a map having values that are also maps.

hpgrahsl commented 5 years ago

... My changes support arrays allowing map values to be primitives, structs or arrays.

sounds good!

Looking at the code base, it doesn't look like it would handle a map having values that are also maps.

you think it would make sense to support maps in maps? I haven't seen any use case for it so far which is why this whole issue never popped up until now that you ran into it :)

sfmontyo commented 5 years ago

We don't need maps of maps and given no one has requested it, it doesn't seem of great importance. I only mentioned it as to answer your query what other complex Avro types are supported. That wasn't a request of mine :)

sfmontyo commented 5 years ago

Ok I modified both the OBJ_MAP_1 and JSON_STRING_1 to have the same fields as OBJ_SCHEMA_1 .

hpgrahsl commented 5 years ago

THX @sfmontyo for this contribution. Highly appreciated! As always I'd be happy to learn about your concrete use cases for my project. If possible, please share :)