Closed pri-naik5 closed 1 year ago
Additionally, with this input:
[ { "field1": "", "field2" : "test5", "field3": [ { "field3_a": "test", "field3_b": "test2", "field3_c": "test3" } ], "field4": null }, { "field1": "test1", "field2" : "test5", "field4": { "field4a": "this_value", "field4b": "", "field4c": "", "field4d": "", "field4e": "", "field4f": "", "field4g": "", "field4h": [ { "field4ha": "", "field4hb": "", "field4hc": "" } ] } } ]
I see this error:
io.streamthoughts.kafka.connect.filepulse.data.DataException: Failed to merge schemas for field 'field4'.
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema$StructSchemaMerger.apply(StructSchema.java:341)
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema.merge(StructSchema.java:260)
    at io.streamthoughts.kafka.connect.filepulse.data.LazyArraySchema.valueSchema(LazyArraySchema.java:55)
    at io.streamthoughts.kafka.connect.filepulse.filter.JSONFilter.apply(JSONFilter.java:90)
    at io.streamthoughts.kafka.connect.filepulse.filter.AbstractMergeRecordFilter.apply(AbstractMergeRecordFilter.java:43)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline$FilterNode.apply(DefaultRecordFilterPipeline.java:162)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline.apply(DefaultRecordFilterPipeline.java:134)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline.apply(DefaultRecordFilterPipeline.java:102)
    at io.streamthoughts.kafka.connect.filepulse.source.DefaultFileRecordsPollingConsumer.next(DefaultFileRecordsPollingConsumer.java:176)
    at io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceTask.poll(FilePulseSourceTask.java:199)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:304)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:248)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.streamthoughts.kafka.connect.filepulse.data.DataException: Cannot merge incompatible schema type NULL<>STRUCT
    at io.streamthoughts.kafka.connect.filepulse.data.Schema.merge(Schema.java:203)
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema$StructSchemaMerger.apply(StructSchema.java:338)
    ... 18 more
@fhussonnois I have added a fix for this. Can I get permissions to create a PR on this repo?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue was closed because it has been stalled for 30 days with no activity.
Describe the bug
In v2.6.0, null values in JSON objects (not within a struct) are not handled correctly and cause JSONFilter to throw an error.
To Reproduce
[ { "field1": null, "field2" : "test5", "field3": [ { "field3_a": "test", "field3_b": "test2", "field3_c": "test3" } ], "field4": { "field4a": null, "field4b": "", "field4c": "", "field4d": "", "field4e": "", "field4f": "", "field4g": "", "field4h": [ { "field4ha": "", "field4hb": "", "field4hc": null } ] } }, { "field1": "test1", "field2" : "test5", "field4": { "field4a": "this_value", "field4b": "", "field4c": "", "field4d": "", "field4e": "", "field4f": "", "field4g": "", "field4h": [ { "field4ha": "", "field4hb": "", "field4hc": "" } ] } } ]
Expected behavior
The file is processed without errors, and the auto-generated schema uses [null, STRING] as the type of field1 and [null, STRUCT] as the type of field4.
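To make the expected merge rule concrete, here is a minimal, self-contained sketch. It is not the Connect FilePulse API: `NullTolerantMerge`, `SimpleSchema`, `Type`, and `asOptional` are hypothetical names invented for illustration, and treating a field that is missing from one record as optional is an assumption of this sketch. The idea is that merging NULL with any other type yields that type marked optional, and that the rule applies recursively to struct fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a null-tolerant schema merge; NOT the FilePulse implementation. */
public class NullTolerantMerge {

    public enum Type { NULL, STRING, STRUCT }

    /** Minimal schema model: a type, an optional flag, and child fields for structs. */
    public static final class SimpleSchema {
        public final Type type;
        public final boolean optional;
        public final Map<String, SimpleSchema> fields;

        public SimpleSchema(Type type, boolean optional, Map<String, SimpleSchema> fields) {
            this.type = type;
            this.optional = optional;
            this.fields = new LinkedHashMap<>(fields);
        }

        public static SimpleSchema ofNull() { return new SimpleSchema(Type.NULL, true, Map.of()); }
        public static SimpleSchema ofString() { return new SimpleSchema(Type.STRING, false, Map.of()); }
        public static SimpleSchema ofStruct(Map<String, SimpleSchema> fields) {
            return new SimpleSchema(Type.STRUCT, false, fields);
        }
    }

    /** Returns a copy of the schema with the optional flag set. */
    public static SimpleSchema asOptional(SimpleSchema s) {
        return new SimpleSchema(s.type, true, s.fields);
    }

    /** Merges two schemas; NULL is compatible with every type and makes the result optional. */
    public static SimpleSchema merge(SimpleSchema left, SimpleSchema right) {
        if (left.type == Type.NULL && right.type == Type.NULL) return left;
        if (left.type == Type.NULL) return asOptional(right);
        if (right.type == Type.NULL) return asOptional(left);
        if (left.type != right.type) {
            throw new IllegalArgumentException(
                "Cannot merge incompatible schema type " + left.type + "<>" + right.type);
        }
        boolean optional = left.optional || right.optional;
        if (left.type != Type.STRUCT) {
            return new SimpleSchema(left.type, optional, Map.of());
        }
        // Struct case: merge shared fields recursively.
        // Assumption: a field present on only one side becomes optional.
        Map<String, SimpleSchema> merged = new LinkedHashMap<>();
        for (Map.Entry<String, SimpleSchema> e : left.fields.entrySet()) {
            SimpleSchema other = right.fields.get(e.getKey());
            merged.put(e.getKey(), other == null ? asOptional(e.getValue()) : merge(e.getValue(), other));
        }
        for (Map.Entry<String, SimpleSchema> e : right.fields.entrySet()) {
            if (!merged.containsKey(e.getKey())) {
                merged.put(e.getKey(), asOptional(e.getValue()));
            }
        }
        return new SimpleSchema(Type.STRUCT, optional, merged);
    }

    public static void main(String[] args) {
        // field1 is null in the first record and a string in the second.
        SimpleSchema field1 = merge(SimpleSchema.ofNull(), SimpleSchema.ofString());
        System.out.println("field1: " + field1.type + " optional=" + field1.optional);

        // field4 is null in one record and a struct in the other.
        SimpleSchema struct = SimpleSchema.ofStruct(Map.of("field4a", SimpleSchema.ofString()));
        SimpleSchema field4 = merge(SimpleSchema.ofNull(), struct);
        System.out.println("field4: " + field4.type + " optional=" + field4.optional);
    }
}
```

Under this rule the NULL<>STRING and NULL<>STRUCT cases from the stack traces resolve to an optional STRING and an optional STRUCT instead of throwing, while a genuinely incompatible pair such as STRING<>STRUCT still fails.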
Screenshots
With the initial file:
io.streamthoughts.kafka.connect.filepulse.data.DataException: Failed to merge schemas for field 'field1'.
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema$StructSchemaMerger.apply(StructSchema.java:341)
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema.merge(StructSchema.java:260)
    at io.streamthoughts.kafka.connect.filepulse.data.LazyArraySchema.valueSchema(LazyArraySchema.java:55)
    at io.streamthoughts.kafka.connect.filepulse.filter.JSONFilter.apply(JSONFilter.java:90)
    at io.streamthoughts.kafka.connect.filepulse.filter.AbstractMergeRecordFilter.apply(AbstractMergeRecordFilter.java:43)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline$FilterNode.apply(DefaultRecordFilterPipeline.java:162)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline.apply(DefaultRecordFilterPipeline.java:134)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline.apply(DefaultRecordFilterPipeline.java:102)
    at io.streamthoughts.kafka.connect.filepulse.source.DefaultFileRecordsPollingConsumer.next(DefaultFileRecordsPollingConsumer.java:176)
    at io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceTask.poll(FilePulseSourceTask.java:199)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:304)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:248)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.streamthoughts.kafka.connect.filepulse.data.DataException: Cannot merge incompatible schema type NULL<>STRING
    at io.streamthoughts.kafka.connect.filepulse.data.Schema.merge(Schema.java:203)
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema$StructSchemaMerger.apply(StructSchema.java:338)
    ... 18 more
After I replace field1 with a non-null string value:
io.streamthoughts.kafka.connect.filepulse.data.DataException: Failed to merge schemas for field 'field4'.
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema$StructSchemaMerger.apply(StructSchema.java:341)
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema.merge(StructSchema.java:260)
    at io.streamthoughts.kafka.connect.filepulse.data.LazyArraySchema.valueSchema(LazyArraySchema.java:55)
    at io.streamthoughts.kafka.connect.filepulse.filter.JSONFilter.apply(JSONFilter.java:90)
    at io.streamthoughts.kafka.connect.filepulse.filter.AbstractMergeRecordFilter.apply(AbstractMergeRecordFilter.java:43)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline$FilterNode.apply(DefaultRecordFilterPipeline.java:162)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline.apply(DefaultRecordFilterPipeline.java:134)
    at io.streamthoughts.kafka.connect.filepulse.filter.DefaultRecordFilterPipeline.apply(DefaultRecordFilterPipeline.java:102)
    at io.streamthoughts.kafka.connect.filepulse.source.DefaultFileRecordsPollingConsumer.next(DefaultFileRecordsPollingConsumer.java:176)
    at io.streamthoughts.kafka.connect.filepulse.source.FilePulseSourceTask.poll(FilePulseSourceTask.java:199)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.poll(WorkerSourceTask.java:304)
    at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:248)
    at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
    at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: io.streamthoughts.kafka.connect.filepulse.data.DataException: Failed to merge schemas for field 'field4a'.
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema$StructSchemaMerger.apply(StructSchema.java:341)
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema.merge(StructSchema.java:260)
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema$StructSchemaMerger.apply(StructSchema.java:338)
    ... 18 more
Caused by: io.streamthoughts.kafka.connect.filepulse.data.DataException: Cannot merge incompatible schema type NULL<>STRING
    at io.streamthoughts.kafka.connect.filepulse.data.Schema.merge(Schema.java:203)
    at io.streamthoughts.kafka.connect.filepulse.data.StructSchema$StructSchemaMerger.apply(StructSchema.java:338)
    ... 20 more
Additional context
Config of connector:
fs.scan.interval.ms: "10000"
fs.scan.filters: "io.streamthoughts.kafka.connect.filepulse.scanner.local.filter.RegexFileListFilter"
file.filter.regex.pattern: ".*\.json$"
tasks.reader.class: "io.streamthoughts.kafka.connect.filepulse.fs.reader.LocalBytesArrayInputReader"
errors.log.include.messages: true
errors.log.enable: true
offset.strategy: "name"