qubole / streamx

kafka-connect-s3 : Ingest data from Kafka to Object Stores (S3)
Apache License 2.0

Saving JSON data, partition by a specific field (timestamp) #58

Open doriwaldman opened 6 years ago

doriwaldman commented 6 years ago

I have a question. The data in Kafka is in JSON format, and each event has a field called "eventTimestamp", a long number (presumably epoch milliseconds) representing the event time. I want to save the data to S3 in hourly buckets based on that timestamp, not the time the event was added to Kafka.
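
For example, a record value might look like this (the field names other than eventTimestamp are made up):

```json
{
  "userId": "u-123",
  "action": "click",
  "eventTimestamp": 1554130800000
}
```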

My settings when I used the Kafka Connect S3 sink were:

```
connector.class=io.confluent.connect.s3.S3SinkConnector
storage.class=io.confluent.connect.s3.storage.S3Storage
format.class=io.confluent.connect.s3.format.json.JsonFormat
schema.generator.class=io.confluent.connect.storage.hive.schema.DefaultSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
timestamp.extractor=RecordField
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
timestamp.field=eventTimestamp
partition.duration.ms=10
locale=en_IN
timezone=UTC
```
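
With that config the Confluent connector writes objects under keys like the following (the bucket, topic, and file name here are made up, and the exact layout also depends on topics.dir):

```
s3://my-bucket/topics/my-topic/year=2019/month=04/day=01/hour=13/my-topic+0+0000000000.json
```

One thing I noticed while writing this up: for hourly partitions, partition.duration.ms should presumably be 3600000 rather than 10.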

I see that streamx supports TimeBasedPartitioner, but if I understand correctly it can only extract a RecordField from Parquet or Avro records, not from JSON.

Is it possible to do this with JSON?
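
In case it helps frame the question: with Confluent's storage-common I believe this could be done with a custom TimestampExtractor along these lines. This is only a sketch, assuming streamx exposes the same io.confluent.connect.storage.partitioner.TimestampExtractor interface (which I'm not sure about); the class name is made up.

```java
import java.util.Map;

import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.data.Struct;

import io.confluent.connect.storage.partitioner.TimestampExtractor;

// Sketch of an extractor that reads a long "eventTimestamp" field from a
// record value that JsonConverter (schemas.enable=false) deserialized into
// a Map, or from a Struct when a schema is present.
public class EventTimestampExtractor implements TimestampExtractor {

  private String fieldName = "eventTimestamp";

  @Override
  public void configure(Map<String, Object> config) {
    // "timestamp.field" is the property name the Confluent partitioners use;
    // streamx may wire this differently.
    Object field = config.get("timestamp.field");
    if (field != null) {
      fieldName = field.toString();
    }
  }

  @Override
  public Long extract(ConnectRecord<?> record) {
    Object value = record.value();
    if (value instanceof Map) {
      Object ts = ((Map<?, ?>) value).get(fieldName);
      if (ts instanceof Number) {
        return ((Number) ts).longValue();
      }
    } else if (value instanceof Struct) {
      return ((Struct) value).getInt64(fieldName);
    }
    // Returning null signals that no timestamp could be extracted.
    return null;
  }
}
```

If that interface is available, setting timestamp.extractor=EventTimestampExtractor (with the jar on the Connect worker's classpath) would be the corresponding config. If streamx's TimeBasedPartitioner doesn't take a pluggable extractor, is there another hook for this?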