onyx-platform / onyx-amazon-s3

Amazon S3 plugin for Onyx

Feature/upload to multiple s3 prefixes #12

Closed eelkevanfoeken closed 6 years ago

eelkevanfoeken commented 6 years ago

This is a suggestion for new functionality. Until now, the output plugin could only write/upload each batch to a single S3 prefix.

The plugin has been extended so that a batch can be split up and written/uploaded to multiple S3 prefixes. The old functionality remains unchanged. To enable a multi-upload, set the :s3/multi-upload boolean to true and set :s3/prefix-key to a keyword. That keyword must be a key in every segment (each segment is a map), and its value must be the destination S3 prefix for that segment. The serializer-fn can then extract the message from each segment.
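As a rough illustration of the settings described above, a task map fragment might look like the following. Only :s3/multi-upload and :s3/prefix-key come from this PR; the bucket name, the task name, and the serializer function :my.app/serialize-messages are hypothetical placeholders, and the remaining keys are assumed to follow the plugin's usual output task map.

```clojure
;; Sketch only: values marked "hypothetical" are not from this PR.
{:onyx/name :write-to-s3                        ; hypothetical task name
 :onyx/plugin :onyx.plugin.s3-output/output     ; assumed: the plugin's output entry point
 :onyx/type :output
 :onyx/medium :s3
 :onyx/batch-size 20
 :s3/bucket "my-bucket"                         ; hypothetical bucket
 :s3/serializer-fn :my.app/serialize-messages   ; hypothetical fn that pulls the message out of each segment
 :s3/multi-upload true                          ; from this PR: enable splitting a batch across prefixes
 :s3/prefix-key :s3-key}                        ; from this PR: segment key holding the destination prefix
```

With :s3/multi-upload left unset or false, the old single-prefix behaviour should apply, as the description states.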

A test has been added that shows this functionality.

An example of a batch of segments:

[{:s3-key "click/2017/10/30/01" :message {:user-id "user1"}}
 {:s3-key "click/2017/10/30/02" :message {:user-id "user2"}}
 {:s3-key "conversion/2017/10/30/01" :message {:user-id "user1"}}
 {:s3-key "conversion/2017/10/30/02" :message {:user-id "user2"}}]

The old function would just output the whole batch to a single bucket/prefix. But when :s3/multi-upload is true, :s3/prefix-key is :s3-key, and the serializer-fn takes :message out of each segment, the result would be:

<bucket>/click/2017/10/30/01 [{:user-id "user1"}]
<bucket>/click/2017/10/30/02 [{:user-id "user2"}]
<bucket>/conversion/2017/10/30/01 [{:user-id "user1"}]
<bucket>/conversion/2017/10/30/02 [{:user-id "user2"}]
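The splitting step can be sketched in plain Clojure. This is not the plugin's actual code, just a minimal illustration of the idea: group a batch by the value under the prefix key and keep only the payload of each segment. The function name split-by-prefix is hypothetical, and the sketch assumes every segment carries its payload under :message.

```clojure
(defn split-by-prefix
  "Groups a batch of segments by the value found under prefix-key and
  applies extract-fn to each segment. Returns a map of prefix -> vector
  of extracted payloads. Hypothetical helper, not part of the plugin."
  [prefix-key extract-fn batch]
  (reduce (fn [acc segment]
            ;; fnil starts a fresh vector the first time a prefix is seen
            (update acc (get segment prefix-key) (fnil conj []) (extract-fn segment)))
          {}
          batch))

(def batch
  [{:s3-key "click/2017/10/30/01" :message {:user-id "user1"}}
   {:s3-key "click/2017/10/30/02" :message {:user-id "user2"}}
   {:s3-key "conversion/2017/10/30/01" :message {:user-id "user1"}}
   {:s3-key "conversion/2017/10/30/02" :message {:user-id "user2"}}])

;; One entry per destination prefix; each value is what would be
;; serialized and uploaded under that prefix.
(def uploads (split-by-prefix :s3-key :message batch))
```

Segments that share a prefix end up in the same vector, so each prefix would receive one upload per batch rather than one per segment.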

MichaelDrogalis commented 6 years ago

LGTM. cc @lbradstreet

lbradstreet commented 6 years ago

Looks great. Thanks!