A suggestion for new functionality. The output plugin was only able to write/upload to a single s3 prefix for each batch.
The plugin has been extended so that a batch can be split up and written/uploaded to multiple s3 prefixes.
The old functionality remains unchanged. To enable a multi-upload you must set the `:s3/multi-upload` boolean to `true` and `:s3/prefix-key` to a specific keyword.
This keyword must be a key in each segment (which is a map), and its value must be the destination s3 prefix. The serializer-fn can then find the message in each segment.
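As a sketch, a task map enabling the new behaviour might look like the following (the surrounding keys such as `:s3/bucket` and `:s3/serializer-fn` are assumed here for illustration):

```clojure
;; Hypothetical task map — only :s3/multi-upload and :s3/prefix-key are
;; the new options described in this change.
{:onyx/name :write-to-s3
 :s3/bucket "my-bucket"
 :s3/serializer-fn ::serialize-messages
 :s3/multi-upload true      ;; enable splitting a batch across prefixes
 :s3/prefix-key :s3-key}    ;; each segment's :s3-key holds its s3 prefix
```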
A test has been added that shows this functionality.
An example of a batch of segments:
[{:s3-key "click/2017/10/30/01" :message {:user-id "user1"}}
 {:s3-key "click/2017/10/30/02" :message {:user-id "user2"}}
 {:s3-key "conversion/2017/10/30/01" :message {:user-id "user1"}}
 {:s3-key "conversion/2017/10/30/02" :message {:user-id "user2"}}]
The old function would just output the batch to a single bucket/prefix.
But when `:s3/multi-upload` is `true`, `:s3/prefix-key` is `:s3-key`, and the serializer-fn takes out `:message` from each segment, the following would result:
<bucket>/click/2017/10/30/01 [{:user-id "user1"}]
<bucket>/click/2017/10/30/02 [{:user-id "user2"}]
<bucket>/conversion/2017/10/30/01 [{:user-id "user1"}]
<bucket>/conversion/2017/10/30/02 [{:user-id "user2"}]
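The splitting step above can be sketched as a plain function (the name `split-batch` and its arguments are assumptions for illustration, not the plugin's actual internals):

```clojure
;; Groups a batch of segment maps by the configured prefix key, then
;; extracts the payload from each segment (here via extract-fn, e.g.
;; :message), yielding a map of s3-prefix -> vector of messages.
(defn split-batch
  [prefix-key extract-fn batch]
  (reduce-kv (fn [acc prefix segments]
               (assoc acc prefix (mapv extract-fn segments)))
             {}
             (group-by prefix-key batch)))

(split-batch :s3-key :message
             [{:s3-key "click/2017/10/30/01" :message {:user-id "user1"}}
              {:s3-key "click/2017/10/30/02" :message {:user-id "user2"}}])
;; => {"click/2017/10/30/01" [{:user-id "user1"}]
;;     "click/2017/10/30/02" [{:user-id "user2"}]}
```

Each entry of the resulting map can then be serialized and uploaded independently to `<bucket>/<prefix>`.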