To avoid each application having to implement its own export/import of data out of and into Streambed Logged, two tools could be provided that take care of this common requirement. That's the rationale. The remainder of this issue tables some thoughts on their design.
## logged
I'm thinking that there could be a tool to import/export from streambed-logged file stores. The tool should use stdin to import JSON per streambed's `ConsumerRecord`. Assuming the tool is named `logged` and uses streambed-logged's `FileLog`, and we wish to produce records on the log:

```
cat /tmp/some-records.json | logged produce --root-path=/var/lib/logged --file=-
```
(The argument format could be similar to that of the existing streambed-logged `CommitLogArgs`, as per `--root-path`, perhaps also with a consumer group representing the tool.)
By convention, `--file` (or `-f`) indicates that the tool is to consume records from a file, with `-` indicating STDIN. Alternatively, a file name can be provided for that argument, e.g.:

```
logged produce --root-path=/var/lib/logged -f=/tmp/some-records.json
```
Records will be deserialised and then appended to the commit log until EOF is reached. The payload is assumed to have been encrypted prior to invoking the `logged` tool; encryption is dealt with later in this issue via the `confidant` tool.
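To make the input format concrete, a line of input might look like the following. This is only a sketch: the exact field names of streambed's serialised `ConsumerRecord` are an assumption here, and the `value` shown is an arbitrary BASE64 string standing in for an already-encrypted payload.

```json
{"topic": "my-topic", "headers": [], "timestamp": "2023-01-01T00:00:00Z", "key": 0, "value": "Y2lwaGVydGV4dA==", "partition": 0, "offset": 0}
```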
For output, a topic name to consume from is required along with the `consume` command, and this is mutually exclusive with `--file` being provided.
In addition, a namespace option (`--ns`) can be provided; otherwise the default namespace is used.
Also by default, records of the topic are consumed and output to STDOUT. A `-o` (or `--output`) option can be used to write to a file instead. Records are output as JSON.
An `--offset` argument can be provided to declare the offset + 1 to start consuming from (the offset is often used to represent the last offset observed, hence returning the next record). If no offset is provided then consumption starts from the beginning of the log.
An idle timeout can also be provided. This is the amount of time used to decide that no more events are available when consuming from the topic; if it elapses, the `logged` tool returns. Otherwise, without this option, the `logged` tool will wait indefinitely and consume records as they are appended by any other process.
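Putting the consume options together, an invocation might look like this. Note that the `--topic`, `--offset`, and `--idle-timeout` flag spellings are assumptions for illustration; only their behaviour is described above.

```
logged consume --root-path=/var/lib/logged --ns=default --topic=my-topic --offset=9 --idle-timeout=10s
```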
## confidant
The second tool is `confidant`, named as per the library it provides access to, and it has the ability to encrypt/decrypt data.
Similar to `logged`, `confidant` will take a stream of input data, perform decryption given a lookup in the confidant secret store, and then output the stream. A stream of JSON objects is expected as input, and an argument will be provided that selects the JSON field value to decrypt in each object. This field value is expected to be encoded as BASE64. Once decrypted, the same JSON object will be output with the decrypted payload.
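For example, a pipeline might look like the following. Only `--root-path`, `--credentials-directory`, `--secret-path`, and `--select` are discussed in this issue; the remaining flag spellings and the paths shown are assumptions for illustration.

```
logged consume --root-path=/var/lib/logged --topic=my-topic | \
  confidant decrypt \
    --root-path=/var/lib/confidant \
    --credentials-directory=/etc/credstore/myappservice \
    --secret-path=secrets/my-topic \
    --select=value \
    -f=-
```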
(The argument format could be similar to that of the existing streambed-confidant `SsArgs`, as per `--root-path`, perhaps also with a role id representing the tool.)
The `-f` argument is as per the `logged` tool above. In the example above, STDIN is used: records are consumed from the commit log and fed into `confidant`.
The `-o` or `--output` argument can optionally specify a file to write to; otherwise records are output to STDOUT.
In addition, a namespace option (`--ns`) can be provided; otherwise the default namespace is used.
In order to initialise the secret store, a root secret is also required. A `credentials-directory` path can be provided in which a `root-secret` file is expected. This argument corresponds conveniently with systemd's `CREDENTIALS_DIRECTORY` environment variable and is used by various services we have written. The `myappservice` in the example above represents the service normally associated with appending to the commit log.
Also associated with the `CREDENTIALS_DIRECTORY` is the `secret_id` used for role-based authentication with the secret store. This secret is expected to be found in the `ss-secret-id` file of that directory.
A `--select` option selects a field out of the JSON structure. In the case of our example, that field is the `value` field of a commit log `ProducerRecord` structure (the default for this option).
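To make the field selection and BASE64 handling concrete, the following shows the equivalent transformation expressed with standard tools. This is purely illustrative of the shape of work `confidant` would do before decrypting, not its implementation.

```shell
# Select the "value" field of a JSON object on STDIN and BASE64-decode it.
# "aGVsbG8=" is BASE64 for "hello".
echo '{"topic": "my-topic", "value": "aGVsbG8="}' \
  | python3 -c 'import sys, json, base64; print(base64.b64decode(json.load(sys.stdin)["value"]).decode())'
```

`confidant` would additionally decrypt the decoded bytes using the secret looked up at `--secret-path`.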
The `secret-path` option specifies the path of the secret to look up for decryption/encryption. This is application specific, and it also highlights a constraint of the tool: only one secret per topic is catered for. Applications with multiple secrets per topic will require their own tool for decrypting data.
Encryption functionality is achieved by using an `encrypt` command in place of the `decrypt` one above. In this mode, the JSON objects read are assumed to have a field that is unencrypted. This field is then encrypted and output in place as a BASE64 field.
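The reverse direction could then be sketched as follows, feeding encrypted records into `logged produce`. As before, the flag spellings and paths beyond those discussed in this issue are illustrative assumptions.

```
cat /tmp/plain-records.json | \
  confidant encrypt \
    --root-path=/var/lib/confidant \
    --credentials-directory=/etc/credstore/myappservice \
    --secret-path=secrets/my-topic \
    --select=value | \
  logged produce --root-path=/var/lib/logged --file=-
```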