streambed / streambed-rs

Event driven services toolkit
Apache License 2.0

Two new tools for streambed-logged #53

Closed huntc closed 4 weeks ago

huntc commented 2 months ago

To avoid each application having to implement its own export/import of data out of and into Streambed Logged, two tools could be provided that take care of this common requirement. That's the rationale. The remainder of this issue tables some thoughts on their design.

logged

I'm thinking that there could be a tool to import/export from streambed-logged file stores. The tool should read JSON from STDIN, one record per streambed ConsumerRecord. Assuming the tool is named logged, uses streambed-logged's FileLog, and we wish to produce records on the log:

cat /tmp/some-records.json | logged produce --root-path=/var/lib/logged --file=-

(the argument format could be similar to those of the existing streambed-logged's CommitLogArgs as per --root-path, perhaps also with some consumer group representing the tool).
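As a sketch of the input format, one line of /tmp/some-records.json might deserialise to something like the following. The field names here are illustrative assumptions, not the authoritative ConsumerRecord schema:

```python
import json

# One NDJSON line as the produce command might consume it.
# Field names are assumptions for illustration; consult streambed's
# ConsumerRecord for the authoritative schema.
record = {
    "topic": "mytopic",
    "partition": 0,
    "offset": 42,
    "key": 0,
    "value": "aGVsbG8=",  # payload, assumed already encrypted and BASE64 encoded
    "headers": [],
}
line = json.dumps(record)
print(line)
```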

By convention, the --file (or -f) argument indicates that the tool is to consume records from a file, with - indicating STDIN. Alternatively, a file name could be provided for that argument e.g.:

logged produce --root-path=/var/lib/logged -f=/tmp/some-records.json 

Records will be deserialised and then appended to the commit log until EOF is reached. The payload is assumed to have been encrypted prior to invoking the logged tool; encryption is dealt with later in this issue via the confidant tool.
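The produce loop described above can be sketched as follows. This is a minimal Python illustration of the read-deserialise-append cycle; `append_to_log` is a hypothetical stand-in for streambed-logged's FileLog, which this sketch does not use:

```python
import io
import json

def produce(stream, append_to_log):
    """Read one JSON record per line until EOF, appending each to the log."""
    count = 0
    for line in stream:  # iteration stops naturally at EOF
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)  # deserialise; payload assumed pre-encrypted
        append_to_log(record)
        count += 1
    return count

# Usage with an in-memory stand-in for STDIN:
log = []
n = produce(io.StringIO('{"offset": 0}\n{"offset": 1}\n'), log.append)
```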

For output, a topic name to consume from is required along with the subscribe command, and is mutually exclusive with --file being provided:

logged subscribe --root-path=/var/lib/logged --topic=mytopic

In addition, a namespace option (--ns) can be provided, otherwise the namespace is the default namespace.

Also by default, records of the topic are consumed and output to STDOUT as JSON. A -o (or --output) option can be used to write to a file instead.

An --offset argument can be provided to declare where consumption starts: records are consumed from offset + 1, since the offset is often used to represent the last offset observed, hence returning the next record. If no offset is provided then consumption starts from the beginning of the log.
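The offset semantics can be sketched as a simple filter. This is a hypothetical helper for illustration only; the real tool would seek within the commit log rather than scan:

```python
def records_from(records, offset=None):
    """Yield the records to consume.

    `offset` is the last offset observed, so consumption resumes at
    offset + 1. With no offset, start from the beginning of the log.
    """
    for record in records:
        if offset is None or record["offset"] > offset:
            yield record

log = [{"offset": i} for i in range(5)]
resumed = list(records_from(log, offset=2))  # offsets 3 and 4
from_start = list(records_from(log))         # all five records
```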

An --idle-timeout option can also be provided. This is the amount of time without any more events being available when consuming from the topic after which the logged tool will return. Without this option, the logged tool will wait indefinitely and consume records as they are appended by any other process.
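The idle-timeout behaviour can be sketched with a blocking queue standing in for tailing the commit log; all names here are illustrative assumptions:

```python
import queue

def subscribe(q, idle_timeout=None):
    """Drain records from q, stopping after idle_timeout seconds of silence.

    With idle_timeout=None, block indefinitely, consuming records as
    they are appended by any other process.
    """
    consumed = []
    while True:
        try:
            consumed.append(q.get(timeout=idle_timeout))
        except queue.Empty:
            return consumed  # no more events within idle_timeout

q = queue.Queue()
for i in range(3):
    q.put({"offset": i})
records = subscribe(q, idle_timeout=0.05)  # returns once the queue idles
```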

confidant

The second tool is confidant, named after the library it provides access to, and has the ability to encrypt/decrypt data.

Similar to logged, confidant will take a stream of input data, perform decryption given a lookup in the confidant secret store, and then output the stream. A stream of JSON objects is expected as input, and an argument will be provided that selects which JSON field value of each object to decrypt. This field value is expected to be encoded as BASE64. Once decrypted, the same JSON object will be output with the decrypted payload. For example:

logged subscribe --root-path=/var/lib/logged --topic=mytopic | \
confidant decrypt \
  --credentials-directory=/etc/myappservice \
  --root-path=/var/lib/confidant \
  --secret-path="secrets.myappservice-events.key" \
  -f=-

(the argument format could be similar to those of the existing streambed-confidant's SsArgs as per --root-path, perhaps also with some role id representing the tool).

The -f argument is as per the logged tool above. In the example, STDIN is used, with records consumed from the commit log being fed into confidant.
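The per-object transformation can be sketched like this. The XOR "cipher" below is only a stand-in for streambed-confidant's actual encryption, and the field and key names are assumptions:

```python
import base64

def decrypt_field(obj, field, key):
    """Return a copy of obj with obj[field] BASE64-decoded and decrypted.

    XOR with a repeating key stands in for the real cipher, which would
    be looked up via the confidant secret store.
    """
    ciphertext = base64.b64decode(obj[field])
    plaintext = bytes(b ^ key[i % len(key)] for i, b in enumerate(ciphertext))
    out = dict(obj)
    out[field] = plaintext.decode("utf-8")
    return out

# Prepare an "encrypted" record, then decrypt its value field in place:
key = b"\x2a"
encrypted = base64.b64encode(bytes(b ^ 0x2a for b in b"hello")).decode("ascii")
obj = {"topic": "mytopic", "value": encrypted}
decrypted = decrypt_field(obj, "value", key)
```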

The -o or --output argument can optionally specify a file to write to, otherwise records will be output to STDOUT.

In addition, a namespace option (--ns) can be provided, otherwise the namespace is the default namespace.

In order to initialise the secret store, a root secret is also required. A --credentials-directory path can be provided where a root-secret file is expected. This argument corresponds conveniently with systemd's CREDENTIALS_DIRECTORY environment variable and is used by various services we have written. The myappservice in the above example represents the service normally associated with appending to the commit log.

Also associated with the CREDENTIALS_DIRECTORY is the secret_id for role-based authentication with the secret store. This secret is expected to be found in the ss-secret-id file of that directory.
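Under these conventions, the credentials directory would be expected to look something like the following. The file names are per this issue; the contents and layout annotations are illustrative:

```
/etc/myappservice/        # CREDENTIALS_DIRECTORY for the service
├── root-secret           # root secret used to initialise the secret store
└── ss-secret-id          # secret id for role-based authentication
```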

A --select option selects a field out of the JSON structure. In the case of our example, that field is the value field of a commit log's ProducerRecord structure (the default for this option).

The --secret-path option specifies the path of the secret to look up for decryption/encryption. This is application specific, and also highlights a constraint of the tool: only one secret per topic is catered for. Applications with multiple secrets per topic will then require their own tool for decrypting data.

Encryption functionality is achieved by using an encrypt command in place of the "decrypt" one above. In this mode, the JSON objects read are assumed to have a field holding the decrypted (plaintext) value. That field will be encrypted and output in place as a BASE64-encoded value.
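Encrypt mode is then the inverse transformation. As before, the XOR cipher is a stand-in for the real encryption and the names are illustrative:

```python
import base64

def encrypt_field(obj, field, key):
    """Return a copy of obj with obj[field] encrypted and BASE64 encoded."""
    plaintext = obj[field].encode("utf-8")
    ciphertext = bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))
    out = dict(obj)
    out[field] = base64.b64encode(ciphertext).decode("ascii")
    return out

key = b"\x2a"
sealed = encrypt_field({"value": "hello"}, "value", key)
# Round-trips through the decrypt direction: XOR again with the same key.
opened = bytes(
    b ^ key[i % len(key)]
    for i, b in enumerate(base64.b64decode(sealed["value"]))
)
```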