[Feature]: Logging and log aggregation architecture

Summary 💡

As a user of the platform, I have poor visibility into what is happening in different components of the platform and how these components influence each other, especially operators and the product clusters they manage.

Logs are the means of making this information available. Currently, every pod logs to stdout in a default log format, and some products support supplying a custom log configuration file.

However, logs are not persisted, so a crashed pod is difficult to investigate. They are also not aggregated, so issues involving multiple pods are difficult to investigate. Log configuration is difficult and sometimes not possible at all.

Examples 🌈

logging architecture and aggregation architecture
- vector sidecar
- vector aggregator - interchangeable by the customer
- vector sink - interchangeable by the customer
common "log format"
- timestamp, log level, message
- what else? additional information added by vector?
  - sidecar is probably part of what we offer, can't be modified
- this is our default, but the customer can supply their own log4j file if they want to
common CRD fragment
A log level should already be configurable in the product itself, i.e. don't log at DEBUG by default and filter later; that's too costly.
Log rotation should be used; logs can be discarded after vector has read them.
kubedatastack/logs/* is an emptyDir so it can be shared between containers (product and agent)
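
As a rough illustration of the points above (a common format of timestamp, log level and message, a level set at the source, and rotated files in a shared directory), here is a minimal sketch using Python's standard `logging` module. It is not how any product or the agent is actually configured: the JSON field names, the file name `product.log`, the rotation size and the `build_file_logger` helper are all assumptions made for the example.

```python
import json
import logging
import logging.handlers
import os
from datetime import datetime, timezone


class CommonFormatter(logging.Formatter):
    """Render each record as one JSON object with the three common fields."""

    def format(self, record: logging.LogRecord) -> str:
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        return json.dumps(
            {
                "timestamp": timestamp.isoformat(),
                "level": record.levelname,
                "message": record.getMessage(),
            }
        )


def build_file_logger(
    log_dir: str, name: str = "product", level: int = logging.INFO
) -> logging.Logger:
    """Hypothetical helper: log to a rotated file in a shared directory."""
    logger = logging.getLogger(name)
    # The threshold is applied at the source, so DEBUG records are never
    # written when the level is INFO; nothing has to be filtered later.
    logger.setLevel(level)
    logger.propagate = False
    # Rotation keeps the shared directory bounded; one backup file is kept
    # (assumed values, chosen only for the example).
    handler = logging.handlers.RotatingFileHandler(
        os.path.join(log_dir, "product.log"), maxBytes=1_000_000, backupCount=1
    )
    handler.setFormatter(CommonFormatter())
    logger.addHandler(handler)
    return logger
```

In this sketch `log_dir` stands in for the shared log directory; a sidecar reading the files would see one JSON object per line.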

Motivation 🔦

Aggregate and persist - Logging on the platform should aggregate logs from all parts of the platform so that events from different parts are easy to correlate. For this, logs should share the same structure and be viewable in a central location. Logs should also be persisted in a central location, so that if a component crashes, its logs are still available to identify the reason.

Easy to read on the fly - At the same time, logs should still be accessible in an easy-to-read format on the containers, to allow quick on-the-fly inspection of each part of the platform. The logging configuration also supports setting different thresholds for the logs readable on the container and for the aggregated logs. This way you get a detailed view of a component's operations when viewing it on the container, while aggregating logs at a coarser granularity across the whole platform.
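
The idea of separate thresholds for container-local and aggregated logs can be sketched with two handlers on one logger. This is only an illustration with Python's standard `logging` module; the `build_dual_logger` helper, the default levels and the use of plain text streams in place of stdout and the aggregated log file are assumptions.

```python
import logging
from typing import TextIO


def build_dual_logger(
    name: str,
    console: TextIO,
    aggregated: TextIO,
    console_level: int = logging.DEBUG,
    aggregated_level: int = logging.WARNING,
) -> logging.Logger:
    """Hypothetical helper: one logger, two outputs with separate thresholds."""
    logger = logging.getLogger(name)
    # The logger itself must pass everything that either output wants.
    logger.setLevel(min(console_level, aggregated_level))
    logger.propagate = False

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    # Detailed, human-readable output for inspection on the container.
    console_handler = logging.StreamHandler(console)
    console_handler.setLevel(console_level)
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    # Coarser output intended for aggregation across the platform.
    aggregated_handler = logging.StreamHandler(aggregated)
    aggregated_handler.setLevel(aggregated_level)
    aggregated_handler.setFormatter(formatter)
    logger.addHandler(aggregated_handler)

    return logger
```

With the default levels assumed here, a DEBUG record reaches only the console stream, while a WARNING record reaches both.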

Consistent configuration - Finally, logging should always be configured the same way, no matter which product and which underlying technology produces the logs. Logging for each product is configured in the ProductCluster resource. Supplying custom logging configuration files is still supported; these are then product-specific.
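
To make the idea of a product-independent logging fragment more concrete, here is a small model of what such a fragment might carry. It is a sketch only: the class names, field names (`console_level`, `file_level`, `custom_config`, `enable_aggregation`) and the set of levels are hypothetical and are not taken from any existing CRD.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Assumed set of level names, for illustration only.
LEVELS = ("TRACE", "DEBUG", "INFO", "WARN", "ERROR")


@dataclass
class ContainerLogging:
    """Hypothetical per-container logging settings."""

    console_level: str = "INFO"  # threshold for logs read on the container
    file_level: str = "INFO"  # threshold for logs picked up for aggregation
    # Hypothetical reference to a user-supplied, product-specific config file.
    # When set, it would take the place of the generated configuration.
    custom_config: Optional[str] = None

    def __post_init__(self) -> None:
        for level in (self.console_level, self.file_level):
            if level not in LEVELS:
                raise ValueError(f"unknown log level: {level}")


@dataclass
class LoggingSpec:
    """Hypothetical logging fragment shared by all product resources."""

    enable_aggregation: bool = True
    containers: Dict[str, ContainerLogging] = field(default_factory=dict)

    def for_container(self, name: str) -> ContainerLogging:
        # Containers without an explicit entry fall back to the defaults.
        return self.containers.get(name, ContainerLogging())
```

The point of the sketch is that the same structure applies to every product; only the optional custom configuration file is product-specific.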
