Log aggregation

lfrancke commented 3 years ago

We do want to support collecting and aggregating logs from all our services.

For this we need an "agent" sitting on each machine/next to each process which understands which logs to collect and where to ship them.

For this we need to select a tech stack first (leaning towards the OpenSearch stack as we'd like to include that anyway) and then deploy and configure the collection agents (we were recommended Vector.dev) as well as the supporting infrastructure.

NickLarsenNZ commented 11 months ago

This goes beyond log aggregation (and replaces Vector for the logging part)...

I'm a big proponent of the opentelemetry-collector (and opentelemetry-collector-contrib additions) for all things telemetry.

I had started using it as a way to collect traces in OTLP format, then relay them to multiple tracing tools (eg: Grafana Mimir and New Relic).

I then noticed it covers logs and metrics too, and it seems to be a good all-round, vendor-neutral solution for telemetry collection and routing.

The idea is that you set up:

- receivers for whatever you need to ingest (eg: logs via file, metrics via Prometheus endpoint, traces via OTLP and Zipkin protocols)
- exporters for wherever you want logs/metrics/traces to go
- processors to enrich the logs/metrics/traces with other data (eg: Kubernetes info)
- pipelines to join receivers, processors, and exporters
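
As a rough sketch of how those four pieces fit together, here is a collector configuration written as a Python dict that mirrors the layout of the collector's YAML file, plus a small check that every pipeline only references components that are defined. The component type names (filelog, prometheus, otlp, zipkin, k8sattributes, batch, opensearch) are from the collector and collector-contrib projects. Every path, endpoint, and setting is a hypothetical placeholder and has not been checked against a specific collector version, and `undefined_references` is a helper made up for this illustration, not part of any OpenTelemetry tooling.

```python
# Sketch only: paths, endpoints, and settings are hypothetical placeholders.
config = {
    # Receivers: where telemetry comes in.
    "receivers": {
        "filelog": {"include": ["/var/log/example/*.log"]},  # hypothetical path
        "prometheus": {
            "config": {
                "scrape_configs": [
                    {
                        "job_name": "example",
                        "static_configs": [{"targets": ["localhost:9090"]}],
                    }
                ]
            }
        },
        "otlp": {"protocols": {"grpc": {}, "http": {}}},
        "zipkin": {},
    },
    # Processors: enrich or batch data on its way through.
    "processors": {
        "k8sattributes": {},
        "batch": {},
    },
    # Exporters: where telemetry goes out (hypothetical endpoints).
    "exporters": {
        "opensearch": {"http": {"endpoint": "https://opensearch.example:9200"}},
        "otlp": {"endpoint": "backend.example:4317"},
    },
    # Pipelines: join receivers, processors, and exporters per signal type.
    "service": {
        "pipelines": {
            "logs": {
                "receivers": ["filelog", "otlp"],
                "processors": ["k8sattributes", "batch"],
                "exporters": ["opensearch"],
            },
            "metrics": {
                "receivers": ["prometheus", "otlp"],
                "processors": ["batch"],
                "exporters": ["otlp"],
            },
            "traces": {
                "receivers": ["otlp", "zipkin"],
                "processors": ["k8sattributes", "batch"],
                "exporters": ["otlp"],
            },
        }
    },
}


def undefined_references(cfg):
    """Return (pipeline, section, name) for each pipeline entry that has no
    matching definition in the top-level receivers/processors/exporters."""
    missing = []
    for pipeline, parts in cfg["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            for name in parts.get(section, []):
                if name not in cfg.get(section, {}):
                    missing.append((pipeline, section, name))
    return missing


if __name__ == "__main__":
    # An empty list means every pipeline entry points at a defined component.
    print(undefined_references(config))
```

The point of the layout is that one receiver (here otlp) can feed several pipelines, and each pipeline can pick its own processors and exporters, which is what makes routing the same data to more than one destination straightforward.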

Ultimately OTLP implementations (eg: Otel language SDKs) would support all of logs, metrics, and traces, but it's nice to have the option for various receivers, such as filelogreceiver.

There is also an exporter for OpenSearch.

It could also be useful so that customers can send to their tooling, while also sending some of it to our tools for preemptive monitoring (with redaction) if that becomes an offering.

I was going to bring this up for a future architecture discussion, but it seems worthwhile to comment here.

fhennig commented 11 months ago

I read this and found out that the vector aggregator can also send to open-telemetry as a sink. Just found it interesting, so it could be put on top.

To me the bigger question about logs/traces/metrics is: where are they actually displayed in the end? If I have metrics in Grafana and logs in OpenSearch, do I really need a common aggregator in the middle? Just playing devil's advocate here 😈

NickLarsenNZ commented 11 months ago

> I read this and found out that the vector aggregator can also send to open-telemetry as a sink. Just found it interesting, so it could be put on top.

I'm not super familiar with Vector, but I'll take a look to see the similarities/differences.

> To me the bigger question about logs/traces/metrics is: where are they actually displayed in the end? If I have metrics in Grafana and logs in OpenSearch, do I really need a common aggregator in the middle? Just playing devil's advocate here 😈

Keep playing devil's advocate, that is important. I think I'll need to show this on a call with some contrived diagrams, as it's quite a large surface area, and a bit of overlap with what we currently do.

stackabletech / issues

Log aggregation #75