open-telemetry / opentelemetry-collector-contrib

Contrib repository for the OpenTelemetry Collector

Add file log exporter #6316

Closed hypnoce closed 10 months ago

hypnoce commented 2 years ago

Is your feature request related to a problem? Please describe.

The current file exporter/receiver lacks many features that would make it usable in scenarios beyond debugging:

Further reading: https://github.com/open-telemetry/opentelemetry-collector-contrib/discussions/4997

Describe the solution you'd like

A file exporter flexible enough to cover many scenarios.

Describe alternatives you've considered

Creating a simple filelog exporter: https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/6306. No other solution worked in the otel ecosystem.

Requirements of file exporter

I would decouple the file format specification into a separate issue. A formal, general, replayable file format/encoding could also be used in other contexts, such as export to other storage like S3, Kafka...

tigrannajaryan commented 2 years ago

Let's make sure we approach this from the perspective of the actual real-world use cases.

> mirroring a log directory on another machine/pod (dynamic file names and counts)

This does not seem to be a Collector job. There are other tools that can do this (e.g. rsync), not clear why it needs to be a Collector feature.

> ability to write all types (logs, metrics and spans) of telemetry data on disk and read it back in scenarios like "FileExporter file separation" #5008, where logs are shipped to another location and ingested back into another OTEL pipeline. Ability to archive all telemetry data on disk

Both of these use cases can likely be served with one standardized file format. They do not appear to require the file format to be customizable by the end user in any way. In fact, it may be preferable that the file format is not customizable, so that mistakes are not possible. I believe that if we want a standard file format, it is best to define it as part of the specification, for which we have an open issue: https://github.com/open-telemetry/opentelemetry-specification/issues/1443
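To make the write-then-read-back use case concrete, here is a minimal sketch of a fixed, non-customizable line format. It is only an illustration under stated assumptions: it uses one JSON object per line, the record fields (`signal`, `body`) are hypothetical, and this is not the OTLP JSON encoding or any format defined by the specification.

```python
import io
import json


def write_records(stream, records):
    """Write each record as one JSON object per line (hypothetical format)."""
    for record in records:
        # sort_keys keeps the output stable so archived files diff cleanly.
        stream.write(json.dumps(record, sort_keys=True) + "\n")


def read_records(stream):
    """Read back records written by write_records, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Round trip through an in-memory buffer standing in for a file on disk.
buffer = io.StringIO()
original = [
    {"signal": "log", "body": "service started"},
    {"signal": "metric", "body": "requests_total=3"},
]
write_records(buffer, original)
buffer.seek(0)
replayed = read_records(buffer)
```

Because the writer exposes no formatting options, any reader of the same format can replay the file without per-deployment configuration, which is the property argued for above.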

> each file path can be constructed based on context from the resource and the record (log record, span, metric) being written. This context can be extracted from the attributes. Or a single path can be statically defined per exporter.

I do not see which of the listed use-cases require this. Perhaps add a use-case to justify or remove the requirement.

hypnoce commented 2 years ago

> This does not seem to be a Collector job. There are other tools that can do this (e.g. rsync), not clear why it needs to be a Collector feature.

rsync can indeed work, but with many drawbacks (discovery of new directories/files is not supported, tailing is not supported...). Some people have suggested:

while true; do 
  inotifywait -r -e modify,create,delete /directory
  rsync -avz /directory /target
done

which adds significant overhead when log files are updated frequently.

I still believe that it's a valid use case for a log collector, as it collects logs, ships them, and routes them to files with a configurable format. Multiplying log collection tools and adding many sidecars to a pod can increase operational complexity as well as resource requirements. It's an actual use case that I currently face. For instance, using fluentbit tail, the file output using pattern and this formatter, I can construct a pipeline that mirrors a log directory. Using the same technology (fluentd) I can build more sophisticated pipelines, thus serving many use cases.

> I do not see which of the listed use-cases require this. Perhaps add a use-case to justify or remove the requirement.

This was for the first use case, where the target file name and location cannot be determined at pipeline creation time. Another use case is writing to a different file/location based on the Kubernetes pod_name that produced the data.
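A rough sketch of what resolving the path per record could look like. Everything here is an assumption for illustration: `resolve_path` is a hypothetical helper, not an existing Collector API, and the attribute key `k8s.pod.name` is assumed to carry the pod name. The value is sanitized so that an attribute cannot point outside the base directory.

```python
import re
from pathlib import PurePosixPath


def resolve_path(base_dir, attributes, key="k8s.pod.name", default_name="unknown"):
    """Return the file path a record should be written to (hypothetical helper).

    The path is derived from one resource attribute; records without that
    attribute fall back to a single static file name.
    """
    raw = attributes.get(key) or default_name
    # Keep a conservative character set so values like "../x" cannot escape base_dir.
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", raw).strip(".") or default_name
    return str(PurePosixPath(base_dir) / f"{safe}.log")


# The target file is only known once a record with its attributes arrives.
path = resolve_path("/var/log/mirror", {"k8s.pod.name": "checkout-7d9f"})
```

The fallback name covers the "single statically defined path" case from the requirement, while the attribute lookup covers the dynamic one.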

> I believe if we want a standard file format then it is best to define it as part of the specification, for which we have an open issue

Agree.

hypnoce commented 2 years ago

@tigrannajaryan is there something I'm missing about rsync for my use case?

tigrannajaryan commented 2 years ago

@hypnoce there are currently https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/7840 and https://github.com/open-telemetry/opentelemetry-specification/pull/2235 in progress, which may help to cover the use cases that you have. The Collector PR is about a receiver, but once the JSON format is standardized it should be easy to argue that we can also have an exporter for the same format. Please review those two PRs and comment on them.

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

hypnoce commented 1 year ago

Hey all, my use case is a bit different. I need to be able to write logs in a configurable format to dynamic files based on resource attributes, for example output.info, output.warn and output.error files chosen by severity, each containing only the faulty line. Something like this operator, but with the file name being a Go template as well: https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/pkg/stanza/docs/operators/file_output.md WDYT? Thanks
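To illustrate the request, here is a small sketch of templating both the file name and the line format per record. It is not the stanza operator's behaviour: Python's `string.Template` stands in for a Go template, and the record shape (`severity`, `body`) and the function name `route_by_severity` are hypothetical.

```python
from collections import defaultdict
from string import Template


def route_by_severity(records, name_template="output.${severity}", line_template="${body}"):
    """Group rendered log lines by the file name each record resolves to."""
    name_t = Template(name_template)
    line_t = Template(line_template)
    files = defaultdict(list)
    for record in records:
        # Lower-case the severity so "INFO" and "info" land in the same file.
        fields = {"severity": record["severity"].lower(), "body": record["body"]}
        files[name_t.substitute(fields)].append(line_t.substitute(fields))
    return dict(files)


records = [
    {"severity": "INFO", "body": "started"},
    {"severity": "ERROR", "body": "disk full"},
    {"severity": "INFO", "body": "ready"},
]
# Maps each resolved file name to the lines that would be appended to it.
routed = route_by_severity(records)
```

With the default templates, each file receives only the log body, which matches the "only the faulty line" part of the request; a different `line_template` would change the format without touching the routing.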

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

remram44 commented 1 year ago

Please don't close it. Nice bot :sweat_smile:

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

remram44 commented 1 year ago

Feature is still wanted

github-actions[bot] commented 1 year ago

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

github-actions[bot] commented 10 months ago

This issue has been closed as inactive because it has been stale for 120 days with no activity.