Closed captainfalcon23 closed 3 months ago
Pinging code owners:
See Adding Labels via Comments if you do not have permissions to add labels yourself.
@captainfalcon23 I think the essence of the problem is exporting logs to different paths based on specific attributes, thus achieving log isolation. To solve this, can we use the existing routingprocessor? The routingprocessor reads specific resource attributes and exports the logs to different fileexporters based on the attribute values.
So I am starting to look at this, but still don't think it will meet the requirement. Let's say I get logs from a container with the following details:
- Container Name: zabbix-agent-1234567
- Pod Name: zabbix-agent
- Namespace: foo
- Cluster (static value): ABC
In this case, I want to save logs to a location like: /mnt/data/logs/$Cluster/$Namespace/$PodName/$ContainerName.log
If I use the routing processor, I think I still need to hard-code values, and I have to know exactly where my pods are running; if something changes, I need to manually update the configuration.
Another example is host logs (e.g. /var/log/messages). I want these to go to a file dynamically named "$hostname.log". But again, it seems impossible?
What are your thoughts @atingchen ?
If we need to implement dynamic file export paths, I think we need to follow these three steps:
1. Add a `subdirectory_attribute` setting to the configuration. It will be used to generate the export path:
```yaml
path: /data
subdirectory_attribute:
  - cluster
  - namespace
```
2. Maintain a mapping between each file export path and its `io.Writer`.
3. Before exporting telemetry data each time, obtain the `io.Writer` based on the file export path and export to it.
If we implement it this way, the existing code needs to be significantly modified. I'd like to hear your thoughts on this. @atoulme @djaglowski Thank you.
There does seem to be some valuable utility here but it would certainly add a lot of complexity and responsibility to the component. I'm not opposed to it in theory but don't think I can make it a priority to help with it.
No worries... I am actually about to start looking at some other solution as I can't get my use-case to work as it stands.
Likewise, I would use the routingprocessor and limit this to a set of known paths.
This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers
. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.
Happy for these to be closed if you guys like. I moved to another, more flexible solution (vector.dev)
Hi!
I also need this, and I would like to implement this.
I'm thinking of only supporting a single resource-level attribute for the subdirectory_attribute. Multiple attributes can be combined with the transform processor, and attributes can be moved to the resource level using the groupbyattrs processor.
My plan for the implementation (described for logs, but same goes for traces and metrics):
For the configuration, I would add:
```yaml
group_by_attribute:
  subdirectory_attribute: "my_attribute" # (or maybe subpath_attribute instead, and allow '/' in it)
  max_open_files: 1000 # (this would set the size of the LRU cache)
```
What do you think? @atingchen @atoulme @djaglowski
This feature would be interesting. We have a use case where we save application data to file and then send it to cold storage, in case we need the raw logs in the future. We save the files per application name, which is a field in the log. We achieve this at the moment with logstash, but we are making the effort to move the logging stack to otel.
It seems there is ongoing interest in this feature and someone willing to implement it with a detailed proposal. (Thanks for stepping forward @adam-kiss-sg.) I'm in favor of this and will be happy to review PRs.
The most immediate question in my mind is the config. Looking at the proposal, a few quick thoughts and a longer discussion:
```yaml
group_by_attribute:
  subdirectory_attribute: "my_attribute" # (or maybe subpath_attribute instead, and allow '/' in it)
  max_open_files: 1000 # (this would set the size of the LRU cache)
```
- I think `group_by` is sufficient.
- We should include an `enabled` flag. I can't seem to find the previous discussion on this, but if I recall correctly it is beneficial to helm charts and other tools which may have trouble distinguishing between nil and empty structs when rendering config.

Defining the dynamic element of the path is the trickiest part of this proposal. One challenge I have not seen addressed is that our current `path` setting is used to specify not only the directory but the actual file name, e.g. `path: my_dir/my_file.json`.
However, if we introduce a `subdirectory_attribute`, the user should still be able to specify a file name as well. The following doesn't quite get us there:
```yaml
# my_dir/${my.attr}/????
path: my_dir # but no file name
group_by:
  subdirectory_attribute: my.attr
```
One way to address this is to add another setting within `group_by` to define the file name:
```yaml
# my_dir/${my.attr}/my_file.json
path: my_dir # but no file name
group_by:
  subdirectory_attribute: my.attr
  filename: my_file.json
```
However, this seems awkward because it differs so much from the way `path` currently works. It's also not very flexible. Why not allow the file name to be the dynamic part?
To that end, when `group_by` is enabled, the path could be required to contain exactly one `*` (or equivalent indicator), which will be replaced by the attribute value. This keeps the file-name responsibility in the `path` setting and allows the user more flexibility. `subdirectory_attribute` would not be the right term in this case, so perhaps just `attribute` or `replace_attribute`.
```yaml
# always my_dir/my_file.json
file/static:
  path: my_dir/my_file.json

# my_dir/foo/my_file.json
# my_dir/bar/my_file.json
file/dynamic_subdir:
  path: my_dir/*/my_file.json
  group_by:
    enabled: true
    replace_attribute: my.attr

# my_dir/foo/my_nested_dir/my_file.json
# my_dir/bar/my_nested_dir/my_file.json
file/dynamic_subdir_with_nested_subdirs:
  path: my_dir/*/my_nested_dir/my_file.json
  group_by:
    enabled: true
    replace_attribute: my.attr

# my_dir/foo.json
# my_dir/bar.json
file/dynamic_filename:
  path: my_dir/*.json
  group_by:
    enabled: true
    replace_attribute: my.attr
```
WDYT?
Hi @djaglowski !
Didn't see your comment before I opened the PR.
Current config values in the PR:
- `sub_path_resource_attribute`: the resource attribute whose value is appended to the `path` config value. When this value is set, the rotation setting is ignored.
- `delete_sub_path_resource_attribute`: when enabled, the attribute referenced by `sub_path_resource_attribute` is removed from the telemetry data before writing it to a file.
- `discard_if_attribute_not_found`: when enabled, if the resource doesn't have the attribute referenced by `sub_path_resource_attribute`, the telemetry data is discarded.
- `default_sub_path`: sub path used when the resource doesn't have the attribute referenced by `sub_path_resource_attribute` and `discard_if_attribute_not_found` is set to false. If `discard_if_attribute_not_found` is set to true, this setting is ignored.
- `max_open_files`: sets the size of the LRU cache of open files.
- `auto_create_directories`: [default: true] when enabled, if the directory of the destination file does not exist, it will be created. If set to false and the directory does not exist, the write will fail and return an error.
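Putting these settings together, a config for this version of the PR might look like the following (values are illustrative and drawn only from the setting names above):

```yaml
exporters:
  file:
    path: /data/logs
    sub_path_resource_attribute: k8s.namespace.name
    delete_sub_path_resource_attribute: false
    discard_if_attribute_not_found: false
    default_sub_path: unknown
    auto_create_directories: true
    max_open_files: 1000
```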
I like your suggestion about the `*` in the path (instead of my original idea to use the path as a prefix).
I also have some questions about implementation, but I will ask them in the pr instead.
@adam-kiss-sg, let's keep the conversation here until we agree on the design. I appreciate the many edge cases you are thinking of but I suggest we should identify a smaller scope of changes which can be accepted first. We can still anticipate other changes may be made later.
sub_path_resource_attribute
It's certainly easier to restrict this to resource attributes, so let's start there. When discussing earlier it seemed likely we may want to group by log record attribute, but I think we can add a separate field later if necessary and just make them mutually exclusive. As far as naming, I think `resource_attribute` is sufficient.
delete_sub_path_resource_attribute
We should leave this out of the initial implementation, though I'm not opposed to adding it later. We can work out details on a later PR.
max_open_files
👍
discard_if_attribute_not_found & default_sub_path
These can be left out of the initial PR and instead we can just choose a reasonable default behavior. I suggest initially we just log a warning and drop the data. Assuming we eventually allow alternate behaviors, I think we can combine these settings, but again, would prefer to work those details out later.
auto_create_directories
We can consider this in a later PR as well. In the meantime, users who require strict security here should not enable `group_by` functionality.
Pulling in the other design questions from the PR:
Error handling
- when creating a file fails because of a permission issue
- when creating a file fails because the parent directory doesn't exist, and auto_create_directories is set to false
- when creating a directory fails
- when closing a file fails

I'm mostly concerned about cases where a batch of telemetry data contains multiple resources, and some of them are written successfully while others fail.
I think all four cases should log an error. In the first three, the failure means we can't write, so we should continue to the next resource. The last case isn't really actionable, though, so just logging an error is sufficient.
Generally speaking, these kinds of failures will typically result from the same cause and will require user action, so it should be enough to make noise when things aren't working. We should continue running and writing what we can.
Data modification
Should the exporter modify the telemetry data? I use ResourcesX.MoveTo() in my code, because it showed much better performance compared to CopyTo. There is also the delete_sub_path_resource_attribute config which would remove an attribute from the resource. If a pipeline contains multiple exporters, would this modify the data for the other exporters as well? What's the best practice here?
We have a way of handling this, by indicating in the factory setup that the component will mutate data. Example. It's fairly uncommon for exporters to do this but it is supported and reasonable if it improves performance. That said, there is a performance downside to consider as well, specifically that by setting this flag we're just pushing the copy upstream of the exporter. I'd lean towards setting the flag to keep things simple.
Rotation
I disabled rotation when group_by is active, mainly because I'm concerned about the interaction between rotation (using lumberjack) and the group_by functionality. You can also implement some kind of rotation with group_by by adding a date to the resource attribute, so this didn't seem so much of an issue (although it's time based, not size based, so not exactly the same functionality). What do you think about this?
We should be able to interoperate with the existing rotation configuration. The `buildFileWriter` function does most of what we need. It looks like we just need to pass in the file path instead of pulling it from the config. Rotation should then be handled automatically for each base file.
README.md
I noticed this line in the README.md, should I remove it?
> This intended for primarily for debugging Collector without setting up backends.
I think it's reasonable to remove. It's probably still the most common use case, but I don't see any reason we should constrain it for users.
> sub_path_resource_attribute
> It's certainly easier to restrict this to resource attributes, so let's start there. When discussing earlier it seemed likely we may want to group by log record attribute, but I think we can add a separate field later if necessary and just make them mutually exclusive. As far as naming, I think resource_attribute is sufficient.
:+1:
> delete_sub_path_resource_attribute
> We should leave this out of the initial implementation, though I'm not opposed to adding it later. We can work out details on a later PR.
:+1:
> discard_if_attribute_not_found & default_sub_path
> These can be left out of the initial PR and instead we can just choose a reasonable default behavior. I suggest initially we just log a warning and drop the data. Assuming we eventually allow alternate behaviors, I think we can combine these settings, but again, would prefer to work those details out later.
Again, I understand the benefit of keeping this out of the configuration at first, but I'm not sure logging a warning for every dropped resource is a good idea. It can easily fill up the collector's log and cause issues if the disk gets full (this can easily happen in a containerized environment). I would suggest either dropping these logs silently (or with debug level), restricting the warning to one per hour, or going with a different approach (e.g. handling this the same as if the attribute contained an empty string).
> auto_create_directories
> We can consider this in a later PR as well. In the meantime, users who require strict security here should not enable group_by functionality.
:+1:
> Data modification
> ...
> We have a way of handling this, by indicating in the factory setup that the component will mutate data. Example. It's fairly uncommon for exporters to do this but it is supported and reasonable if it improves performance. That said, there is a performance downside to consider as well, specifically that by setting this flag we're just pushing the copy upstream of the exporter. I'd lean towards setting the flag to keep things simple.
With delete_sub_path_resource_attribute gone, this is really just a question of using CopyTo vs. MoveTo, so I would stick to using CopyTo in the exporter and not set the flag (there is no performance gain if the upstream just does the copy instead). It's good to know about the option though :)
> Rotation
> ...
> We should be able to interoperate with the existing rotation configuration. The buildFileWriter does most of what we need. It looks like we just need to pass in the file path instead of pulling it from the config. Rotation should then be handled automatically for each base file.
Yes, on the coding side this would be simple. But I have concerns about the interaction between group_by (which handles multiple files) and rotation (which also handles multiple files). Originally this was just a feeling, but now I have looked into the lumberjack code and collected some risks:
On the other hand, I don't see much real use case for using group_by and rotation together. If you have many files with group_by, it's unlikely you need to rotate them. If you have just a handful of files, you don't really need group_by; you could simply use a routing connector with multiple fileexporters.
I simply think the cons outweigh the benefits on this one; it would add unnecessary runtime complexity.
> README.md
> I noticed this line in the README.md, should I remove it? This intended for primarily for debugging Collector without setting up backends.
> I think it's reasonable to remove. It's probably still the most common use case, but I don't see any reason we should constrain it for users.
:+1:
PS: I created the PR because I need this functionality asap in a project I'm working on, and I plan to use it from my fork until we figure out the quirks. I'm more than happy to take it step by step, creating multiple PRs if necessary; I will just keep using my fork in the meantime.
> Again, I understand the benefit of keeping this out of the configuration at first, but I'm not sure logging a warning for every dropped resource is a good idea. It can easily fill up the collector's log and cause issues if the disk gets full (this can easily happen in a containerized environment). I would suggest either dropping these logs silently (or with debug level), or restricting the warning to 1/hour, or going with a different approach (eg: handling this the same as if the attribute contained an empty string).
I'd rather we use debug log than get involved in rate limiting, etc. That is a problem for overall collector telemetry configuration.
> Lumberjack has a known issue: https://github.com/natefinch/lumberjack/issues/56 - this is not an issue without group_by because there is only one lumberjack.Logger instance, but group_by would potentially create and close many Logger instances.
This does appear to be a serious issue (though confusingly there are three PRs to fix it and no movement on it). In any case, I think this probably justifies holding off on the change for now. I'm including my thoughts on the other points anyways because if this is ever resolved I think we should have a clear path forward.
> The max_days rotation setting won't work reliably, because by the time lumberjack would remove the old files, there could be no lumberjack Logger running with that filename.
If there's no logger with that filename, then we're not writing to the log and there's no need for rotation or cleanup. This doesn't seem like a problem to me.
> When lumberjack checks files for rotation, it lists all files in the directory, which can be a real performance issue with tens of thousands of files in the same directory (which is easily possible with group_by accumulating files over weeks or months). This function is called when a new Logger is started, although in a separate goroutine.
The only case where this matters is if the user chooses to put all files in one directory, e.g. `my_dir/*.json` instead of `my_dir/*/my_file.json`. I think it's enough to document it as a performance optimization.
> On the other hand, I don't see much real use-case for using group_by and rotation together. If you have many files with group_by, it's unlikely you need to rotate them. If you have just a handful of files, you don't really need group_by, you could simply use a routing connector with multiple fileexporters.
I'm much less sure that this is such an edge case. Anyone currently relying on rotation is one simple step away from taking advantage of the new feature and still expecting rotation. e.g. If you need rotation for N logs/second, then choose to split the log stream into two, you would still want rotation for N/2 logs/second.
@djaglowski I added a first PR: it contains no changes to behavior, only the restructuring needed to add the group_by functionality. As a reference I will keep my original PR around.
Thanks for merging my first PR @djaglowski. I will close my original PR and create a new, clean one soon.
There is one last question that came up as I was starting to use my fork in our system: the fileexporter currently creates new files with 600 permissions, but I need at least 640 for my use case. What do you think would be the best approach?
644 seems reasonable to me but we may need to make it configurable at some point. Maybe best to treat it as a separate issue. That discussion could happen in parallel and we can pick it up in your PR if it's settled.
Given that https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/31396 is merged, ok to close this issue?
Component(s)
exporter/file
Is your feature request related to a problem? Please describe.
I want to send our Kubernetes pod logs to an OpenTelemetry server and then write them to log files on disk based on attributes within the logs. For example: /data/logs/NAMESPACE_NAME/POD_NAME/CONTAINER_NAME/out.log
Describe the solution you'd like
As above, allow the file exporter to be able to read attributes of the logs and dynamically generate the log location.
Describe alternatives you've considered
No response
Additional context
No response