Closed djaglowski closed 3 years ago
Hey @djaglowski!
2. `file.*` is ambiguous. Here we'll have to discuss if this is only about log files or if this could be any kind of file and the "role" of the file would depend on the context. If it was a trace attribute for an operation reading from or writing to a file, this would be a different semantic than for a log line that was read from that file.
2a. `file.name` and `file.path` overlap. Is this intended or should we rather use the directory instead of full path? These details can, however, be best discussed on a PR.
3. We don't have any dedicated set of semantic conventions for log attributes yet but I assume it makes sense to have that. What do the other @open-telemetry/technical-committee members think? We'll also have to extend the semconv generator support to metrics anyway.
I think we need to have one "semantic conventions" set of documents and label each individual semantic convention with up to 4 possible labels which indicate which signal/data type the convention is applicable to: `resource`, `traces`, `metrics`, `logs`. As of today most of `logs` conventions also apply to `traces` and vice versa.
> 2. `file.*` is ambiguous. Here we'll have to discuss if this is only about log files or if this could be any kind of file and the "role" of the file would depend on the context.

Can we come up with a better name? FYI, Elastic Common Schema has a `file.*` namespace https://www.elastic.co/guide/en/ecs/current/ecs-file.html#ecs-file that has things like `path`, `name`, `target_path` which seem to serve a similar purpose.

> If it was a trace attribute for an operation reading from or writing to a file, this would be a different semantic than for a log line that was read from that file.

Why would it be a different semantic? For example if I do some sort of file processing and want to report it as a span wouldn't it be a good fit to specify `file.name` as an attribute of the span?
If these file conventions can be designed to be used in span and log attributes with the same semantics, that would be ideal IMHO, and using the `file` namespace seems best.
@arminru can you please comment on this ^^^?
@tigrannajaryan
> Can we come up with a better name? FYI, Elastic Common Schema has a `file.*` namespace elastic.co/guide/en/ecs/current/ecs-file.html#ecs-file that has things like `path`, `name`, `target_path` which seem to serve a similar purpose.
> Why would it be a different semantic? For example if I do some sort of file processing and want to report it as a span wouldn't it be a good fit to specify `file.name` as an attribute of the span?

If we want to add this as a generic attribute which always describes the file that a certain span, log or metric is about, then `file.*` should be fine indeed.
I thought that we might want to distinguish between a file being operated on and the log file a log record is coming from. If we, for example, had a structured log message about a failed file read operation and we extracted this log message from a log file, then we'd have two files in question and it would be ambiguous which one is described by the `file.*` attribute. Hence I thought we should probably add a separate attribute dedicated to a log file as the source of a log record. WDYT?
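To make the ambiguity concrete, here is a minimal Python sketch. The paths are made up, and the `logsource.` prefix is only a placeholder for "some dedicated namespace"; nothing here is an agreed convention.

```python
# Attributes parsed from a structured log message about a failed read.
# The path is a made-up example.
message_attrs = {"file.path": "/data/input.csv"}

# Attributes describing the log file the record was read from.
source_attrs = {"file.path": "/var/log/app.log"}

# Sharing one namespace: the second dict silently overwrites the first,
# so the file that was operated on is lost.
collided = {**message_attrs, **source_attrs}

# A dedicated (placeholder) prefix for the log source keeps both files.
distinct = {
    **message_attrs,
    **{f"logsource.{key}": value for key, value in source_attrs.items()},
}

print(collided)
print(distinct)
```

With a shared namespace only one of the two paths survives; with a separate prefix the record can describe both files.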
Based on discussion in the Spec SIG, I'm suggesting the following approach:
First, the immediate problem can be considered as a logs-specific attribute set. This would mean prefixing the above attributes, such that we would have:
- `logsource.file.name` - The basename of the file (e.g. `mylog.log`).
- `logsource.file.path` - The absolute path of the file (e.g. `/var/log/mylog.log`).
- `logsource.file.name.resolved` - Same as `file.name`, but with symlinks resolved.
- `logsource.file.path.resolved` - Same as `file.path`, but with symlinks resolved.
- `logsource.file.stream` - When relevant, `stdout` or `stderr`.

The choice of `logsource` is an initial proposal, but of course is open to feedback.
Second, as a possible separate proposal, the notion of establishing a "structured value" should be explored. The general idea would be that a common structure could be established that is reusable in multiple contexts within the project's semantic conventions. In the context of a log source, the semantic convention would establish that the `logsource` attribute should have a structured value that is essentially:
fileType {
file.name: string,
file.path: string,
file.name.resolved: string,
file.path.resolved: string,
file.stream: string,
}
A mechanism for defining and referring to structured values would then facilitate commonality across the semantic conventions by allowing reuse. Referencing this type in a specification would then logically produce a corresponding set of attributes. Logically:
some_context: fileType
would effectively define
some_context.file.name
some_context.file.path
some_context.file.name.resolved
some_context.file.path.resolved
some_context.file.stream
With this second point in mind, the logs-specific attribute set would ideally be defined in such a way that it is broadly applicable, such that an eventual structured value for describing a file could replace the initial logs-specific attributes in a non-breaking way. Of course this cannot be guaranteed, but we can make a point to consider whether these attributes would be broadly useful. I am satisfied that they are, but I'm calling this out in case anyone has further thoughts on this before we move forward.
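The expansion of `some_context: fileType` into individual attribute keys could be sketched as follows. This is only an illustration in Python: `FILE_TYPE` and `expand` are hypothetical names, and no such mechanism exists in the semantic conventions tooling as far as this thread shows.

```python
# Relative attribute names that make up the hypothetical "fileType".
FILE_TYPE = (
    "file.name",
    "file.path",
    "file.name.resolved",
    "file.path.resolved",
    "file.stream",
)


def expand(context, relative_names=FILE_TYPE):
    """Prefix each relative name with the given context namespace."""
    return [f"{context}.{name}" for name in relative_names]


for key in expand("some_context"):
    print(key)
```

Calling `expand("logsource")` would produce the logs-specific keys from the first part of the proposal, which is the sense in which the structured value could later replace them without renaming anything.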
> Second, as a possible separate proposal, the notion of establishing a "structured value" should be explored.

This likely requires an OTEP since it has significant implications (the Trace API currently disallows structured values, except the most simple ones - homogeneous arrays).
I may have misunderstood what you wrote. Do you suggest that we do this (in JS notation for simplicity):
attributes["some_context"] = {
"file.name": "abc",
"file.path": "/var/lib/abc",
...
}
or this:
attributes["some_context.file.name"] = "abc"
attributes["some_context.file.path"] = "/var/lib/abc"
...
@tigrannajaryan I was imagining the latter, but really only because the existing conventions use string keys, as far as I've seen.
Having thought about it more (and a good thing - see below), I am even more in favor of the string-only approach. The rules for attribute naming, specifically the third and fourth, ensure that the two formats are effectively interchangeable. If I'm correct on that point then it seems we might as well decide based on simplicity and compatibility.
I would also say that I'm less convinced of the need for any kind of code-level support of structured values, though I still think it would be good to formalize some notion of what I would call "relative namespaces" (e.g. `.file`).
I'm really glad you asked this because thinking about the extent to which the two representations are almost interchangeable w/o any rules helped me recognize that my own proposal seemed to demonstrate a case in which they are not. Then I found the attribute naming rules and particularly that Names SHOULD NOT coincide with namespaces.
Adjusting for this, I now propose the following strings:
logsource.file.name
logsource.file.path
logsource.file.name_resolved
logsource.file.path_resolved
logsource.file.stream
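Why the earlier names broke interchangeability can be checked mechanically. The sketch below is an illustration in Python under stated assumptions: `nest` is a hypothetical helper (not part of any OpenTelemetry API) that turns dot-delimited keys into nested dictionaries and fails when a key is used both as a value and as a namespace. The sample file names are made up, and attribute values are assumed to be primitives, never dictionaries.

```python
def nest(flat):
    """Convert dot-delimited keys into nested dicts.

    Raises ValueError when a key is both a leaf and a namespace.
    Assumes attribute values are primitives (never dicts).
    """
    root = {}
    for key, value in flat.items():
        *namespaces, leaf = key.split(".")
        node = root
        for part in namespaces:
            child = node.setdefault(part, {})
            if not isinstance(child, dict):
                raise ValueError(f"{key!r}: {part!r} already holds a value")
            node = child
        if isinstance(node.get(leaf), dict):
            raise ValueError(f"{key!r}: {leaf!r} is already a namespace")
        node[leaf] = value
    return root


# Earlier proposal: "name" is both an attribute and a namespace.
original = {
    "logsource.file.name": "mylog.log",
    "logsource.file.name.resolved": "mylog.1.log",
}
# Adjusted proposal: the underscore keeps "name" a plain attribute.
adjusted = {
    "logsource.file.name": "mylog.log",
    "logsource.file.name_resolved": "mylog.1.log",
}

try:
    nest(original)
except ValueError as err:
    print("conflict:", err)

print(nest(adjusted))
```

The earlier names cannot be represented as a nested value because `name` would have to be a string and a container at once, while the adjusted names nest cleanly.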
The proposal in your last comment sounds great to me, @djaglowski. Thanks for iterating on this. Do you intend to go ahead with a PR to add these attributes?
@arminru, I'll make the PR
When consuming logs from a file, information about the file is commonly included as metadata on the log record.
As a starting point for discussion, I propose:
`file.*` namespace:
- `file.name` - The basename of the file (e.g. `mylog.log`).
- `file.path` - The absolute path of the file (e.g. `/var/log/mylog.log`).
- `file.name.resolved` - Same as `file.name`, but with symlinks resolved.
- `file.path.resolved` - Same as `file.path`, but with symlinks resolved.
- `file.stream` - When relevant, `stdout` or `stderr`.
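As a rough illustration of what these attributes would contain, here is a minimal Python sketch using the names from this starting proposal. `file_attributes` is a hypothetical helper, and reading "symlinks resolved" as `os.path.realpath` is an assumption. `file.stream` is left out because it only applies to `stdout`/`stderr` sources.

```python
import os


def file_attributes(path):
    """Build the proposed file.* attributes for a log file path."""
    absolute = os.path.abspath(path)
    # Assumption: "resolved" means following every symlink in the path.
    resolved = os.path.realpath(absolute)
    return {
        "file.name": os.path.basename(absolute),
        "file.path": absolute,
        "file.name.resolved": os.path.basename(resolved),
        "file.path.resolved": resolved,
    }


# The path does not need to exist for this call to succeed.
print(file_attributes("/var/log/mylog.log"))
```

When the path is not a symlink, the resolved values simply repeat the unresolved ones; they differ only for cases such as a rotated log exposed through a stable symlink.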