Open mans2singh opened 1 year ago
Thanks for filing this @mans2singh! This is something we have discussed internally at times but there wasn't a tracking issue.
There is some standardization, as you can see in the logs with error_type, reason, and stage, but we could do more.
Using JSON formatting for the logs should also make them more structured/parseable.
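As a stopgap for the text format, the quoted fields mentioned above (error_type, reason, stage) can be pulled out with a regular expression. The following is only a minimal sketch, not something Vector provides: the helper name quoted_fields is hypothetical, and it assumes the fields of interest always appear as key="value" with no embedded double quotes. Unquoted fields such as component_id=alerts_sink are deliberately ignored.

```python
import re

# Matches only quoted pairs such as error_type="request_failed".
# Assumption: values never contain an embedded double quote.
QUOTED_FIELD = re.compile(r'(\w+)="([^"]*)"')


def quoted_fields(line: str) -> dict[str, str]:
    """Return every key="value" pair found in one text-format log line."""
    return dict(QUOTED_FIELD.findall(line))


# The redirect error line from the issue description, split for readability.
line = (
    '2023-02-03T16:04:14.416107Z ERROR sink{component_kind="sink" '
    'component_id=alerts_sink component_type=http component_name=alerts_sink}'
    ':request{request_id=0}: vector::sinks::util::retries: '
    'Not retriable; dropping the request. '
    'reason="response status: 301 Moved Permanently" internal_log_rate_limit=true'
)

fields = quoted_fields(line)
print(fields.get("reason"))  # response status: 301 Moved Permanently
```

This does not help with the unquoted error=... text in the HTTP error lines, which is the harder case raised in the issue.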
Hi @jszwedko, could this issue be added to the roadmap? I also think the error message should be a JSON string so that it is easier to parse, categorize, and analyze.
Use Cases
Hi:
I am working with Vector running in Kubernetes and need to filter its logs for errors. However, I did not see any pattern that could be used to extract the error from the logs.
I think a structured error message or a consistent pattern would help applications programmatically extract the relevant reason/error.
Here are a few examples:
Splunk HEC sink SSL error:
2023-02-03T14:51:45.181367Z WARN sink{component_kind="sink" component_id=alerts_sink component_type=splunk_hec_logs component_name=alerts_sink}:request{request_id=1}:http: vector::internal_events::http_client: HTTP error. error=error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1921:: self signed certificate in certificate chain error_type="request_failed" stage="processing" internal_log_rate_limit=true
HTTP misconfigured URL warning:
2023-02-05T11:10:01.814103Z WARN sink{component_kind="sink" component_id=alerts_sink component_type=http component_name=alerts_sink}:request{request_id=0}:http: vector::internal_events::http_client: HTTP error. error=error trying to connect: dns error: failed to lookup address information: Name or service not known error_type="request_failed" stage="processing" internal_log_rate_limit=true
2023-02-05T11:10:01.814154Z WARN sink{component_kind="sink" component_id=alerts_sink component_type=http component_name=alerts_sink}:request{request_id=0}: vector::sinks::util::retries: Retrying after error. error=Failed to make HTTP(S) request: error trying to connect: dns error: failed to lookup address information: Name or service not known internal_log_rate_limit=true
HTTP redirect error:
2023-02-03T16:04:14.416107Z ERROR sink{component_kind="sink" component_id=alerts_sink component_type=http component_name=alerts_sink}:request{request_id=0}: vector::sinks::util::retries: Not retriable; dropping the request. reason="response status: 301 Moved Permanently" internal_log_rate_limit=true
If there is a pattern or structure to extract the error reason programmatically that I have missed, please let me know.
Thanks
Attempted Solutions
I tried finding a common pattern for errors but did not see any.
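To illustrate why no single pattern works, here is a rough heuristic sketch for the lines that carry an unquoted error=... value. It is not a Vector feature; the names HEADER, ERROR_TEXT, and parse_line are hypothetical. It assumes each line starts with a timestamp and an upper-case level, and that the error text ends at the next " key=" token or at the end of the line. That second assumption breaks as soon as an error message itself contains something shaped like " key=", which is exactly the fragility a structured format would remove.

```python
import re

# Header: timestamp followed by the level (WARN, ERROR, ...).
HEADER = re.compile(r'^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s')
# Unquoted error text: runs until the next " key=" token or end of line.
# Heuristic only; an error message containing " key=" would be cut short.
ERROR_TEXT = re.compile(r'\berror=(?P<error>.*?)(?= \w+=|$)')


def parse_line(line: str) -> dict[str, str]:
    """Extract timestamp, level, and the unquoted error text if present."""
    record: dict[str, str] = {}
    header = HEADER.match(line)
    if header:
        record.update(header.groupdict())
    error = ERROR_TEXT.search(line)
    if error:
        record["error"] = error.group("error")
    return record


# The DNS failure line from the examples above, split for readability.
line = (
    '2023-02-05T11:10:01.814103Z WARN sink{component_kind="sink" '
    'component_id=alerts_sink component_type=http component_name=alerts_sink}'
    ':request{request_id=0}:http: vector::internal_events::http_client: '
    'HTTP error. error=error trying to connect: dns error: failed to lookup '
    'address information: Name or service not known '
    'error_type="request_failed" stage="processing" internal_log_rate_limit=true'
)

record = parse_line(line)
print(record["level"])
print(record["error"])
```

The redirect example has no error= field at all (it uses reason="..."), so this heuristic returns only the timestamp and level for that line.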
Proposal
Error messages could follow a consistent pattern, or carry consistent fields, that can be used to programmatically extract the error reason, code, etc. Also, all errors should be logged at the ERROR level. In the example above, the SSL error for the Splunk HEC sink is logged at the WARN level, but the message indicates an error:
2023-02-03T14:51:45.181367Z WARN sink{component_kind="sink" component_id=alerts_sink component_type=splunk_hec_logs component_name=alerts_sink}:request{request_id=1}:http: vector::internal_events::http_client: HTTP error. error=error trying to connect: error:1416F086:SSL routines:tls_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1921:: self signed certificate in certificate chain error_type="request_failed" stage="processing" internal_log_rate_limit=true
References
No response
Version
timberio/vector:0.25.1-debian