vectordotdev / vector

A high-performance observability data pipeline.
https://vector.dev
Mozilla Public License 2.0

`parse_nginx_log` does not work for error logs #18063

Open fzyzcjy opened 1 year ago

fzyzcjy commented 1 year ago

Problem

Hi, thanks for the library! However, `parse_nginx_log` fails on this error log message:

2023/07/21 08:35:07 [error] 32#32: *29668326 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.1.119, server: , request: "GET /what HTTP/1.1", upstream: "http://10.247.225.114:80/what", host: "plusequalone.com"

when calling parse_nginx_log!(.message, "error")

Version

latest

neuronull commented 1 year ago

Was able to reproduce this.

// example from the docs works:

$ parse_nginx_log!(
s'2021/04/01 13:02:31 [error] 31#31: *1 open() "/usr/share/nginx/html/not-found" failed (2: No such file or directory), client: 172.17.0.1, server: localhost, request: "POST /not-found HTTP/1.1", host: "localhost:8081"',
"error")
{ "cid": 1, "client": "172.17.0.1", "host": "localhost:8081", "message": "open() \"/usr/share/nginx/html/not-found\" failed (2: No such file or directory)", "pid": 31, "request": "POST /not-found HTTP/1.1", "server": "localhost", "severity": "error", "tid": 31, "timestamp": t'2021-04-01T19:02:31Z' }

// the example provided in this issue does not:

$ parse_nginx_log!(
s'2023/07/21 08:35:07 [error] 32#32: *29668326 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 192.168.1.119, server: , request: "GET /what HTTP/1.1", upstream: "http://10.247.225.114:80/what", host: "plusequalone.com"',
"error")
function call error for "parse_nginx_log" at (0:298): failed parsing log line

Potentially related: https://github.com/vectordotdev/vector/issues/12900

zamazan4ik commented 1 year ago

I have a broader concern about parse_nginx_log and other functions that target third-party software. I think it is important to be able to specify a target version for such a function. For example, imagine Nginx changes the error log format in the next release. In that case, we would need to support both Nginx versions (at least for some time), because Vector users run both (different software update policies and so on).

If we had something like parse_nginx_log!(version...) (just pseudocode), the problem would be fixed. Another option is to add functions like parse_nginx_log_v1, parse_nginx_log_v2, etc., but that smells a bit (and the mapping between our v1 and the corresponding Nginx versions can be non-trivial).

The same applies to other changes that happen outside of Vector.
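
To make the version-parameter idea concrete, here is a minimal Python sketch of a parser that selects its pattern by a format version. Everything in it is hypothetical: the names `PATTERNS` and `parse_log`, the labels "v1" and "v2", and both toy log formats are invented for illustration and do not correspond to real Nginx releases or to anything in Vector.

```python
import re

# Hypothetical format versions. The labels and the two toy formats are made up
# for illustration; they are not real Nginx log formats.
PATTERNS = {
    "v1": re.compile(r"^\[(?P<severity>\w+)\] (?P<message>.+)$"),
    "v2": re.compile(r"^(?P<severity>\w+): (?P<message>.+)$"),
}


def parse_log(line, version="v1"):
    """Parse `line` with the pattern registered for `version`."""
    try:
        pattern = PATTERNS[version]
    except KeyError:
        raise ValueError(f"unsupported format version: {version!r}") from None
    match = pattern.match(line)
    if match is None:
        raise ValueError("failed parsing log line")
    return match.groupdict()
```

With a single function and a version argument, supporting a new upstream format means registering one more pattern rather than adding a new function name, although the mapping from labels to upstream releases still has to be documented somewhere.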

dsmith3197 commented 1 year ago

We discussed this and will update the regex to be more permissive, making the server field, and potentially others, optional.
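
As an illustration of what "more permissive" could look like, here is a rough Python sketch of an error-log pattern in which every trailing field is optional and server may be empty. This is not Vector's actual regex. The pattern, the name `ERROR_LOG`, and the helper `parse_error_line` are assumptions made for this example, and it covers only the fields that appear in the two log lines in this thread.

```python
import re

# Hypothetical pattern, NOT the one used by Vector. All trailing
# "key: value" fields are optional, and `server` may be empty.
ERROR_LOG = re.compile(
    r"""
    ^(?P<timestamp>\d{4}/\d{2}/\d{2}\ \d{2}:\d{2}:\d{2})\s+
    \[(?P<severity>\w+)\]\s+
    (?P<pid>\d+)\#(?P<tid>\d+):\s+
    (?:\*(?P<cid>\d+)\s+)?
    (?P<message>.+?)
    (?:,\s+client:\s+(?P<client>[^,]+))?
    (?:,\s+server:\s*(?P<server>[^,]*))?
    (?:,\s+request:\s+"(?P<request>[^"]*)")?
    (?:,\s+upstream:\s+"(?P<upstream>[^"]*)")?
    (?:,\s+host:\s+"(?P<host>[^"]*)")?
    \s*$
    """,
    re.VERBOSE,
)


def parse_error_line(line):
    """Return the captured fields, dropping absent or empty ones."""
    match = ERROR_LOG.match(line)
    if match is None:
        raise ValueError("failed parsing log line")
    return {key: value for key, value in match.groupdict().items() if value}
```

With this sketch, the line from this issue parses with upstream captured and server omitted from the result, and the example from the docs still parses with server set to localhost. All values stay strings here; converting pid, tid, cid, and the timestamp to typed values is left out.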