google / mtail

extract internal monitoring data from application logs for collection in a timeseries database
Apache License 2.0
3.83k stars 378 forks source link

type inference converts "05" to "5" which then isn't usable in strptime() #230

Open sevaTechoPark opened 5 years ago

sevaTechoPark commented 5 years ago
/^/ +
/(?P<year>\d{4})\/(?P<month>\d{2})\/(?P<day>\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) / +
/.*/ +
/$/ {
    getfilename() !~ /(error\.log)$/ {
        stop
    }

    hidden gauge time_rfc3339
    time_rfc3339 = $year + "-" + $month + "-" + $day + "T" + $time + "Z03:00"
    strptime(time_rfc3339, "2006-01-02T15:04:05Z07:00")

    error_log_count++
}

error.log:

2019/05/14 11:10:05 [warn] ...
2019/05/14 11:11:06 [warn] ...

logfile:

I0514 11:13:15.029268   49162 vm.go:90] nginx.mtail: Runtime error: strptime (2006-01-02T15:04:05Z07:00, 2019-5-14T11:11:06Z03:00, UTC) failed: parsing time "2019-5-14T11:13:15Z03:00": month out of range

I think mtail transform 05 to 5 because think that it is number.

How can I tell to mtail that I wanna regexp numbers but wanna work with string type?


time_rfc3339 = $year + "-" + string($month) + "-" + $day + "T" + $time + "Z03:00" does not help me

$ mtail -version

mtail version v3.0.0-rc29-12-gc95b546 git revision c95b546631bf7b6077d3377f25aff2d5306d305f go version go1.12.4 go arch amd64 go os freebsd
jaqx0r commented 5 years ago

I think if you change the regex to match more than digits you won't get the type inference:

/^/ +
/(?P<year>\d{4})\/(?P<month>.{2})\/(?P<day>\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) / +
/.*/ +
/$/

which will leave $month as a string.

This isn't very nice though. Thanks for the report, you're probably right that the type conversion is causing the 0 to be lost, but the context you want to use it in is a string. I wonder if we can make a better strptime.

jaqx0r commented 5 years ago

I was doing some more thinking about this, and while I think you have found an interesting problem, I think a better way of approaching your problem is to not try to reformat the timestamp and just parse it as it appears form the source.

The linked change above (f8cbd48) has an example of how you could write the format string to parse the timestamp in one go. It will mean mtail is faster at processing your logs as a result of not doing so much work.

sevaTechoPark commented 5 years ago

Thank you for your reply, glad that my message was helpful yes, this work for me

sevaTechoPark commented 5 years ago

I was so glad I hurried I stopped getting runtime errors, but after call strptime my programm stop

jaqx0r commented 5 years ago

Please file a new issue if you've discovered a new problem.

On Tue., 28 May 2019, 23:07 Vsevolod, notifications@github.com wrote:

I was so glad I hurried I stopped getting runtime errors, but after call strptime my programm stop

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/mtail/issues/230?email_source=notifications&email_token=AAXFX652IUL4BWVO4DM2QELPXUVARA5CNFSM4HMXOEYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWMB47Q#issuecomment-496508542, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXFX67ID2TE6CE7UDLKHQLPXUVARANCNFSM4HMXOEYA .

sevaTechoPark commented 5 years ago

I still use the same version

/^/ +
/(?P<date>\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) / +
/\[(?P<log_level>[a-z]+)\] / +
/.*/ +
/$/ {
    getfilename() !~ /(error\.log)$/ {
        stop
    }

    strptime($date, "2006/01/02 15:04:05")

    hidden gauge hack_by_server_name_log_level
    hack_by_server_name_log_level = "${HOSTNAME}" + "/" + $log_level

    error_log_count[hack_by_server_name_log_level]++
}

By the way I have to use hack instead of

gauge error_log_count by server_name, log_level
error_log_count[server_name, log_level]

because this is equivalent to

gauge error_log_count by server_name, log_level
error_log_count[server_name][log_level]

that create extra data

jaqx0r commented 5 years ago

I don't understand the problem. You've changed the program to parse the date as I advised, so there must be something new now. What is happening and what did you expect to happen instead?

On Wed, 29 May 2019 at 18:15, Vsevolod notifications@github.com wrote:

I still use the same version

/^/ + /(?P\d{4}\/\d{2}\/\d{2} \d{2}:\d{2}:\d{2}) / + /[(?P[a-z]+)] / + /.*/ + /$/ { getfilename() !~ /(error.log)$/ { stop }

strptime($date, "2006/01/02 15:04:05")

hidden gauge hack_by_server_name_log_level
hack_by_server_name_log_level = "${HOSTNAME}" + "/" + $log_level

error_log_count[hack_by_server_name_log_level]++

}


By the way I have to use hack instead of

gauge error_log_count by server_name, log_level error_log_count[server_name, log_level]

because this is equivalent to

gauge error_log_count by server_name, log_level error_log_count[server_name][log_level]

that create extra data

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/mtail/issues/230?email_source=notifications&email_token=AAXFX62VSOUQRAYOPKDDBZTPXY3RVA5CNFSM4HMXOEYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWORLZY#issuecomment-496834023, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXFX64DFKX53PPVD2OUK5LPXY3RVANCNFSM4HMXOEYA .

sevaTechoPark commented 5 years ago

after call strptime my programm stop

jaqx0r commented 5 years ago

Please show the logs where this happens.

On Thu, 30 May 2019 at 16:18, Vsevolod notifications@github.com wrote:

after call strptime my programm stop

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/google/mtail/issues/230?email_source=notifications&email_token=AAXFX6Y2VE6Y7DJOQ4LKNVDPX5WSTA5CNFSM4HMXOEYKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWRPDUY#issuecomment-497218003, or mute the thread https://github.com/notifications/unsubscribe-auth/AAXFX63IMKHFW6UWZYYCTCTPX5WSTANCNFSM4HMXOEYA .