wolfeidau opened this issue 11 years ago
Yikes. Would love to know what is causing the panic for sure!
I would love to accept a PR that improves l2met's own visibility. Maybe there is something in the log pkg that we can use.
Just an update: it runs for a few minutes on production data before crashing.
Using the log package I added timestamps to the log, which gave me a one-second window in which to check router messages; this narrowed it down to between 4 and 20 candidate messages.
I ran around 40 message strings through the receiver unit test and couldn't get it to crash.
So I commented out this line https://github.com/ryandotsmith/l2met/blob/master/bucket/parser.go#L72 and it has been running for about 6 hours.
I will dig into the log package and see if there is a nice way of capturing the tuple with its state and dumping it on a panic, then run some tests with a parallel file of log data.
I did some research; one obvious issue so far is that the string conversion of the Float64 only caters for 10 digits. When you're handling values measured in bytes or milliseconds, this can overflow.
Don't ask me how, but we have clients on very slow links uploading files for a very long time.
Either way, I have had a shot at rewriting that logic, replacing the parse/format/compare/slice sequence with a TrimFunc. Once I am happy with the code I will open a pull request for you to review. The code I am talking about is here:
https://github.com/ryandotsmith/l2met/blob/master/bucket/log_tuple.go#L56-L65
Note this is also marginally faster.
:clap:
Good find. Also, I like the idea of using TrimFunc.
Hello, I wanted to follow up on this. A few things have changed with respect to internally logged metrics. You can read about the metchan in #82, but I thought it would be nice if I pointed out that we can easily accommodate the issue you have raised here with metchan. Currently all logging output happens here. We might be able to use a better logging facility that would support logging timestamps.
Ah, great to hear. So far I haven't experienced this failure with my fork, which includes the changes I pushed to you.
I need to re-sync with your tree and catch up on the changes; it looks like lots of things are happening.
I will take a look at how this can be extended to cater for local logging.
There seems to be an issue with something in the router messages which is triggering a panic.
Because I have no timestamps in my log, it is difficult to correlate with the raw syslog data I am collecting.
I understand this is not necessary when deployed on Heroku, but it would be nice to add a flag to enable it for development and for deployments hosted elsewhere.
Happy to submit a pull request if you're amenable to the idea.
Also, here is the error message that was logged.