Closed TheQueenIsDead closed 1 year ago
I cloned the project and added a very rudimentary debug line just before the "log format not supported"
line in order to debug the data being processed:
$ git diff
diff --git a/logv2.go b/logv2.go
index 2b25f06..4b04eb1 100644
--- a/logv2.go
+++ b/logv2.go
@@ -192,6 +192,7 @@ func (ptr *Logv2) Analyze(filename string) error {
}
str := string(buf)
if !regexp.MustCompile("^{.*}$").MatchString(str) {
+ fmt.Printf("Log format not supported: %s\n", str)
return errors.New("log format not supported")
}
break
It looks like this error is potentially caused by an excessively long aggregate query log line, here is the redacted output
{
"t":{"$date":"2023-03-27T03:00:46.806+00:00"},
"s":"I",
"c":"COMMAND",
"id":51803,
"ctx":"conn121577",
"msg":"Slow query",
"attr":{
"type":"command",
"ns":"xxx.yyy",
"appName":"xxx",
"command":{
"aggregate":"yyy",
"pipeline":[
{"$match":
{"aaa_id":"8beeda29-**[...]**-c6bce1cf4c1b",
"bbb_id":{"$in":["c9651e43-[...]-5ea13e02ba50","4a
I have formatted the above with indentation that was not present, but importantly it shows that the log line was cut off at "4a
. The value of bbb_id
was a very long array of UUID's
To express how long this line is, copying the new debug line and piping into wc
shows that it is ~ 4097 characters long:
echo '<output>' | wc
1 5 4097
More digging! I whipped up a python script to try find the log line in the main file:
This can be seen here, though it is not important:
What is interesting is that the length of the line that causes issues in hatchet is actually 11350 characters.
It parses as valid JSON, and begins and ends with the correct {
and }
characters.
Delving back into the code I see that the bufio Readline function is used to iterate the file: https://github.com/simagix/hatchet/blob/main/logv2.go#L187
Reading the documentation for that function, it seems that it is possible for the function to return only the beginning of the line, and an isPrefix
boolean is returned to denote this. This is currently unassigned in the code, but when I make the following change locally, I can confirm that this flag is being set in the case of my log:
diff --git a/logv2.go b/logv2.go
index 2b25f06..75e7e77 100644
--- a/logv2.go
+++ b/logv2.go
@@ -130,6 +130,7 @@ type HatchetInfo struct {
func (ptr *Logv2) Analyze(filename string) error {
var err error
var buf []byte
+ var prefix bool
var file *os.File
var reader *bufio.Reader
ptr.filename = filename
@@ -184,9 +185,12 @@ func (ptr *Logv2) Analyze(filename string) error {
}
for { // check if it is in the logv2 format
- if buf, _, err = reader.ReadLine(); err != nil { // 0x0A separator = newline
+ if buf, prefix, err = reader.ReadLine(); err != nil { // 0x0A separator = newline
return errors.New("no valid log format found")
}
+ if prefix {
+ return errors.New("log line too long to parse")
+ }
if len(buf) == 0 {
continue
}
Running in development mode, the output is such:
2023/03/28 20:21:11 simagix/hatchet devel-20230328
2023/03/28 20:21:11 processing test/broken-00-01-mongodb.log
2023/03/28 20:21:11 hatchet name is broken_00_01_mongodb_03980e
2023/03/28 20:21:11 log line too long to parse
I'm not super au fait with IO buffers, so I am uncertain how simple this would be to fix
Hi, thank you for your work on Keyhole and Hatchet, both tools have proven incredibly valuable for me when identify areas of improvement within Mongo.
I have setup a Github actions pipeline to download server logs periodically using the atlas cli:
After downloading the log, I attempt to generate a Hatchet report, and then run the web server to collect data from the api:
However, every so often the hatchet fails to process a log file at random with the following error:
I understand that this is likely an issue with the file generation, or contents of the file itself, meaning that there is a chance that the issue lies in the Atlas CLI. However, I was wondering if there was any way to get more specific information about this error, as running in verbose mode did not yield any extra information.
Any help appreciated if you happen to have ran into this issue before.
Thanks
Versions