Open jakirpatel opened 5 years ago
rc16 seems pretty old. Can you run rc29 (just released today) and see if you can reproduce the problem?
Also, please give the full commandline you used (one of the first log messages in the mtail INFO log.)
@jaqx0r Still not getting the metrics. Version: Build: mtail version v3.0.0-rc29 git revision b5aa95d10fb3af7b843e3814bc65aedd983425d9 go version go1.11.6 go arch amd64 go os linux
The command line:
I0325 11:23:58.965017 32679 main.go:95] Commandline: ["/home/weblogic/test/mtail" "-poll_interval" "1s" "-port" "3102" "-progs" "/home/weblogic/test/old.mtail" "-logs" "/home/weblogic/test/memory-status.txt" "-logtostderr"]
There are no errors while reading. Even I can see X number of lines read. But with the metrics I cant see any counter incremented.
Some more interesting result I got.
Commandline: ["/home/weblogic/test/mtail" "-poll_interval" "1s" "-port" "3102" "-progs" "/home/weblogic/test/old.mtail" "-logs" "/home/weblogic/test/memory-status.txt" "-logtostderr" "-v" "2"]
While logs get written I got following log from mtail:
I0325 11:43:09.947934 21234 log_watcher.go:145] Update fi
I0325 11:43:09.947940 21234 log_watcher.go:130] stat
I0325 11:43:09.949704 21234 log_watcher.go:170] sending update for /home/weblogic/test/memory-status.txt
I0325 11:43:09.949731 21234 log_watcher.go:174] No modtime change for /home/weblogic/test/mtail, no send
I0325 11:43:09.949739 21234 log_watcher.go:174] No modtime change for /home/weblogic/test/old.mtail, no send
I0325 11:43:09.949744 21234 log_watcher.go:145] Update fi
I0325 11:43:09.949749 21234 log_watcher.go:130] stat
I0325 11:43:09.949766 21234 log_watcher.go:145] Update fi
I0325 11:43:09.949778 21234 tail.go:284] Event type watcher.Event{Op:2, Pathname:"/home/weblogic/test/memory-status.txt"}
I0325 11:43:09.949799 21234 tail.go:188] handleLogUpdate /home/weblogic/test/memory-status.txt
I0325 11:43:09.949819 21234 file.go:139] Path /home/weblogic/test/memory-status.txt already being watched, and inode not changed.
I0325 11:43:09.949825 21234 file.go:143] doing the normal read
I0325 11:43:09.949843 21234 file.go:171] /home/weblogic/test/memory-status.txt: file type does not support deadline
I0325 11:43:09.951858 21234 file.go:174] Read count 4096 err <nil>
I0325 11:43:09.951906 21234 file.go:221] Line sent
I0325 11:43:09.951932 21234 file.go:221] Line sent
I0325 11:43:09.952095 21234 file.go:221] Line sent
I0325 11:43:09.952110 21234 file.go:221] Line sent
I0325 11:43:09.952122 21234 file.go:221] Line sent
I0325 11:43:09.952209 21234 file.go:221] Line sent
I0325 11:43:09.952235 21234 file.go:221] Line sent
I0325 11:43:09.952252 21234 file.go:221] Line sent
I0325 11:43:09.952338 21234 file.go:221] Line sent
I0325 11:43:09.952363 21234 file.go:221] Line sent
I0325 11:43:09.952379 21234 file.go:221] Line sent
I forgot to mention that /home/weblogic/test/memory-status.txt is a symlink to other file.
After looking into the problem more close seems it maybe the issue of NFS file system. ?? I think enabling the soft mount option will help on this.
For NFS mount I am using following options:
(rw,relatime,vers=3,rsize=65536,wsize=65536,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=100.66.81.33,mountvers=3,mountport=635,mountproto=udp,local_lock=none,addr=x.x.x.x)
Some of references I got : https://www.reddit.com/r/golang/comments/8npwin/how_to_implement_timeout_in_file_reads/
@jaqx0r
I have applied the soft option while mounting the NFS. But seems the problem of reading the files still persist. Is there any recommendation that how the NFS filesystem should be mounted to read the files through mtail ?
Your inputs will be helpful.
I don't think I can at this time, it's not clear to me yet what the problem is. My guess so far is that the files are not being read after polling. I don't believe at this point that changing NFS mount options will fix it.
On Mon., 25 Mar. 2019, 19:01 Jakir Patel, notifications@github.com wrote:
@jaqx0r https://github.com/jaqx0r
I have applied the soft option while mounting the NFS. But seems the problem of reading the files still persist. Is there any recommendation that how the NFS filesystem should be mounted to read the files through mtail ?
Your inputs will be helpful.
— You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub https://github.com/google/mtail/issues/221#issuecomment-476091632, or mute the thread https://github.com/notifications/unsubscribe-auth/AC5b-zLv5c6N06BRgh4QLpc92TjSs8Znks5vaIJlgaJpZM4cF6z- .
@jaqx0r Might be regenerating the problem will be more good.
@jaqx0r
Seems it was the regex issue. For somehow the regex was not matching. Please resolve it. Thank you for your support.
There was japanese character in the log timestamp. However with the regex on my local machine I was able to see it as ?
I think I could have tested the regex before direct with the logfile.
Um.. what regex issue?
On Tue, 26 Mar 2019 at 20:13, Jakir Patel notifications@github.com wrote:
@jaqx0r https://github.com/jaqx0r
Seems it was the regex issue. For somehow the regex was not matching. Please resolve it. Thank you for your support.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/mtail/issues/221#issuecomment-476536382, or mute the thread https://github.com/notifications/unsubscribe-auth/AC5b-7yYV-pUN-cFGWpzvukrDPCL8ezYks5vaeS8gaJpZM4cF6z- .
@jaqx0r
Basically my regex was not matching with log format. because of that it was not passing from regex block. That was reason my all counters never incremented. However when I have added simple line count counter it got incremented (with no regex).
What I think mtail should provide the info of NO MATCHING lines with regex as well. Some counter maybe so at first place we ll get to know how many lines get matched with regex.
But I observed one thing in the logs I can still see this,
I0325 11:43:09.949843 21234 file.go:171] /home/weblogic/test/memory-status.txt: file type does not support deadline
I0325 11:43:09.951858 21234 file.go:174] Read count 4096 err <nil>
https://github.com/google/mtail/blob/master/docs/Testing.md describes one way to test your programs on snippets of logs. It does require making changes to the code to set it up, however but lets you set expectations on what the counters should be after processing a piece of log with your program.
In the mtail
language, the otherwise
keyword can be used to catch logs that "would otherwise not match" anything in the same block: https://github.com/google/mtail/blob/master/docs/Language.md#otherwise-clauses
(the else
keyword is similar)
You could use otherwise
to increment your own counter when no lines match.
In the olden days, mtail
did have a special exported counter and a ring buffer per program for debugging log lines that didn't match. The ring buffer turned out to be expensive, and we'd made the system spammy to encourage people to try to handle every single log line for coverage, but that caused programs to become more complex and computationally expensive, so I didn't include it in the open sourcing of mtail
. But it is easy to add a counter back for counting the number of times each program didn't take any action at all on a line, so I will do that.
Your log messages look like a red herring here. You've said that you're mounting over NFS, and I am certain that NFS doesn't support a read timeout. You get the following 'Read count' indicating that the read was successful and no error, so it's all working as intended as far as I can tell, although perhaps too noisy.
Going back to your program, after:
@coherenceMemoryStatus {
total_heap_committed[$node_id][$gc_name] += $heap_committed
total_heap_used[$node_id][$gc_name] += $heap_used
total_heap_max[$node_id][$gc_name] += $heap_max
total_heap_init[$node_id][$gc_name] += $heap_init
}
append these lines
otherwise {
unmatched_lines++
}
for debugging.
I might also add a flag to enable logging of unmatched lines in test mode, but that will be very noisy in production.
I got the following error with the newer version not sure what happened. It says the runtime error:
I0402 08:20:25.573032 9751 main.go:94] mtail version v3.0.0-rc29 git revision b5aa95d10fb3af7b843e3814bc65aedd983425d9 go version go1.11.6 go arch amd64 go os linux
I0402 08:20:25.573058 9751 main.go:95] Commandline: ["/home/gecope/mtail/mtail" "-poll_interval" "1s" "-port" "3902" "-progs" "progs/jmxCacheSize.mtail" "-logs" "/home/gecope/mtail-cron/bsktkvs/cache-size.txt" "-logs" "/home/gecope/mtail-cron/bsktkvsA/cache-size.txt" "-logtostderr" "-v" "2"]
I0402 08:24:59.574501 9751 log_watcher.go:145] Update fi
I0402 08:24:59.574848 9751 file.go:174] Read count 669 err <nil>
I0402 08:24:59.574865 9751 file.go:221] Line sent
I0402 08:24:59.574874 9751 file.go:221] Line sent
I0402 08:24:59.574935 9751 file.go:221] Line sent
I0402 08:24:59.574945 9751 file.go:221] Line sent
I0402 08:24:59.574950 9751 file.go:221] Line sent
I0402 08:24:59.574993 9751 vm.go:90] jmxCacheSize.mtail: Runtime error: panic in thread &vm.thread{pc:24, matched:false, matches:map[int][]string{0:[]string{"/home/gecope/mtail-cron/bsktkvsA/cache-size.txt", "bsktkvsA"}, 1:[]string{"1126\t�� 4 02 17:24:32 JST 2019\tOrderIdDistributedCache\torder-id\t1458924.0\t1458924.0\t1.3913383483886719\t1.0", "OrderIdDistributedCache", "order-id", "1458924.0", "1458924.0", "1.3913383483886719", "1.0"}}, time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}, stack:[]interface {}{true, (*datum.StringDatum)(0xc0002cf740)}} at instr "{cat <nil>}": interface conversion: interface {} is *datum.StringDatum, not string
I0402 08:24:59.575078 9751 vm.go:91] VM stack:
goroutine 42 [running]:
runtime/debug.Stack(0xc00047a190, 0x47, 0xc00047f670)
/usr/local/go/src/runtime/debug/stack.go:24 +0xa7
github.com/google/mtail/internal/vm.(*VM).errorf(0xc00029c280, 0x94512c, 0x23, 0xc00047f670, 0x3, 0x3)
/go/src/github.com/google/mtail/internal/vm/vm.go:91 +0x113
github.com/google/mtail/internal/vm.(*VM).execute.func1(0xc00029c280, 0xc0000c4cd0, 0x26, 0x0, 0x0)
/go/src/github.com/google/mtail/internal/vm/vm.go:296 +0x142
panic(0x8ad3a0, 0xc0002cf7d0)
/usr/local/go/src/runtime/panic.go:513 +0x1b9
github.com/google/mtail/internal/vm.(*VM).execute(0xc00029c280, 0xc0000c4cd0, 0x26, 0x0, 0x0)
/go/src/github.com/google/mtail/internal/vm/vm.go:720 +0x5ded
github.com/google/mtail/internal/vm.(*VM).processLine(0xc00029c280, 0xc00055c120)
/go/src/github.com/google/mtail/internal/vm/vm.go:748 +0x18c
github.com/google/mtail/internal/vm.(*VM).Run(0xc00029c280, 0x6a6d7843, 0xc0001ae420, 0xc0001ae480, 0xc0001ae4e0)
/go/src/github.com/google/mtail/internal/vm/vm.go:767 +0xf5
created by github.com/google/mtail/internal/vm.(*Loader).CompileAndRun
/go/src/github.com/google/mtail/internal/vm/loader.go:241 +0x70d
I0402 08:24:59.575097 9751 vm.go:92] Dumping vm state
I0402 08:24:59.575101 9751 vm.go:93] Name: jmxCacheSize.mtail
I0402 08:24:59.575104 9751 vm.go:94] Input: &logline.LogLine{Filename:"/home/gecope/mtail-cron/bsktkvsA/cache-size.txt", Line:"1126\t�� 4 02 17:24:32 JST 2019\tOrderIdDistributedCache\torder-id\t1458924.0\t1458924.0\t1.3913383483886719\t1.0"}
I0402 08:24:59.575115 9751 vm.go:95] Thread:
I0402 08:24:59.575118 9751 vm.go:96] PC 23
I0402 08:24:59.575121 9751 vm.go:97] Matched false
I0402 08:24:59.575125 9751 vm.go:98] Matches map[0:[/home/gecope/mtail-cron/bsktkvsA/cache-size.txt bsktkvsA] 1:[1126 �� 4 02 17:24:32 JST 2019 OrderIdDistributedCache order-id 1458924.0 1458924.0 1.3913383483886719 1.0 OrderIdDistributedCache order-id 1458924.0 1458924.0 1.3913383483886719 1.0]]
I0402 08:24:59.575135 9751 vm.go:99] Timestamp 0001-01-01 00:00:00 +0000 UTC
I0402 08:24:59.575145 9751 vm.go:100] Stack [true ""@1554193499574966437]
I0402 08:24:59.575277 9751 vm.go:101] Prog: jmxCacheSize.mtail
Metrics
0 Metric: name=total_cache_size program=jmxCacheSize.mtail kind=Counter type=String hidden=false keys=[service cache_name kvs] labelvalues=[LabelValue: [OrderIdDistributedCache order-id bsktkvsA] ""@1554193499574966437] source=jmxCacheSize.mtail:1:9-24 buckets=[]
1 Metric: name=total_memory_bytes program=jmxCacheSize.mtail kind=Counter type=String hidden=false keys=[service cache_name kvs] labelvalues=[] source=jmxCacheSize.mtail:2:9-26 buckets=[]
2 Metric: name=total_memory_mb program=jmxCacheSize.mtail kind=Counter type=String hidden=false keys=[service cache_name kvs] labelvalues=[] source=jmxCacheSize.mtail:3:9-23 buckets=[]
Regexps
0 /[^\t]+/mtail-cron/(?P<kvs>[a-zA-Z]+)[^\t]+/
1 /\d+\t[^\t]+\t(?P<service>[^\t]+)\t(?P<cache_name>[^\t]+)\t(?P<cache_size>\d.+)\t(?P<memory_bytes>\d.+)\t(?P<memory_mb>\d.+)\t(?P<average_object_size>\d.+)/
Strings
disasm l op opnd
0 getfilename 0
1 smatch 0
2 match 1
3 jnm 66
4 setmatched false
5 push 1
6 capref 1
7 push 1
8 capref 2
9 push 0
10 capref 1
11 mload 0
12 dload 3
13 push 1
14 capref 1
15 push 1
16 capref 2
17 push 0
18 capref 1
19 mload 0
20 dload 3
21 push 1
22 capref 3
23 cat <nil>
24 sset <nil>
25 push 1
26 capref 1
27 push 1
28 capref 2
29 push 0
30 capref 1
31 mload 1
32 dload 3
33 push 1
34 capref 1
35 push 1
36 capref 2
37 push 0
38 capref 1
39 mload 1
40 dload 3
41 push 1
42 capref 4
43 cat <nil>
44 sset <nil>
45 push 1
46 capref 1
47 push 1
48 capref 2
49 push 0
50 capref 1
51 mload 2
52 dload 3
53 push 1
54 capref 1
55 push 1
56 capref 2
57 push 0
58 capref 1
59 mload 2
60 dload 3
61 push 1
62 capref 5
63 cat <nil>
64 sset <nil>
65 setmatched true
I0402 08:24:59.575335 9751 file.go:221] Line sent
I0402 08:24:59.575340 9751 file.go:221] Line sent
I0402 08:24:59.575345 9751 file.go:171] /home/gecope/mtail-cron/bsktkvsA/cache-size.txt: file type does not support deadline
I0402 08:24:59.575352 9751 file.go:174] Read count 0 err EOF
Hmm figured it out. Seems I was using counter. But it was double. Now seems ok.
I'm certain there's still a bug but your last update gives me some idea of whta it was.
I think that traceback for the runtime error can be improved greatly because right now you need a good understanding of the mtail VM internals to debug it.
On Wed, 3 Apr 2019 at 15:36, Jakir Patel notifications@github.com wrote:
Hmm figured it out. Seems I was using counter. But it was double. Now seems ok.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/google/mtail/issues/221#issuecomment-479336559, or mute the thread https://github.com/notifications/unsubscribe-auth/AC5b-2S6d9z2s5pfhmiZTN7rbUj67Rrsks5vdC-0gaJpZM4cF6z- .
@jaqx0r
The runtime error should be more clear I think so. For me it took time to debug with the messages.
Is there any recommendation that how the NFS filesystem should be mounted to read the files through mtail ? --@jakirpatel
Log shipping. By introducing NFS, you are introducing a layer of complexity which is known to be complex and problematic for streaming.
Details : Problem : Metrics are not getting exposed.
Mtail version: mtail version v3.0.0-rc16 git revision 77eeba903db0ffd90e29c1ee778fdefb0112bfab go version go1.11
Filesystem: NFS
Logs from the mtail:
Log type:
Mtail program:
Some debug steps I used:
I ran the simple line count program with the same file using --one_shot parameter. It given me the exact result.
Also locally I was able to parse the file and was able to get the result. But with NFS filesystem seems I was not able to get the similar result.