mickem / nscp

NSClient++
http://nsclient.org
GNU General Public License v2.0

0.5.0.17 => Real-time eventlog filters issue? #204

Closed willemdh closed 8 years ago

willemdh commented 8 years ago

Hello,

I just did some tests with the real-time eventlog component of NSClient++ 5.0.9 and noticed that my filters never seem to match. This is the filter I'm using as a test:

[/settings/eventlog/real-time]
startup age = 30m
enabled = 1
log = application,system
debug = 1
[/settings/eventlog/real-time/filters]

[/settings/eventlog/real-time/filters/default]
destination = NSCA
maximum age = 2h
ok message = Found no records in eventlog last two hours.
syntax = %type% %id% %source%: %message% 

[/settings/eventlog/real-time/filters/EVT_Application]
log= application
filter= level IN (error) AND (id NOT IN (1,3))
severity= WARNING

When I create an error in the application eventlog with this Powershell command:

write-eventlog -logname Application -source Test -id 6005 -Message test -EntryType Error

the following appears in the NSClient console in test mode:

D   eventlog Reading eventlog messages...
D   eventlog Processing: 000000BFE1A56830
D   eventlog No filters matched an event
D   eventlog Next miss time is in: no timeout specified
D   eventlog Sleeping for: 4294967295ms
D   eventlog Reading eventlog messages...
D   eventlog Processing: 000000BFE1A56830
D   eventlog No filters matched an event
D   eventlog Next miss time is in: no timeout specified
D   eventlog Sleeping for: 4294967295ms

I also see this in the nsclient.log:

2015-11-15 22:08:31: debug:c:\source\master\include\parsers/filter/realtime_helper.hpp:148: No filters matched an event
2015-11-15 22:08:31: debug:c:\source\master\include\parsers/filter/realtime_helper.hpp:165: Realtime processing faillure
2015-11-15 22:08:31: error:c:\source\master\include\parsers/filter/realtime_helper.hpp:187: Invalid duration for eventlog check, assuming all values stale
2015-11-15 22:08:31: debug:c:\source\master\include\parsers/filter/realtime_helper.hpp:199: Next miss time is in: 300s

I know for sure this works in 0.4.1.105, so has something changed in the syntax, is there a bug, or am I missing something?

mickem commented 8 years ago

No, syntax should be the same...

I noticed some issues with checklogfile in 0.5.0 which I fixed for the OSMC workshop. But alas, the WiFi here won't allow me to push builds, so I can't get it out there till I get back, I guess...

willemdh commented 8 years ago

No problem, once you draft a new pre-release I'll be happy to test again.

mickem commented 8 years ago

0.5.0.16 is out now: https://www.nsclient.org/download/0.5.0/ Reopen if this did not fix your issue...

willemdh commented 8 years ago

Michael, I just tried 0.5.0.16 and I still can't get any filter to work. It definitely doesn't work the old way I did it:

[/settings/eventlog/real-time/filters/EVT_System]
log= system
filter= level IN (error) AND (id NOT IN (1,3))
severity= WARNING

And I also tried following your filter documentation here: http://docs.nsclient.org/0.5.0/reference/windows/CheckEventLog.html?highlight=real-time%20filter#CheckEventLog./settings/eventlog/real-time/filters

with this:

[/settings/eventlog/real-time/filters/EVT_Application]
log= application
filter= level IN (‘error’, ‘warning’)
severity= WARNING

But I keep getting "No filters matched an event"

D   eventlog Reading eventlog messages...
D   eventlog Processing: 00000031C27398C0
D   eventlog No filters matched an event
D   eventlog Next miss time is in: 171s

I create the events with this PowerShell command, after which they are visible as errors in the Application eventlog:

write-eventlog -logname Application -source Test -id 2 -Message test -EntryType Error

Grtz

willemdh commented 8 years ago

I just tested the above on 0.5.0.17 and I'm still not able to make NSClient send an event to Nagios over NSCA. I keep getting "No filter matched an event". Could you please reopen this issue until it is fixed? The real-time eventlog monitoring is one of the most important features of NSClient for me and the reason I chose NSClient. I cannot upgrade my 600 Windows servers, which still run 0.4.1.105, until this is fixed. :(

TX!

mickem commented 8 years ago

One important snag remains: realtime still uses the "old" eventlog code, so some values such as severity and level can be wrong. This is next on my TODO list...

willemdh commented 8 years ago

"so some severity and level and such can be wrong. This is next on my TODO list..."

Fyi, very rarely I get this system eventlog error from a server with 0.5.0.17:

system: 1 (error: The system failed to register host (A or AAAA) resource records (RRs) for network adapter with settings: Adapter Name : {38271ADC-7C44-4D28-AF2E-C6DE1A35CA86} Host Name : SRV2012TEST Primary Domain Suffix : domain DNS server list : 12.16.26.13 Sent update to server : <?> IP Address(es) : 12.10.26.13 The reason the system could not register these RRs was because of a s

So the real-time eventlog monitoring does still kind of work, but the filters that worked in the past no longer work on 0.4.2.x or later. Looking forward to a fix for this.

mickem commented 8 years ago

Hello,

There are a few issues here.

1. log = system will never match log entries in application. Set log to any or all (or application).

2. It parses the wrong strings. This is a subtle bug: the filter supports the modern eventlog and therefore uses the modern eventlog parsers, BUT realtime in fact still uses the old eventlog. This can be circumvented by using a number, for instance 1 for error.

3. There is a bug in the filter parser which fails to convert a number to a string in this instance.

The latter will be fixed in the next build. To circumvent this issue, you can enter a string from the new eventlog which matches the old one you're looking for, such as "critical".

Thus the following configuration:

[/modules]
CheckEventLog = 1

[/settings/eventlog/real-time]
enabled = true

[/settings/eventlog/real-time/filters/EVT_System]
log= any
filter= level IN ('critical') AND (id NOT IN (1,3))
severity= WARNING

And this:

write-eventlog -logname Application -source Test -id 2 -Message test -EntryType Error

Will lead to this:

D   eventlog Scanning logs: application, system
D   eventlog Next miss time is in: 300s
D   eventlog Sleeping for: 300000ms
D   eventlog Reading eventlog messages...
D   eventlog Processing: 000000CC18A9FE50
E       core No handler for channel: NSCA channels:
                    C:\source\nscp\service\NSClient++.cpp:1242
E   eventlog Failed to submit 'application: 1 (error: test)
                    C:\source\nscp\include\parsers/filter/realtime_helper.hpp:124
D   eventlog Next miss time is in: 300s
D   eventlog Sleeping for: 300000ms

The numeric issue (which was most likely introduced in 0.5.x) will be fixed in the next build. The long-term fix will be to move real-time over to the new eventlog. This will probably take a few days, but I will try to fix it ASAP.

// Michael

willemdh commented 8 years ago

Thanks man. :)

mickem commented 8 years ago

Just thought I'd update: I now have an almost working prototype, so with luck in the next day or so we will have a build which has modern eventlog support for realtime as well...

willemdh commented 8 years ago

I'll be happy to test this for you.

mickem commented 8 years ago

New build will be out in a bit...

mickem commented 8 years ago

New build can be found here: https://github.com/mickem/nscp/releases/tag/0.5.0.23

willemdh commented 8 years ago

Confirmed to install correctly and all test checks on my test server still work. For detailed testing of the fixed real-time monitoring I might need a few days. Is the syntax to configure exclusions supposed to work identically as in 0.4.1.105? Or will I have to make some changes? So for example this:

[/settings/eventlog/real-time/filters/EVT_Application]
log= application
filter= level IN (error) AND (id NOT IN (1,3,10,12,13,23,37119,37124,37225)) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv'))
severity= WARNING
ok message= Found no records in application eventlog last three days.
maximum age= 3d

should work to send all errors except those in the exclusion list?

mickem commented 8 years ago

Yes, for most cases... There are some changes in the API (as we now use a newer one) so level for instance has some new levels, and some keywords like strings have changed, and there are a few new options as well.

What you have now will better reflect what Windows Event Viewer shows.

mickem commented 8 years ago

I'll close this for now, but please feel free to reopen if you still have issues...

willemdh commented 8 years ago

Michael, sorry, but I'm not getting it to work for now on 0.5.0.22. I created a 0.4.3+ compatible nsclient.ini with an exclusion list like the one I used in 0.4.1.105, installed it on a Windows Server 2012 R2 and generated some errors. I tried setting debug to 1 but nothing appears in nsclient.log.

New-Eventlog Application -Source Test
Write-EventLog -LogName Application -Source test -Id 6005 -Message Test -EntryType Error

I compared with a 0.4.1.105 on a Windows Server 2012 R2 where everything works as expected.

This is my nsclient.ini:

[/includes]

[/modules]
CheckDisk = 1
CheckEventLog = 1
CheckExternalScripts = 1
CheckHelpers = 1
CheckLogFile = 1
CheckNSCP = 1
CheckSystem = 1
CheckTaskSched = 0
CheckWMI = 1
CommandClient = 0
DotnetPlugins = 0
GraphiteClient = 0
NRDPClient = 0
NRPEClient = 0
NRPEServer = 1
NSCAClient = 1
NSCAServer = 0
NSClientServer = 1
PythonScript = 0
Scheduler = 0
SimpleCache = 0
SimpleFileWriter = 1
SMTPClient = 0
SyslogClient = 0
WEBServer = 0

[/modules/dotnet]

[/paths]
shared-path = C:\Program Files\NSClient++
exe-path = C:\Program Files\NSClient++
crash-folder = C:\Program Files\NSClient++
certificate-path = ${shared-path}/security
base-path = C:\Program Files\NSClient++
module-path = ${shared-path}/modules

[/settings/cache]
channel = CACHE
primary index = ${alias-or-command}

[/settings/crash]
archive = true
archive folder = ${shared-path}/crash-dumps
restart = true
restart target = NSClientpp
submit = false
submit url = http://crash.nsclient.org/post

[/settings/default]
allowed hosts = 127.0.0.1, 10.10.10.10
cache allowed hosts = 1
inbox = inbox
password = password
timeout = 20

[/settings/eventlog]
buffer size = 131072
debug = 0
lookup names = 1

[/settings/eventlog/real-time]
debug = 1
enabled = true
log = application,system
startup age = 30m

[/settings/eventlog/real-time/filters/default]
destination=NSCA
maximum age= 3d
ok message= eventlog found no records test default
syntax=%type% %id% %source%: %message% 

[/settings/eventlog/real-time/filters/EVT_Application]
log= application
filter= level IN (error) AND (id NOT IN (10,12,13,23,26,33,37,38,58,67,101,103,104,107,108,110,112,274,502,511,1000,1002,1004,1005,1009,1010,1026,1027,1053,1054,1085,1101,1107,1116,1301,1325,1334,1373,1500,1502,1504,1508,1511,1515,1521,1533,1542,2019,2158,2636,2670,3001,3008,3012,3021,3032,3037,3042,3077,3079,3098,3119,3130,3131,3148,3159,4005,4102,4237,4621,5008,5009,5051,5124,5133,5605,5705,6032,6100,7043,7363,7735,7823,7827,7833,8193,8194,8196,8313,9001,10000,10005,10007,10862,10922,11317,12121,12289,12298,12321,13793,13836,14197,14204,15000,16038,16041,16053,16063,16066,16068,16195,16391,16418,16419,16421,17187,17192,17204,17412,17898,18176,19269,19458,19954,19969,19972,20958,21061,22670,35698,35705,35710,35712,35716,35721,35726,37088,37090,37092,37095,37098,37119,37124,37225)) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) AND (id NOT IN (16385) OR source NOT IN ('Software Protection Platform Service')) AND (id NOT IN (513) OR source NOT IN ('CAPI2')) AND (id NOT IN (1008) OR source NOT IN ('Perflib')) AND (id NOT IN (215) OR source NOT IN ('ESENT')) AND (id NOT IN (513) OR source NOT IN ('Microsoft-Windows-CAPI2')) AND (id NOT IN (2005) OR source NOT IN ('PerfNet'))
severity= WARNING
ok message= Autoreset, found no records in application eventlog
maximum age= 3d

[/settings/eventlog/real-time/filters/EVT_System]
log= system
filter= level IN (error) AND (id NOT IN (1,3,4,5,8,9,10,11,15,19,27,37,39,50,54,56,137,1030,1041,1069,1071,1111,1196,3621,4192,4224,4243,4307,5722,6161,7000,7001,7009,7011,7016,7022,7023,7024,7026,7032,8003,9022,10005,10006,10009,10010,10016)) AND (id NOT IN (7043) OR source NOT IN ('Service Control Manager')) AND (id NOT IN (36888) OR source NOT IN ('Schannel')) AND (id NOT IN (36887) OR source NOT IN ('Schannel')) AND (id NOT IN (36874) OR source NOT IN ('Schannel')) AND (id NOT IN (36870) OR source NOT IN ('Schannel')) AND (id NOT IN (12292) OR source NOT IN ('VSS')) AND (id NOT IN (7034) OR source NOT IN ('Service Control Manager')) AND (id NOT IN (12) OR source NOT IN ('PlugPlayManager')) AND (id NOT IN (1006) OR source NOT IN ('Microsoft-Windows-GroupPolicy')) AND (id NOT IN (20) OR source NOT IN ('Microsoft-Windows-WindowsUpdateClient'))
severity= WARNING
ok message= Autoreset, found no records in system eventlog
maximum age= 3d

# naf_windows_nsclient_config detected this server is not a clusternode.

[/settings/external scripts]
allow arguments = true
allow nasty characters = true
timeout = 300

[/settings/external scripts/alias]
alias_sched_all = check_tasksched show-all "syntax=${title}: ${exit_code}" "crit=exit_code ne 0"
alias_file_size = check_files "path=$ARG1$" "crit=size > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${size}" max-dir-depth=10
alias_check_file_size_nsclientlog = check_files "path=$ARG1$" "crit=size > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${size}" max-dir-depth=10
alias_process_hung = check_process "filter=is_hung" "crit=count>0"
alias_process = check_process "process=$ARG1$" "crit=state != 'started'"
alias_service_ex = check_service "exclude=Net Driver HPZ12" "exclude=Pml Driver HPZ12" exclude=stisvc
alias_event_log = check_eventlog
alias_volumes_loose = check_drivesize
alias_volumes = check_drivesize
alias_disk = check_drivesize
alias_up = check_uptime
alias_cpu_ex = check_cpu "warn=load > $ARG1$" "crit=load > $ARG2$" time=5m time=1m time=30s
alias_file_age = check_files "path=$ARG1$" "crit=written > $ARG2$" "top-syntax=${list}" "detail-syntax=${filename] ${written}" max-dir-depth=10
alias_process_stopped = check_process "process=$ARG1$" "crit=state != 'stopped'"
alias_service = check_service
alias_cpu = check_cpu
alias_mem = check_memory
alias_process_count = check_process "process=$ARG1$" "warn=count > $ARG2$" "crit=count > $ARG3$"
alias_sched_task = check_tasksched show-all "filter=title eq '$ARG1$'" "detail-syntax=${title} (${exit_code})" "crit=exit_code ne 0"
alias_disk_loose = check_drivesize
alias_sched_long = check_tasksched "filter=status = 'running'" "detail-syntax=${title} (${most_recent_run_time})" "crit=most_recent_run_time < -$ARG1$"

[/settings/external scripts/scripts]
naf_delete_nscp_log_file=cmd /c echo scripts\powershell\naf_delete_nscp_log_file.ps1 $ARG1$; exit $LastExitCode | powershell.exe -command -
naf_query_ms_win_server = cmd /c echo scripts\powershell\naf_query_ms_win_server.ps1 $ARG1$; exit $LastExitCode | powershell.exe /noprofile -command -
naf_shutdown_windows_server=cmd /c echo scripts\powershell\naf_shutdown_windows_server.ps1 $ARG1$; exit $LastExitCode | powershell.exe -command -
naf_restart_windows_server=cmd /c echo scripts\powershell\naf_restart_windows_server.ps1 $ARG1$; exit $LastExitCode | powershell.exe -command -

[/settings/external scripts/wrapped scripts]
check_ms_ad_accounts                        = check_ms_ad_accounts.ps1
check_ms_cluster_preferred_node             = check_ms_cluster_preferred_node.ps1
check_ms_ctx_loadevaluator                  = check_ms_ctx_loadevaluator.ps1
check_ms_exchange_2010_health               = check_ms_exchange_2010_health.ps1
check_ms_exchange_2010_hybrid               = check_ms_exchange_2010_hybrid.ps1
check_ms_exchange_2010_replication          = check_ms_exchange_2010_replication.ps1
check_ms_sharepoint_2010_connections        = check_ms_sharepoint_2010_connections.ps1
check_ms_sharepoint_2010_sitecollections    = check_ms_sharepoint_2010_sitecollections.ps1
check_ms_sharepoint_health                  = check_ms_sharepoint_health.ps1
check_ms_win_certificates                   = check_ms_win_certificates.ps1
check_ms_win_disk_load                      = check_ms_win_disk_load.ps1
check_ms_win_network_connections            = check_ms_win_network_connections.ps1
check_ms_win_network_load                   = check_ms_win_network_load.ps1
check_ms_win_tasks                          = check_ms_win_tasks.ps1
check_ms_win_updates                        = check_ms_win_updates.ps1

naf_citrix_drain_server                     = naf_citrix_drain_server.ps1
naf_vmware_initiate_snapshot                = naf_vmware_initiate_snapshot.ps1
naf_windows_information                     = naf_windows_information.ps1
naf_windows_initiate_audit                  = naf_windows_initiate_audit.ps1
naf_windows_initiate_wsus_updates           = naf_windows_initiate_wsus_updates.ps1
naf_windows_robocopy                        = naf_windows_robocopy.ps1
naf_windows_service                         = naf_windows_service.ps1

[/settings/external scripts/wrappings]
bat = scripts\\%SCRIPT% %ARGS%
vbs = cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs %SCRIPT% %ARGS%
ps1 = cmd /c echo If (-Not (Test-Path "scripts\powershell\%SCRIPT%") ) { Write-Host "UNKNOWN: Script `"%SCRIPT%`" not found."; exit(3) }; scripts\powershell\%SCRIPT% $ARGS$; exit($lastexitcode) | powershell.exe /noprofile -command -

[/settings/graphite/client]
channel = GRAPHITE
hostname = auto

[/settings/graphite/client/targets/default]
path = system.${hostname}.${check_alias}.${perf_alias}

[/settings/log]
date format = %Y-%m-%d %H:%M:%S
file name = ${exe-path}/nsclient.log
level = info

[/settings/log/file]
max size = 0

[/settings/logfile]

[/settings/logfile/real-time]
enabled = 0

[/settings/logfile/real-time/checks]

[/settings/NSCA/client]
channel = NSCA
hostname = srv2012test

[/settings/NSCA/client/targets/default]
address = 10.10.10.10
allowed ciphers = ADH
certificate = 
encryption = none
password = password
timeout = 30
use ssl = 0
verify mode = none

[/settings/NSCA/server]
port = 5667
performance data = 1
use ssl = 0
encryption = aes
payload length = 512

[/settings/NSClient/server]
performance data = true
port = 12489
use ssl = 0

[/settings/NRDP/client]
channel = NRDP
hostname = auto

[/settings/NRDP/client/targets/default]
sender = nscp@localhost
recipient = nscp@localhost
timeout = 30
template = Hello, this is %source% reporting %message%!

[/settings/NRPE/client]
channel = NRPE

[/settings/NRPE/client/targets/default]
timeout = 30
verify mode = none
payload length = 1024
use ssl = 1

[/settings/NRPE/server]
allow arguments = true
allow nasty characters = true
extended response = false
insecure = true 
port = 5666
ssl options = no-sslv2,no-sslv3
timeout = 120
verify mode = none
use ssl = 1

[/settings/python]

[/settings/python/scripts]

[/settings/scheduler]
threads = 5

[/settings/scheduler/schedules]

[/settings/shared session]
enabled = 0

[/settings/SMTP/client]
channel = SMTP

[/settings/SMTP/client/targets/default]
sender = nscp@localhost
template = Hello, this is %source% reporting %message%!
timeout = 30
recipient = nscp@localhost

[/settings/syslog/client]
channel = syslog

[/settings/syslog/client/targets/default]
warning severity = warning
tag_syntax = NSCA
severity = error
ok severity = informational
message_syntax = %message%
facility = kernel
critical severity = critical
unknown severity = emergency

[/settings/system/windows]
default buffer length = 1h

[/settings/system/windows/counters]

[/settings/system/windows/service mapping]

[/settings/system/windows/real-time]

[/settings/system/windows/real-time/checks]

[/settings/targets]

[/settings/WEB/server]
port = 8443s
certificate = ${certificate-path}/certificate.pem

[/settings/writers/file]
syntax = ${alias-or-command} ${result} ${message}
file = output.txt
channel = FILE

So why is my test error 6005 not caught by the real-time eventlog module and logged to nsclient.log, given that I have set debug = 1?

Please let me know if you can see any mistakes on my side or where I'm dropping the ball. By the way, I can't reopen this as I'm not a collaborator. I can only reopen an issue if I closed it myself.

mickem commented 8 years ago

This slightly shorter example (which should contain the relevant parts of your file) works for me:

[/modules]
CheckEventLog = 1

[/settings/eventlog/real-time]
enabled = true

[/settings/eventlog/real-time/filters/default]
destination=NSCA
maximum age= 3d
ok message= eventlog found no records test default
syntax=%type% %id% %source%: %message% 

[/settings/eventlog/real-time/filters/EVT_Application]
log= application
filter= level IN (error) AND (id NOT IN (10,12,13,23,26,33,37,38,58,67,101,103,104,107,108,110,112,274,502,511,1000,1002,1004,1005,1009,1010,1026,1027,1053,1054,1085,1101,1107,1116,1301,1325,1334,1373,1500,1502,1504,1508,1511,1515,1521,1533,1542,2019,2158,2636,2670,3001,3008,3012,3021,3032,3037,3042,3077,3079,3098,3119,3130,3131,3148,3159,4005,4102,4237,4621,5008,5009,5051,5124,5133,5605,5705,6032,6100,7043,7363,7735,7823,7827,7833,8193,8194,8196,8313,9001,10000,10005,10007,10862,10922,11317,12121,12289,12298,12321,13793,13836,14197,14204,15000,16038,16041,16053,16063,16066,16068,16195,16391,16418,16419,16421,17187,17192,17204,17412,17898,18176,19269,19458,19954,19969,19972,20958,21061,22670,35698,35705,35710,35712,35716,35721,35726,37088,37090,37092,37095,37098,37119,37124,37225)) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) AND (id NOT IN (16385) OR source NOT IN ('Software Protection Platform Service')) AND (id NOT IN (513) OR source NOT IN ('CAPI2')) AND (id NOT IN (1008) OR source NOT IN ('Perflib')) AND (id NOT IN (215) OR source NOT IN ('ESENT')) AND (id NOT IN (513) OR source NOT IN ('Microsoft-Windows-CAPI2')) AND (id NOT IN (2005) OR source NOT IN ('PerfNet'))
severity= WARNING
ok message= Autoreset, found no records in application eventlog
maximum age= 3d

[/settings/eventlog/real-time/filters/EVT_System]
log= system
filter= level IN (error) AND (id NOT IN (1,3,4,5,8,9,10,11,15,19,27,37,39,50,54,56,137,1030,1041,1069,1071,1111,1196,3621,4192,4224,4243,4307,5722,6161,7000,7001,7009,7011,7016,7022,7023,7024,7026,7032,8003,9022,10005,10006,10009,10010,10016)) AND (id NOT IN (7043) OR source NOT IN ('Service Control Manager')) AND (id NOT IN (36888) OR source NOT IN ('Schannel')) AND (id NOT IN (36887) OR source NOT IN ('Schannel')) AND (id NOT IN (36874) OR source NOT IN ('Schannel')) AND (id NOT IN (36870) OR source NOT IN ('Schannel')) AND (id NOT IN (12292) OR source NOT IN ('VSS')) AND (id NOT IN (7034) OR source NOT IN ('Service Control Manager')) AND (id NOT IN (12) OR source NOT IN ('PlugPlayManager')) AND (id NOT IN (1006) OR source NOT IN ('Microsoft-Windows-GroupPolicy')) AND (id NOT IN (20) OR source NOT IN ('Microsoft-Windows-WindowsUpdateClient'))
severity= WARNING
ok message= Autoreset, found no records in system eventlog
maximum age= 3d

Write-EventLog -LogName Application -Source test -Id 6005 -Message Test -EntryType Error

Gives me:

$ nscp test
...
D   eventlog Scanning logs: application, system
D   eventlog Next miss time is in: 259200s
D   eventlog Sleeping for: 259200000ms
D   eventlog Detected action on: application
D   eventlog Next miss time is in: 259200s
D   eventlog Sleeping for: 259200000ms
D   eventlog Detected action on: system
D   eventlog Next miss time is in: 259200s
D   eventlog Sleeping for: 259200000ms
D   eventlog Detected action on: application
D   eventlog Failed to format eventlog record: ID=6005: 6: Referensen (handle) är felaktig.

E       core No handler for channel: NSCA channels:
                    C:\source\nscp\service\NSClient++.cpp:1242
E   eventlog Failed to submit 'Application: 1 (error: )
                    C:\source\nscp\include\parsers/filter/realtime_helper.hpp:124
D   eventlog Next miss time is in: 259177s
D   eventlog Sleeping for: 259177000ms

Now, as there is no real message, it cannot render the message properly, so we get "Application: 1 (error: )" instead of the error message, but we do get the alert.
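
The "Next miss time is in: 259200s" lines above are consistent with the maximum age = 3d setting, since 3 days is 3 x 24 x 3600 = 259200 seconds. A minimal Python sketch of that arithmetic follows. Note that parse_duration is a hypothetical helper written for illustration, not NSClient++'s own parser, and it assumes only the s/m/h/d suffixes seen in this thread:

```python
# Hypothetical helper, not NSClient++'s own parser: converts the duration
# strings used in this thread ("30m", "2h", "3d") to seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_duration(text):
    text = text.strip()
    # Expect digits followed by exactly one known unit suffix.
    if len(text) < 2 or text[-1] not in UNIT_SECONDS or not text[:-1].isdigit():
        raise ValueError(f"unsupported duration: {text!r}")
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]

print(parse_duration("3d"))  # 259200, as in "Next miss time is in: 259200s"
```

The later value of 259177s would then simply be the same window with 23 seconds already elapsed.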

mickem commented 8 years ago

As a reference:

Write-EventLog -LogName Application -Source test -Id 10 -Message Test -EntryType Error

gives me:

D   eventlog Next miss time is in: 259177s
D   eventlog Sleeping for: 259177000ms
D   eventlog Detected action on: application
D   eventlog Error: Failed to evaluate message: EvtFormatMessage failed: 6: Referensen (handle) är felaktig.

D   eventlog No filters matched: application:10=error
D   eventlog Next miss time is in: 258640s
D   eventlog Sleeping for: 258640000ms

Which is one of the IDs you exclude, so it falls asleep again as the event does not match the filter...
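
To see why ID 6005 raises an alert while ID 10 does not, the exclusion filter can be read as an ordinary boolean expression. Below is a minimal Python sketch with a shortened ID list. The matches function is a hypothetical stand-in, not NSClient++'s filter engine, and it assumes plain case-sensitive string comparison:

```python
# Shortened model of:
#   level IN (error) AND (id NOT IN (10,12,13))
#   AND (id NOT IN (1509) OR source NOT IN ('Userenv'))
EXCLUDED_IDS = {10, 12, 13}
EXCLUDED_PAIRS = {(1509, "Userenv")}

def matches(level, event_id, source):
    if level != "error":
        return False
    if event_id in EXCLUDED_IDS:
        return False
    # (id NOT IN (x) OR source NOT IN (y)) is false only when BOTH the id
    # and the source match, i.e. it excludes exactly that (id, source) pair.
    return (event_id, source) not in EXCLUDED_PAIRS

print(matches("error", 6005, "test"))  # True: alert is submitted
print(matches("error", 10, "test"))    # False: ID 10 is excluded
```

Each "(id NOT IN (...) OR source NOT IN (...))" clause therefore removes one specific ID/source combination, while the first NOT IN list removes IDs regardless of source.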

willemdh commented 8 years ago

I have been testing this on my home PC and was able to get it working somehow. I'll review my config at work one of the coming days.

So at home this:

Write-EventLog -LogName Application -Source test -Id 9 -Message "This a test with a big message" -EntryType Error

resulted in me getting:

Application: 1 (error: )

in my EVT_Application service with the above config.

So, it seems like there might be something wrong with my syntax?

syntax = %type% %id% %source%: %message% 

As I did not get the message content into my service.

Tx and grtz

willemdh commented 8 years ago

I tried commenting out

#syntax = %type% %id% %source%: %message% 

hoping that I would get the default settings, which would include the message. But still nothing.

Read through http://docs.nsclient.org/reference/CheckEventLog.html#CheckEventLog./settings/eventlog.syntax

Then I tried:

[/settings/eventlog/real-time/filters/default]
destination = NSCA
maximum age = 2h
ok message = Found no records in eventlog last two hours.
syntax = %(level) %(id) %(source): %(message)

But my service also didn't receive the message content. So then I tried:

[/settings/eventlog/real-time/filters/default]
destination = NSCA
maximum age = 2h
ok message = Found no records in eventlog last two hours.
syntax = ${level} ${id} ${source}: ${message}

But still no message... Any tips to get me going? In my humble opinion, the default should contain the message, no?

mickem commented 8 years ago

The problem is that the "fake messages" you're sending in do not have a real message; that's why you do not get a message.

My guess is that in the log you have:

D   eventlog Error: Failed to evaluate message: EvtFormatMessage failed: 6: Referensen (handle) är felaktig.

Which essentially says "failed to render message" (the Swedish text is the Windows message for error 6, "The handle is invalid")...

mickem commented 8 years ago

As a reference, real eventlog messages are not strings: they are message templates identified by IDs and read from a DLL, and the event data is then added to them...
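
A simplified model of that rendering step: the event carries only insertion strings, and the text comes from a template looked up by source and ID. In this Python sketch the TEMPLATES table and the render function are illustrative assumptions, not the Windows API or NSClient++ code. The template text is abbreviated from the Time-Service event 158 output quoted later in this thread, and the position of the %1 placeholder in it is inferred from that output:

```python
import re

# Hypothetical template table keyed by (source, event id). On Windows the
# real templates live in message DLLs registered by the event source.
TEMPLATES = {
    ("Microsoft-Windows-Time-Service", 158):
        "The time provider '%1' has indicated that the current hardware "
        "and operating environment is not supported and has stopped.",
}

def render(source, event_id, strings):
    template = TEMPLATES.get((source, event_id))
    if template is None:
        # No template registered for this source/ID: nothing to render.
        return ""

    def substitute(match):
        index = int(match.group(1)) - 1  # %1 refers to strings[0]
        return strings[index] if 0 <= index < len(strings) else match.group(0)

    return re.sub(r"%(\d+)", substitute, template)

print(render("Microsoft-Windows-Time-Service", 158, ["test"]))
print(repr(render("test", 6005, ["Test"])))
```

Under these assumptions, an event written with an unregistered source and ID has no template to render, whereas reusing a registered source and ID lets the supplied string be slotted into the existing text.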

willemdh commented 8 years ago

I'm a bit confused, though, as I just checked the eventlog entry I created and the message seems to be set to what I passed with

Write-EventLog -LogName Application -Source test -Id 9 -Message "This a test with a big message" -EntryType Error

image

image

image

I'm not finding any render errors. So any tip on a command to generate such a real error event? Which one of the three examples I gave in my previous post should work?

mickem commented 8 years ago

Well, simplest way is to use an existing message instead of one you create yourself...

If you look at a "normal" message it looks like this:

image

But the event data only contains the "strings" which are replaced in the message.

image

The message is then loaded from the event source and rendered with the replacement strings...

So reuse a real message and put in the regular strings and it should work fine...

// Michael Medin

willemdh commented 8 years ago

I attempted to reproduce an existing eventlog entry with, for example (I tried many):

Write-EventLog -LogName Application -Source Perflib -Id 1008 -Message 'The Open Procedure for service "BITS" in DLL "C:\Windows\System32\bitsperf.dll" failed. Performance data for this service will not be available. The first four bytes (DWORD) of the Data section contains the error code.' -EntryType Error -Category 0

But I'm still getting:

image

Somehow it is still not being treated as a 'normal' eventlog message. I still think this is weird. I know of several custom-written applications that write to the eventlog with similar statements. I will test more in the coming days, but I thought this worked in 0.4.1.105. Going to sleep now. To be continued... :)

willemdh commented 8 years ago

Michael,

I have done new tests and compared with a 0.4.1.105 system.

PS C:\Windows\system32> Write-EventLog -LogName Application -Source test -Id 6005 -Message Test -EntryType Error

Results in:

image

The same command on a 0.5.0.23 results in:

image

IMHO, it should be possible to test this functionality with the above PowerShell command, as it always was.

Tried with:

syntax=${level} ${id} ${source}: ${message}

syntax = %(level) %(id) %(source): %(message)

syntax=%type% %id% %source%: %message% 

Tested on:
Windows 10 at home => No message output
Windows 2012 R2 at work => No message output

I tried using an existing source and ID, but also had no success in outputting the message etc. as configured in the syntax directive in nsclient.ini.

We also found a way to generate a true error with the help of Symantec SSR. In the past we were always alerted when an SSR backup failed. So I start the backup with a faulty configuration; the backup fails and generates an error.

image

This error still arrives in Nagios as

image

I tried this several times with different syntaxes. Could you please reconsider your opinion on this or propose a working syntax to me? Thanks

mickem commented 8 years ago

I'll dig up a real example and see if I can replicate it all the way... It could be the syntax strings that are wrong... or a bug...

mickem commented 8 years ago

So sorry, I messed up a handle so that it was closed too early. I will push a new build, but now the following works:

[/modules]
CheckEventLog = 1
CheckSystem = 1
SimpleFileWriter = 1

[/settings/eventlog/real-time]
enabled = true

[/settings/writers/file]
file=c:\\test\\test.txt

[/settings/eventlog/real-time/filters/default]
destination=FILE

[/settings/eventlog/real-time/filters/test 1]
log=System
filter=level IN (error) AND id = 158
severity= WARNING
top syntax=test 1: %(message)

[/settings/eventlog/real-time/filters/test 2]
log=System
filter=level IN (error) AND id = 158
severity= WARNING
top syntax=test 2: %(message)

And then pushing messages:

write-eventlog -logname System -source Microsoft-Windows-Time-Service -id 158 -Message test -EntryType Error

And the resulting output file:

test 2 WARNING test 2: The time provider 'test' has indicated that the current hardware and operating environment is not supported and has stopped. This behavior is expected for VMICTimeProvider on non-HyperV-guest environments. This may be the expected behavior for the current provider in the current operating environment as well.
test 1 WARNING test 1: The time provider 'test' has indicated that the current hardware and operating environment is not supported and has stopped. This behavior is expected for VMICTimeProvider on non-HyperV-guest environments. This may be the expected behavior for the current provider in the current operating environment as well.
test 2 WARNING test 2: The time provider 'test' has indicated that the current hardware and operating environment is not supported and has stopped. This behavior is expected for VMICTimeProvider on non-HyperV-guest environments. This may be the expected behavior for the current provider in the current operating environment as well.
test 1 WARNING test 1: The time provider 'test' has indicated that the current hardware and operating environment is not supported and has stopped. This behavior is expected for VMICTimeProvider on non-HyperV-guest environments. This may be the expected behavior for the current provider in the current operating environment as well.

willemdh commented 8 years ago

No problem. I know how fast bugs get in. Thanks for confirming I'm not crazy... I'll test tomorrow.

mickem commented 8 years ago

https://github.com/mickem/nscp/releases/tag/0.5.0.25

willemdh commented 8 years ago

Michael,

Good news => 0.5.0.25 top syntax works nicely.

[/settings/eventlog/real-time/filters/default]
destination=NSCA
maximum age= 3d
ok message= eventlog found no records test default
top syntax= ${level} ${id} ${source}: ${message}

The above outputs just like it did before. :)

But I found that the filters do not work as expected. Either something has changed in the syntax that I don't know about, or there is a bug in the filter syntax, or something has changed with the source.

Take for example this filter:

filter= level IN (error) AND (id NOT IN (10,11)) AND (id NOT IN (2005) OR source NOT IN ('PerfNet'))

When I create an event

Write-EventLog -LogName Application -Source "PerfNet" -Id 2005 -Message "This a test with a big message" -EntryType Error

The event is not filtered, while in 0.4.1.105 it was in fact filtered. After some troubleshooting, it seems that NSClient is looking at the Name instead of the EventSourceName:

image

This is really messed up, as it makes my current exclusion list more or less worthless. :( I'm not sure how to handle this. I can see that not all events have an EventSourceName; it seems that only Windows-related events have this item. Another example:

image

Would there be a way to provide a source I can filter on that always takes the source shown in the General tab of an event?

image

Or would there be a way to make NSClient check whether 'EventSourceName' exists and, if it does, use that as the source instead of 'Name'? I realize I'm asking for a lot, but I currently have a list of about 5283 exclusions, and my head starts spinning at the thought of converting all sources to this new format. If there were some way to make NSClient see the source the way it used to, it would probably save me days of work. Users tend to base exclusions on the source they see in the General tab, so wouldn't it be logical to let us use exactly that name in a real-time NSClient filter, instead of the modern event log name that Microsoft introduced but does not show in the General tab of the event details?
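For context on how such an exclusion list turns into a filter, here is a minimal Python sketch that generates a filter string of the same shape as the one above from (id, source) pairs. It is only an illustration: `build_filter` is a hypothetical helper, not part of NSClient++, and it assumes source names contain no single quotes.

```python
def build_filter(exclusions, base="level IN (error)"):
    """Join a base condition with one exclusion clause per (id, source) pair."""
    clauses = [base]
    for event_id, source in exclusions:
        # Assumption: no quote escaping is needed for the source name.
        clauses.append(
            f"(id NOT IN ({event_id}) OR source NOT IN ('{source}'))"
        )
    return " AND ".join(clauses)


# The single exclusion used in the test above.
example_filter = build_filter([(2005, "PerfNet")])
print(example_filter)
```

Whether NSClient++ accepts a filter built from thousands of such clauses is not something this sketch checks.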

mickem commented 8 years ago

My guess would be the source... Since we are using a new API, we (most likely) have the long source name, i.e. Microsoft-Windows-PerfNet, but that is a guess...
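To illustrate what that guess would mean for the filter quoted earlier, the sketch below translates it by hand into a plain Python predicate. This is not NSClient++ code: `matches_filter` is a hypothetical name, and the sketch assumes the source is compared as a literal string.

```python
def matches_filter(level: str, event_id: int, source: str) -> bool:
    """Hand translation of:
    level IN (error) AND (id NOT IN (10,11))
      AND (id NOT IN (2005) OR source NOT IN ('PerfNet'))
    """
    return (
        level in ("error",)
        and event_id not in (10, 11)
        and (event_id not in (2005,) or source not in ("PerfNet",))
    )


# Short name: the last clause is false, so the event is excluded.
short_result = matches_filter("error", 2005, "PerfNet")

# Long name (the guess above): 'PerfNet' does not match, so the event passes.
long_result = matches_filter("error", 2005, "Microsoft-Windows-PerfNet")

print(short_result, long_result)
```

If the source really were reported under the long name, an exclusion written against the short name would never apply, which matches the symptom described.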

willemdh commented 8 years ago

But you don't see a way to use the short source names in a filter? I mean having it compare against the short source name, rather than the long source name, which causes the filter not to match.

mickem commented 8 years ago

Well, currently it is just a guess... I can look into it... but first I need to check whether that's the case...

willemdh commented 8 years ago

OK, thanks a lot. Otherwise I'll have to add another column to my SQL database to hold the long source name and start from scratch, or find some way to convert the short source names to long source names.
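A rough sketch of what such a conversion could look like, assuming a lookup table of short to long names is available from somewhere. The table below contains only the one pair mentioned in this thread, which is itself a guess, and `to_long_source` is a hypothetical helper.

```python
def to_long_source(short_name, known_long_names):
    """Return the long name recorded for short_name, or short_name unchanged."""
    return known_long_names.get(short_name, short_name)


# Assumption: this mapping would have to be collected separately.
known = {"PerfNet": "Microsoft-Windows-PerfNet"}

# (id, short source) pairs as they might be stored in the exclusion table.
exclusions = [(2005, "PerfNet"), (6005, "Test")]

converted = [(eid, to_long_source(src, known)) for eid, src in exclusions]
print(converted)
```

Sources without a known long name are left untouched, so entries that already match keep working.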

mickem commented 8 years ago

I'll look into it...

willemdh commented 8 years ago

Michael, I've also investigated further. It seems I was wrong: I see the same results in 0.4.1.105. The problem is that I was using a 2003 Server exclusion id/source combination. I think I'm all set to continue testing and start migrating some NSClient versions. Sorry!! I'll close this issue for now.

mickem commented 8 years ago

Great. I did some looking, and while the "old short name" is in the registry, there is no way to access it from the API (that I have found). I am reluctant to scrape the registry for information, so I think I will leave it as is for now...