mickem / nscp

NSClient++
http://nsclient.org
GNU General Public License v2.0

Unstoppable 0.5.0.62 nscp service on Dutch Windows 2008 R2 servers #396

Closed: willemdh closed this issue 7 years ago

willemdh commented 7 years ago

In version 0.5.0.62 of NSClient++ it's not possible to stop the service on Dutch Windows 2008 R2 servers, neither with services.msc nor with this PowerShell command:

$Service = Get-Service -Name nscp
Stop-Service -WarningAction SilentlyContinue -InputObject $Service

The service ends up in a 'Stopping' state until the nscp process is killed.
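
As a stopgap while the service hangs in 'Stopping', the stop-then-kill sequence can be scripted. The following is a minimal Python sketch, not part of NSClient++. The function names, the 30-second timeout, and the process image name nscp.exe are assumptions, and looking for the word STOPPED in the `sc query` output is only a heuristic.

```python
import subprocess
import time


def stop_or_kill(request_stop, is_stopped, kill, timeout=30.0, poll=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Ask a service to stop; kill its process if it has not stopped in time.

    Returns "stopped" if the service stopped on its own, "killed" otherwise.
    """
    request_stop()
    deadline = clock() + timeout
    while clock() < deadline:
        if is_stopped():
            return "stopped"
        sleep(poll)
    # One last check before resorting to the kill.
    if is_stopped():
        return "stopped"
    kill()
    return "killed"


# Windows hooks (assumptions: service name "nscp", image name "nscp.exe").
def sc_stop(name="nscp"):
    subprocess.run(["sc", "stop", name], check=False)


def sc_is_stopped(name="nscp"):
    result = subprocess.run(["sc", "query", name],
                            capture_output=True, text=True)
    # Heuristic: look for the state name in the command output.
    return "STOPPED" in result.stdout


def kill_nscp(image="nscp.exe"):
    subprocess.run(["taskkill", "/F", "/IM", image], check=False)
```

On an affected server this would be called as `stop_or_kill(sc_stop, sc_is_stopped, kill_nscp)` from an elevated prompt. Killing the process only works around the hang; it does not address the cause.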

In version 0.4.1.105 this worked without issues. I'm not seeing this issue on English Windows servers.

mickem commented 7 years ago

Interesting.. I have never seen this myself... Is it just Dutch or do you have the same issue in other locales?

Could you enable debug log (and/or run in test mode) and attach it here?

willemdh commented 7 years ago

Not sure if I'm doing this right, but after entering nscp test and restarting the service, I see this in nsclient.log:

2017-03-20 14:19:16: error:c:\source\master\service\NSClient++.cpp:935: Failed to process command: Library is not loaded
2017-03-20 14:19:16: error:c:\source\master\include\nscapi\nscapi_core_wrapper.cpp:191: Failed to execute query
2017-03-20 14:20:01: error:c:\source\master\service\NSClient++.cpp:935: Failed to process command: Library is not loaded
2017-03-20 14:20:01: error:c:\source\master\include\nscapi\nscapi_core_wrapper.cpp:191: Failed to execute query
2017-03-20 14:20:13: error:c:\source\master\service\NSClient++.cpp:935: Failed to process command: Library is not loaded
2017-03-20 14:20:13: error:c:\source\master\include\nscapi\nscapi_core_wrapper.cpp:191: Failed to execute query
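
To see how often each error repeats, log lines in the format shown above can be tallied with a short script. This is a sketch that assumes every line follows the `timestamp: level:file:line: message` layout seen in this log. The names `LOG_LINE` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "2017-03-20 14:19:16: error:c:\...\NSClient++.cpp:935: message"
LOG_LINE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"(?P<level>\w+):(?P<file>.+?):(?P<line>\d+): (?P<message>.*)$"
)


def summarize(lines):
    """Count log entries per (level, source file name, line number, message)."""
    counts = Counter()
    for raw in lines:
        match = LOG_LINE.match(raw.strip())
        if match is None:
            continue  # skip lines that do not follow the expected layout
        source = match["file"].rsplit("\\", 1)[-1]  # drop the directory part
        counts[(match["level"], source, int(match["line"]), match["message"])] += 1
    return counts
```

Feeding it the contents of nsclient.log, for example `summarize(open(path, encoding="utf-8", errors="replace"))`, gives one counter entry per distinct error site.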

It seems I do not have this issue with the default generated nsclient.ini, so the issue might only exist in one module. I'll try to investigate further. As a reference, this is my nsclient.ini:

[/includes]

[/modules]
CheckDisk = 1
CheckEventLog = 1
CheckExternalScripts = 1
CheckHelpers = 1
CheckLogFile = 1
CheckNSCP = 1
CheckSystem = 1
CheckTaskSched = 0
CheckWMI = 1
CommandClient = 0
DotnetPlugins = 0
GraphiteClient = 0
NRDPClient = 0
NRPEClient = 0
NRPEServer = 1
NSCAClient = 1
NSCAServer = 0
NSClientServer = 1
PythonScript = 0
Scheduler = 0
SimpleCache = 0
SimpleFileWriter = 1
SMTPClient = 0
SyslogClient = 0
WEBServer = 0

[/modules/dotnet]

[/paths]
shared-path = C:\Program Files\NSClient++
exe-path = C:\Program Files\NSClient++
crash-folder = C:\Program Files\NSClient++
certificate-path = ${shared-path}/security
base-path = C:\Program Files\NSClient++
module-path = ${shared-path}/modules

[/settings/cache]
channel = CACHE
primary index = ${alias-or-command}

[/settings/crash]
archive = true
archive folder = ${shared-path}/crash-dumps
restart = true
restart target = NSClientpp
submit = false
submit url = http://crash.nsclient.org/post

[/settings/default]
allowed hosts = <list of allowed ip addresses>
cache allowed hosts = 1
inbox = inbox
password = pass
timeout = 20

[/settings/eventlog]
buffer size = 131072
debug = 1
lookup names = 1

[/settings/eventlog/real-time]
debug = 0
enabled = true
log = application,system
startup age = 30m

[/settings/eventlog/real-time/filters/default]
destination=NSCA
maximum age= 3d
ok message= eventlog found no records test default
syntax=%type% %id% %source%: %message% 

[/settings/eventlog/real-time/filters/EVT_Application]
log= application
filter= level IN (error) AND (id NOT IN (10,12,13,23,26,33,37,38,58,67,101,103,104,107,108,110,112,274,502,511,1000,1002,1004,1005,1009,1010,1026,1027,1053,1054,1085,1101,1107,1116,1301,1325,1334,1373,1500,1502,1504,1508,1511,1515,1521,1533,1542,2019,2158,2636,2670,3001,3008,3012,3021,3032,3037,3042,3077,3079,3098,3119,3130,3131,3148,3159,4005,4102,4237,4621,5008,5009,5051,5124,5133,5605,5705,6032,6100,7043,7363,7735,7823,7827,7833,8193,8194,8196,8313,9001,10000,10005,10007,10862,10922,11317,12121,12289,12298,12321,13793,13836,14197,14204,15000,16038,16041,16053,16063,16066,16068,16195,16391,16418,16419,16421,17187,17192,17204,17412,17898,18176,19269,19458,19954,19969,19972,20958,21061,22670,35698,35705,35710,35712,35716,35721,35726,37088,37090,37092,37095,37098,37119,37124,37225)) AND (id NOT IN (5) OR source NOT IN ('Microsoft-Windows-CAPI2')) AND (id NOT IN (4101) OR source NOT IN ('Microsoft-Windows-CAPI2')) AND (id NOT IN (1023) OR source NOT IN ('Perflib')) AND (id NOT IN (6096) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (2004) OR source NOT IN ('PerfNet')) AND (id NOT IN (20) OR source NOT IN ('Therefore')) AND (id NOT IN (1008) OR source NOT IN ('Microsoft-Windows-Perflib')) AND (id NOT IN (3038) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (80) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (20) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (0) OR source NOT IN ('OWSClient')) AND (id NOT IN (6007) OR source NOT IN ('Application Virtualization Client')) AND (id NOT IN (6016) OR source NOT IN ('Application Virtualization Client')) AND (id NOT IN (6544) OR source NOT IN ('Goverlan')) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) AND (id NOT IN (16385) OR source NOT IN ('Software Protection Platform 
Service')) AND (id NOT IN (41472) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (29488) OR source NOT IN ('Rimses Application')) AND (id NOT IN (29488) OR source NOT IN ('ReMax Application')) AND (id NOT IN (10006) OR source NOT IN ('Microsoft-Windows-RestartManager')) AND (id NOT IN (6034) OR source NOT IN ('Application Virtualization Client')) AND (id NOT IN (6025) OR source NOT IN ('Application Virtualization Client')) AND (id NOT IN (4096) OR source NOT IN ('VSTO 4.0')) AND (id NOT IN (2001) OR source NOT IN ('Microsoft Office 14')) AND (id NOT IN (2000) OR source NOT IN ('Microsoft Office 14')) AND (id NOT IN (1106) OR source NOT IN ('MetaFrameEvents')) AND (id NOT IN (1009) OR source NOT IN ('picadm')) AND (id NOT IN (27) OR source NOT IN ('Outlook')) AND (id NOT IN (4) OR source NOT IN ('LoginScript')) AND (id NOT IN (4) OR source NOT IN ('LoginScript')) AND (id NOT IN (4) OR source NOT IN ('WSH')) AND (id NOT IN (2) OR source NOT IN ('LoginScript')) AND (id NOT IN (6044) OR source NOT IN ('Application Virtualization Client')) AND (id NOT IN (1526) OR source NOT IN ('Microsoft-Windows-User Profiles Service')) AND (id NOT IN (513) OR source NOT IN ('CAPI2')) AND (id NOT IN (1008) OR source NOT IN ('Perflib')) AND (id NOT IN (215) OR source NOT IN ('ESENT')) AND (id NOT IN (1) OR source NOT IN ('CanonPrinterDriver3')) AND (id NOT IN (513) OR source NOT IN ('Microsoft-Windows-CAPI2')) AND (id NOT IN (59) OR source NOT IN ('SideBySide')) AND (id NOT IN (2005) OR source NOT IN ('PerfNet'))
severity= WARNING
ok message= Autoreset, found no records in application eventlog
maximum age= 3d

[/settings/eventlog/real-time/filters/EVT_System]
log= system
filter= level IN (error) AND (id NOT IN (1,3,4,5,8,9,10,11,15,19,27,37,39,50,54,56,137,1030,1041,1069,1071,1111,1196,3621,4192,4224,4243,4307,5722,6161,7000,7001,7009,7011,7016,7022,7023,7024,7026,7032,8003,9022,10005,10006,10009,10010,10016)) AND (id NOT IN (36882) OR source NOT IN ('Schannel')) AND (id NOT IN (7031) OR source NOT IN ('Service Control Manager')) AND (id NOT IN (7043) OR source NOT IN ('Service Control Manager')) AND (id NOT IN (36888) OR source NOT IN ('Schannel')) AND (id NOT IN (36887) OR source NOT IN ('Schannel')) AND (id NOT IN (36874) OR source NOT IN ('Schannel')) AND (id NOT IN (36870) OR source NOT IN ('Schannel')) AND (id NOT IN (12292) OR source NOT IN ('VSS')) AND (id NOT IN (7034) OR source NOT IN ('Service Control Manager')) AND (id NOT IN (12) OR source NOT IN ('PlugPlayManager')) AND (id NOT IN (1009) OR source NOT IN ('picadm')) AND (id NOT IN (36871) OR source NOT IN ('Schannel')) AND (id NOT IN (36882) OR source NOT IN ('Schannel')) AND (id NOT IN (1006) OR source NOT IN ('Microsoft-Windows-GroupPolicy')) AND (id NOT IN (20) OR source NOT IN ('Microsoft-Windows-WindowsUpdateClient'))
severity= WARNING
ok message= Autoreset, found no records in system eventlog
maximum age= 3d

# daf_windows_nsclient_config detected this server is not a clusternode.

[/settings/external scripts]
allow arguments = true
allow nasty characters = true
timeout = 600

[/settings/external scripts/alias]

[/settings/external scripts/scripts]

[/settings/external scripts/wrapped scripts]
check_ms_cluster_preferred_node             = check_ms_cluster_preferred_node.ps1
check_ms_ctx_loadevaluator                  = check_ms_ctx_loadevaluator.ps1
check_ms_exchange_2010_health               = check_ms_exchange_2010_health.ps1
check_ms_exchange_2010_hybrid               = check_ms_exchange_2010_hybrid.ps1
check_ms_exchange_2010_replication          = check_ms_exchange_2010_replication.ps1
check_ms_sharepoint_2010_connections        = check_ms_sharepoint_2010_connections.ps1
check_ms_sharepoint_2010_sitecollections    = check_ms_sharepoint_2010_sitecollections.ps1
check_ms_sharepoint_health                  = check_ms_sharepoint_health.ps1
check_ms_win_certificates                   = check_ms_win_certificates.ps1
check_ms_win_disk_load                      = check_ms_win_disk_load.ps1
check_ms_win_network_connections            = check_ms_win_network_connections.ps1
check_ms_win_network_load                   = check_ms_win_network_load.ps1
check_ms_win_tasks                          = check_ms_win_tasks.ps1
check_ms_win_updates                        = check_ms_win_updates.ps1

daf_citrix_drain_server                     = daf_citrix_drain_server.ps1
daf_pki_certificate                         = daf_pki_certificate.ps1
daf_windows_audit                           = daf_windows_audit.ps1
daf_windows_information                     = daf_windows_information.ps1
daf_windows_wsus                            = daf_windows_wsus.ps1
daf_windows_robocopy                        = daf_windows_robocopy.ps1
daf_windows_service                         = daf_windows_service.ps1
daf_windows_software_cylance                = daf_windows_software_cylance.ps1
daf_windows_software_msi                    = daf_windows_software_msi.ps1

[/settings/external scripts/wrappings]
bat = scripts\\%SCRIPT% %ARGS%
vbs = cscript.exe //T:30 //NoLogo scripts\\lib\\wrapper.vbs %SCRIPT% %ARGS%
ps1 = cmd /c echo If (-Not (Test-Path "scripts\powershell\%SCRIPT%") ) { Write-Host "UNKNOWN: Script `"%SCRIPT%`" not found."; exit(3) }; scripts\powershell\%SCRIPT% $ARGS$; exit($lastexitcode) | powershell.exe /noprofile -command -

[/settings/graphite/client]
channel = GRAPHITE
hostname = auto

[/settings/graphite/client/targets/default]
path = system.${hostname}.${check_alias}.${perf_alias}

[/settings/log]
date format = %Y-%m-%d %H:%M:%S
file name = ${exe-path}/nsclient.log
level = info

[/settings/log/file]
max size = 2048000

[/settings/logfile]

[/settings/logfile/real-time]
enabled = 0

[/settings/logfile/real-time/checks]

[/settings/NSCA/client]
channel = NSCA
hostname = servername

[/settings/NSCA/client/targets/default]
address = <ip>
allowed ciphers = ADH
certificate = 
encryption = none
password = pass
timeout = 30
use ssl = 0
verify mode = none

[/settings/NSCA/server]
port = 5667
performance data = 1
use ssl = 0
encryption = aes
payload length = 512

[/settings/NSClient/server]
performance data = true
port = 12489
use ssl = 0

[/settings/NRDP/client]
channel = NRDP
hostname = auto

[/settings/NRDP/client/targets/default]
sender = nscp@localhost
recipient = nscp@localhost
timeout = 30
template = Hello, this is %source% reporting %message%!

[/settings/NRPE/client]
channel = NRPE

[/settings/NRPE/client/targets/default]
timeout = 30
verify mode = none
payload length = 1024
use ssl = 1

[/settings/NRPE/server]
allow arguments = true
allow nasty characters = true
extended response = false
insecure = true 
port = 5666
ssl options = no-sslv2,no-sslv3
timeout = 600
verify mode = none
payload length = 10240
use ssl = 1

[/settings/python]

[/settings/python/scripts]

[/settings/scheduler]
threads = 5

[/settings/scheduler/schedules]

[/settings/shared session]
enabled = 0

[/settings/SMTP/client]
channel = SMTP

[/settings/SMTP/client/targets/default]
sender = nscp@localhost
template = Hello, this is %source% reporting %message%!
timeout = 30
recipient = nscp@localhost

[/settings/syslog/client]
channel = syslog

[/settings/syslog/client/targets/default]
warning severity = warning
tag_syntax = NSCA
severity = error
ok severity = informational
message_syntax = %message%
facility = kernel
critical severity = critical
unknown severity = emergency

[/settings/system/windows]
default buffer length = 1h

[/settings/system/windows/counters]

[/settings/system/windows/service mapping]

[/settings/system/windows/real-time]

[/settings/system/windows/real-time/checks]

[/settings/targets]

[/settings/WEB/server]
port = 8443s
certificate = ${certificate-path}/certificate.pem

[/settings/writers/file]
syntax = ${alias-or-command} ${result} ${message}
file = output.txt
channel = FILE

willemdh commented 7 years ago

Ok, I have thoroughly analyzed this problem by enabling modules one by one and it seems the problem is in the CheckEventLog module. When this module is disabled everything works ok. When NSClient++ is stopped and I enable the CheckEventLog module, it's possible to start the service. But afterwards it's no longer possible to stop or restart the service.
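
The one-module-at-a-time approach can be partly automated. The sketch below only generates the `[/modules]` section for each trial. The function name is hypothetical, the rest of nsclient.ini has to be kept unchanged, and installing each variant and restarting the service remains a manual step.

```python
def one_module_at_a_time(modules):
    """Yield (module, ini_text) pairs; each text enables exactly one module."""
    for enabled in modules:
        lines = ["[/modules]"]
        for name in modules:
            # 1 for the module under test, 0 for all others
            lines.append(f"{name} = {1 if name == enabled else 0}")
        yield enabled, "\n".join(lines) + "\n"
```

For example, `dict(one_module_at_a_time(["CheckDisk", "CheckEventLog", "CheckSystem"]))` produces three `[/modules]` sections, one per candidate module.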

I'm still seeing these errors while the service is trying to stop:

2017-03-20 14:57:51: error:c:\source\master\service\NSClient++.cpp:935: Failed to process command: Library is not loaded
2017-03-20 14:57:51: error:c:\source\master\include\nscapi\nscapi_core_wrapper.cpp:191: Failed to execute query
2017-03-20 14:57:51: error:c:\source\master\service\NSClient++.cpp:935: Failed to process command: Library is not loaded
2017-03-20 14:57:51: error:c:\source\master\include\nscapi\nscapi_core_wrapper.cpp:191: Failed to execute query

Please let me know if I can do anything else to help you make this issue go away.

mickem commented 7 years ago

I can see if I can dig up a Dutch Windows version. Do you have some specifics for what I should install to see if I can reproduce it?

And it is consistent on multiple machines?

The errors give me the impression that it could be some system library which is missing, as the error is not coming from a module but from the "core"...

willemdh commented 7 years ago

@mickem Yes I tested on two different Dutch Windows 2008 R2 SP1 servers. I don't have any 2012 Dutch Windows servers at my disposal, so can't test that. Sorry.

willemdh commented 7 years ago

@mickem

I did some more testing and it seems the issue is in the [/settings/eventlog/real-time] settings. When I set enabled to false I can restart the service.

So I was thinking, maybe the new real-time eventlog monitoring (upgraded since 0.4.1.105) is no longer able to handle English eventlog names. I tried to change

log = application,system to log = toepassing,systeem (Dutch eventlog names)

Also tried to update the severity labels to Dutch names in the real-time filters:

filter= level IN (error) to filter= level IN (fout)

But the problem persists the moment enabled is set to true.
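
Flipping a single key such as enabled in one section, without touching the same key in other sections (for instance `[/settings/logfile/real-time]`), can be done with a few lines of Python. This is a sketch with a hypothetical helper name. It assumes plain `key = value` lines and does not handle comments or duplicate keys specially.

```python
def set_value(ini_text, section, key, value):
    """Return ini_text with `key` in `section` set to `value`.

    Raises KeyError if the key is not present in that section.
    """
    out = []
    current = None
    found = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            current = stripped[1:-1]  # entering a new section
        elif current == section and "=" in stripped:
            if stripped.split("=", 1)[0].strip() == key:
                line = f"{key} = {value}"
                found = True
        out.append(line)
    if not found:
        raise KeyError(f"{key!r} not found in [{section}]")
    return "\n".join(out) + "\n"
```

For the test described here, the call would be `set_value(text, "/settings/eventlog/real-time", "enabled", "false")`.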

So to answer your question ("do you have some specifics for what I should install to see if I can reproduce it?"):

Maybe you could first try the real-time eventlog monitoring the way I did, but on a Swedish version of Windows? Perhaps the issue affects all non-English Windows versions.

mickem commented 7 years ago

First off: seriously... how the %&!#¤ can you spell Sweden with a Z? That's just evil... took me ages to find a Swedish keyboard, had to go through the list line by line reading the English name in the parentheses until I found it on the very last line... :)

Anyways, I can confirm it is the event-log module, but it is unrelated to language. Instead, I would bet that the length of your queries is the culprit...

The following config

[/modules]
CheckEventLog = 1

[/settings/eventlog]
buffer size = 131072
debug = 1
lookup names = 1

[/settings/eventlog/real-time]
debug = 0
enabled = true
log = application,system
startup age = 30m

[/settings/eventlog/real-time/filters/default]
destination=NSCA
maximum age= 3d
ok message= eventlog found no records test default
syntax=%type% %id% %source%: %message% 

[/settings/eventlog/real-time/filters/EVT_Application]
log= application
filter= level IN (error) AND (id NOT IN (10,12,13,23,26,33,37,38,58,67,101,103,104,107,108,110,112,274,502,511,1000,1002,1004,1005,1009,1010,1026,1027,1053,1054,1085,1101,1107,1116,1301,1325,1334,1373,1500,1502,1504,1508,1511,1515,1521,1533,1542,2019,2158,2636,2670,3001,3008,3012,3021,3032,3037,3042,3077,3079,3098,3119,3130,3131,3148,3159,4005,4102,4237,4621,5008,5009,5051,5124,5133,5605,5705,6032,6100,7043,7363,7735,7823,7827,7833,8193,8194,8196,8313,9001,10000,10005,10007,10862,10922,11317,12121,12289,12298,12321,13793,13836,14197,14204,15000,16038,16041,16053,16063,16066,16068,16195,16391,16418,16419,16421,17187,17192,17204,17412,17898,18176,19269,19458,19954,19969,19972,20958,21061,22670,35698,35705,35710,35712,35716,35721,35726,37088,37090,37092,37095,37098,37119,37124,37225)) AND (id NOT IN (5) OR source NOT IN ('Microsoft-Windows-CAPI2')) AND (id NOT IN (4101) OR source NOT IN ('Microsoft-Windows-CAPI2')) AND (id NOT IN (1023) OR source NOT IN ('Perflib')) AND (id NOT IN (6096) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (2004) OR source NOT IN ('PerfNet')) AND (id NOT IN (20) OR source NOT IN ('Therefore')) AND (id NOT IN (1008) OR source NOT IN ('Microsoft-Windows-Perflib')) AND (id NOT IN (3038) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (80) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (20) OR source NOT IN ('Application Virtualization Server')) AND (id NOT IN (0) OR source NOT IN ('OWSClient')) AND (id NOT IN (6007) OR source NOT IN ('Application Virtualization Client')) AND (id NOT IN (6016) OR source NOT IN ('Application Virtualization Client')) AND (id NOT IN (6544) OR source NOT IN ('Goverlan')) AND (id NOT IN (1509) OR source NOT IN ('Userenv')) AND (id NOT IN (1055) OR source NOT IN ('Userenv')) AND (id NOT IN (1030) OR source NOT IN ('Userenv')) AND (id NOT IN (1006) OR source NOT IN ('Userenv')) 
severity= WARNING
ok message= Autoreset, found no records in application eventlog
maximum age= 3d

gave me:

image

But if you remove the last AND stanza (AND (id NOT IN (1006) OR source NOT IN ('Userenv')) ) it works...
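
If filter length is indeed the trigger, it helps to know how long each filter in a config actually is. The following Python sketch reports the length of every `filter` value per section. The thread does not state an actual limit, so `limit` is a hypothetical parameter to be found by experiment, and both function names are made up for illustration.

```python
def filter_lengths(ini_text):
    """Return {section: length of its 'filter' value} for sections that have one."""
    lengths = {}
    section = None
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            section = stripped[1:-1]
        elif section is not None and "=" in stripped:
            key, value = stripped.split("=", 1)
            if key.strip().lower() == "filter":
                lengths[section] = len(value.strip())
    return lengths


def over_limit(lengths, limit):
    """List the sections whose filter is longer than `limit` characters."""
    return sorted(name for name, size in lengths.items() if size > limit)
```

Running `filter_lengths` on a config that works and on one that hangs, and comparing the numbers, would narrow down where the threshold lies.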

mickem commented 7 years ago

This might fix this issue: https://github.com/mickem/nscp/releases/tag/0.5.1.21 I need to investigate if it breaks something as well, but please verify that it fixes the issue...

willemdh commented 7 years ago

Thanks Michael. I tried to upgrade the agent, but it seems I was not able to uninstall the old NSClient (0.5.0.62). The software is not listed under software in Control Panel... Weird... I'll have to investigate further, but I'm out of time for today. To be continued...

willemdh commented 7 years ago

Michael, tested on 8 Windows servers and although most monitoring functionalities seem to work as expected, I do have the following error during manual installation:

image

I see this with typical, custom, and complete installations on both Windows 2008 R2 and 2012 R2 servers.

It seems I'm now able to restart the nscp service, and I tested some real-time eventlog exclusions, which also seem to work as expected. So I'll close this issue. I'm sure you know what to do with the installation error?

Thanks.