sowmyav27 opened this issue 3 years ago.
Reproduced in the latest Monitoring chart. Screenshots and logs attached below for investigation.
This seems to be an issue with the wins cli proxy command on the wins server side. Filing a related issue in rancher/wins to track it.
Redeploying the wins client along with windows-exporter itself does not seem to resolve the issue. It just produces two more copies of the same logs:
Handling backend connection request [rancher-monitoring-windows-exporter-5q4nz]
error in remotedialer server [500]: connect not allowed
Detected wins version on host is v0.1.0, which is >v0.1.0. Continuing with installation...
time="2021-05-05T01:39:04Z" level=warning msg="No where-clause specified for service collector. This will generate a very large number of metrics!" source="service.go:41"
time="2021-05-05T01:39:04Z" level=error msg="Failed to start service: The service process could not connect to the service controller." source="exporter.go:350"
time="2021-05-05T01:39:04Z" level=info msg="Enabled collectors: system, cpu, net, os, logical_disk, tcp, container, service, cs, memory" source="exporter.go:360"
time="2021-05-05T01:39:04Z" level=info msg="Starting windows_exporter (version=0.15.0, branch=master, revision=cdbb27d0b4ea9810dc35035fad281fe6468b7dd1)" source="exporter.go:412"
time="2021-05-05T01:39:04Z" level=info msg="Build context (go=go1.15.3, user=appveyor-vm\\appveyor@appveyor-vm, date=20201107-08:23:37)" source="exporter.go:413"
time="2021-05-05T01:39:04Z" level=info msg="Starting server on :9796" source="exporter.go:416"
INFO[2021-05-05T01:40:36Z] Connecting to proxy url="ws://rancher_wins_proxy"
PS C:\Users\Administrator> Get-EventLog -LogName Application -Source rancher-wins -ErrorAction Ignore | Sort-Object Index | %{ $_.Message }
Stackdump - waiting signal at Global\stackdump-3592
Listening on \\.\pipe\rancher_wins_proxy
Listening on \\.\pipe\rancher_wins
currentVersion.Major > versionRange.MaxVersion.Major: 11, 9
currentVersion.Major > versionRange.MaxVersion.Major: 11, 9
currentVersion.Major < versionRange.MinVersion.Major: 11, 12
currentVersion.Major > versionRange.MaxVersion.Major: 11, 10
currentVersion.Major < versionRange.MinVersion.Major: 11, 12
currentVersion.Major < versionRange.MinVersion.Major: 11, 13
currentVersion.Major > versionRange.MaxVersion.Major: 11, 9
currentVersion.Major > versionRange.MaxVersion.Major: 11, 10
currentVersion.Minor < versionRange.MinVersion.Major: 10, 11
currentVersion.Major < versionRange.MinVersion.Major: 11, 12
currentVersion.Major < versionRange.MinVersion.Major: 11, 13
currentVersion.Major < versionRange.MinVersion.Major: 11, 13
currentVersion.Major > versionRange.MaxVersion.Major: 11, 9
currentVersion.Major > versionRange.MaxVersion.Major: 11, 9
currentVersion.Major < versionRange.MinVersion.Major: 11, 12
currentVersion.Major > versionRange.MaxVersion.Major: 11, 10
currentVersion.Major < versionRange.MinVersion.Major: 11, 12
currentVersion.Major < versionRange.MinVersion.Major: 11, 13
currentVersion.Major > versionRange.MaxVersion.Major: 11, 9
currentVersion.Major > versionRange.MaxVersion.Major: 11, 10
currentVersion.Minor < versionRange.MinVersion.Major: 10, 11
currentVersion.Major < versionRange.MinVersion.Major: 11, 12
currentVersion.Major < versionRange.MinVersion.Major: 11, 13
currentVersion.Major < versionRange.MinVersion.Major: 11, 13
could not get checksum for "c:\\etc\\rancher\\wins\\wins.exe": open c:\etc\rancher\wins\wins.exe: The process cannot access the file because it is being used by another process.
could not get checksum for "c:\\etc\\rancher\\wins\\wins.exe": open c:\etc\rancher\wins\wins.exe: The process cannot access the file because it is being used by another process.
Handling backend connection request [rancher-monitoring-windows-exporter-ldck9]
error in remotedialer server [500]: connect not allowed
PS C:\Users\Administrator> (get-childitem \\.\pipe\).FullName
... (omitted) ...
\\.\pipe\rancher_wins
\\.\pipe\rancher_wins_proxy
... (omitted) ...
Just deploying rancher-wins-upgrader (i.e. re-initializing the wins service) seems to be an effective workaround for this issue.
I'm not sure whether this is because the fix in wins v0.1.1 somehow resolves this bug (doubtful) or because re-initializing wins is what fixes it, since re-initialization would also reset the named pipe, the gRPC server, and the network configuration of the host.
@sowmyav27 once rc19 is cut with wins v0.1.1, can you retest this issue to see if that resolves it?
@sowmyav27 & @aiyengar2 - We are currently triaging issues for 2.6. Could you give us more information about this? Is it fixed in the latest RC? And Arvind, is the workaround you mentioned a viable option?
@Jono-SUSE-Rancher I don't believe this is fixed in the latest RC.
The core problem seems to be that a Windows cluster that mounts resources on a prefixPath (e.g. c:\host\opt; this is specified as part of the RKE1 config) and does not have rancher-wins-upgrader deployed cannot accept proxy connections via the named pipe mounted at \\.\pipe\rancher_wins_proxy.
This issue appears to be resolved when the wins service is restarted and/or the wins config is refreshed, which is exactly what happens when you deploy rancher-wins-upgrader.
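For a quick manual check on a single node, the restart described above can be approximated from an elevated PowerShell session. This is only a sketch: it assumes the Windows service is named rancher-wins (the same name used as the event log source earlier in this thread), and it does not refresh the wins config the way rancher-wins-upgrader does, so it may not be equivalent to deploying the upgrader.

```powershell
# Assumption: the wins Windows service is named "rancher-wins", matching the
# Application event log source used above. Run from an elevated session.
$serviceName = 'rancher-wins'

# Restart wins so its named pipes and gRPC server are re-created.
# Note: this does NOT refresh the wins config as rancher-wins-upgrader does.
Restart-Service -Name $serviceName -ErrorAction Stop
(Get-Service -Name $serviceName).Status

# Confirm that both \\.\pipe\rancher_wins and \\.\pipe\rancher_wins_proxy are listed again.
(Get-ChildItem \\.\pipe\).FullName | Where-Object { $_ -like '*rancher_wins*' }

# Show the most recent wins events to check whether "connect not allowed" still appears.
Get-EventLog -LogName Application -Source rancher-wins -Newest 20 -ErrorAction Ignore |
    Sort-Object Index | ForEach-Object { $_.Message }
```

If the pipes come back but the exporter still logs connect not allowed, the rancher-monitoring-windows-exporter pod may also need to be redeployed so it reconnects; this has not been verified in this thread.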
It is not yet clear why this restart is required, so this needs to be investigated. The problem could lie in the way we bootstrap Windows nodes (e.g. how we set up the config and the service), or fixing it could require cutting a new wins release. Either way, this is a Windows issue that is not specific to Monitoring (cc: @sirredbeard ).
Currently, only Monitoring is impacted since it is the only component that uses wins cli proxy, but I believe there are conversations about using that feature in other Windows components (cc: @rosskirkpat), so this does need to be prioritized eventually.
However, if we cannot prioritize this in 2.6, requiring rancher-wins-upgrader to be deployed onto Windows clusters with prefixPath enabled sounds like a viable workaround to me. I think we should encourage customers to start using it anyway so that they can have declarative wins configs (i.e. an expectation that the upgrader chart exists would make it easier to cut wins releases in the future if we need to ship security fixes, Go version bumps, or new features). @luthermonson @sirredbeard any thoughts here?
Either way, if we go with the workaround, it should be tested rigorously before we recommend it as the official solution to this issue.
@sowmyav27 @aiyengar2 Could this be related to the fact that no metrics are available in Grafana for k8s 1.21?
@deniseschannon that should be unrelated. https://github.com/rancher/rancher/issues/33465 is Monitoring V1; this is Monitoring V2.
What kind of request is this (question/bug/enhancement/feature request): bug
Steps to reproduce (least amount of steps as possible): on 2.5.8-rc18
Down
Expected Result: Metrics from windows nodes should be available in Grafana.
Other details that may be helpful:
Environment information
Rancher version (rancher/rancher / rancher/server image tag or shown bottom left in the UI): 2.5.8-rc18