microsoft / windows-container-tools

Collection of tools to improve the Windows Containers experience
MIT License
236 stars 66 forks source link

[BUG] ][LOGMONITOR] ERROR: Failed to query next event. Error: 21. #163

Closed israelvaldez closed 6 months ago

israelvaldez commented 8 months ago

Describe the bug Suddenly some of my containers present the following error, which leaves them hanging on forever in kubernetes.

To Reproduce Steps to reproduce the behavior:

  1. Run a container with the following configuration of entrypoint:
    SHELL ["cmd", "/S", "/C"]
    ENTRYPOINT ["cmd", "/S", "/C", "powershell","-command", "./Scripts/Run.ps1", "&&", "C:/LogMonitor/LogMonitor.exe", "ServiceMonitor.exe", "w3svc"]
  2. With a configuration for logmonitor.exe being the following:
    
    {
    "LogConfig": {
        "sources": [{
                "eventFormatMultiLine": false,
                "channels": [{
                        "level": "error",
                        "name": "system"
                    }, {
                        "level": "Error",
                        "name": "application"
                    }
                ],
                "type": "EventLog",
                "startAtOldestRecord": true
            }, {
                "filter": "*.log",
                "includeFileNames": true,
                "directory": "c:\\inetpub\\logs",
                "includeSubdirectories": true,
                "type": "File"
            }, {
                "filter": "*.txt",
                "includeFileNames": true,
                "directory": "C:\\epi\\logs",
                "includeSubdirectories": true,
                "type": "File"
            }
        ]
    }
    }
5. Suddenly at some point the pod stops running with the following errors

2023-10-31T01:10:57.842902400Z [2023-10-31T01:10:57.000Z][LOGMONITOR] ERROR: Failed to query next event. Error: 21. 2023-10-31T01:10:57.842902400Z [2023-10-31T01:10:57.000Z][LOGMONITOR] ERROR: Failed to start event log monitor. Error: 21 2023-11-01T12:29:07.102387200Z [2023-11-01T12:29:07.000Z][LOGMONITOR] ERROR: Error monitoring log directory. Invalid directory change notification data. 2023-11-01T12:29:07.102467400Z [2023-11-01T12:29:07.000Z][LOGMONITOR] ERROR: Error monitoring log directory. Invalid directory change notification data.



**Expected behavior**
Logmonitor.exe should exit with code 1 if there's a catastrophic error or it should report something that we can act on to correct the issue.

**Configuration**
Specified in steps to reproduce

**Additional context**
I collected memory dumps of the process as it was hanging, if you need me to send them I can send it to you independently via email.
iankingori commented 8 months ago

Hey @israelvaldez, which version of Log Monitor are you running?

israelvaldez commented 8 months ago

According to the .exe file version it is 2.0.0 image

iankingori commented 8 months ago

Thanks, and which base image are you using?

israelvaldez commented 8 months ago

mcr.microsoft.com/dotnet/framework/wcf:4.8-windowsservercore-ltsc2019

iankingori commented 8 months ago

We'll try to repro this, we had a similar issue on the 2019 dotnet images: https://github.com/microsoft/windows-container-tools/issues/150. It was fixed in version 2.0.2. This seems to be impacting file monitor as well which is strange.

bobsira commented 8 months ago

Hiya @israelvaldez can you confirm if these errors appear on the latest version of Log Monitor that Ian pointed out. v2.0.2.

I tried repro it on my end and I did not see any regression.

===these are the logs I could observe

[2023-11-05T23:17:13.000Z][LOGMONITOR] INFO: Log directory \\?\C:\epi\logs does not exist. LogMonitor will wait for 300 seconds for the directory to be created.
[2023-11-05T23:17:13.000Z][LOGMONITOR] INFO: Waiting for system event channel to be enabled
[2023-11-05T23:17:28.000Z][LOGMONITOR] INFO: Enabled system event channel after 0 seconds.
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:35:59.000Z","Channel": "System","Level": "Error","EventId": 7023,"Message": "The Server service terminated with the following error: \r\nThe system cannot find the file specified."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:35:59.000Z","Channel": "System","Level": "Error","EventId": 701,"Message": "Task Scheduler service failed to start Task Compatibility module. Tasks may not be able to register on previous Window versions. Additional Data: Error Value: 2147942450."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:35:59.000Z","Channel": "System","Level": "Error","EventId": 701,"Message": "Task Scheduler service failed to start Task Compatibility module. Tasks may not be able to register on previous Window versions. Additional Data: Error Value: 2147942450."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:53:34.000Z","Channel": "System","Level": "Error","EventId": 701,"Message": "Task Scheduler service failed to start Task Compatibility module. Tasks may not be able to register on previous Window versions. Additional Data: Error Value: 2147942450."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:53:34.000Z","Channel": "System","Level": "Error","EventId": 701,"Message": "Task Scheduler service failed to start Task Compatibility module. Tasks may not be able to register on previous Window versions. Additional Data: Error Value: 2147942450."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:53:34.000Z","Channel": "System","Level": "Error","EventId": 7023,"Message": "The Server service terminated with the following error: \r\nThe system cannot find the file specified."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:55:53.000Z","Channel": "System","Level": "Error","EventId": 701,"Message": "Task Scheduler service failed to start Task Compatibility module. Tasks may not be able to register on previous Window versions. Additional Data: Error Value: 2147942450."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:55:53.000Z","Channel": "System","Level": "Error","EventId": 701,"Message": "Task Scheduler service failed to start Task Compatibility module. Tasks may not be able to register on previous Window versions. Additional Data: Error Value: 2147942450."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:55:53.000Z","Channel": "System","Level": "Error","EventId": 7023,"Message": "The LanmanServer service terminated with the following error: \r\nThe system cannot find the file specified."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:56:28.000Z","Channel": "System","Level": "Error","EventId": 701,"Message": "Task Scheduler service failed to start Task Compatibility module. Tasks may not be able to register on previous Window versions. Additional Data: Error Value: 2147942450."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:56:28.000Z","Channel": "System","Level": "Error","EventId": 701,"Message": "Task Scheduler service failed to start Task Compatibility module. Tasks may not be able to register on previous Window versions. Additional Data: Error Value: 2147942450."}}
{"Source": "EventLog","LogEntry": {"Time": "2023-10-10T19:56:28.000Z","Channel": "System","Level": "Error","EventId": 7023,"Message": "The LanmanServer service terminated with the following error: \r\nThe system cannot find the file specified."}}

I retained the JsonConfig file entries you shared and this is the context of my docker file to keep things simple

FROM mcr.microsoft.com/dotnet/framework/wcf:4.8-windowsservercore-ltsc2019 

WORKDIR /LogMonitor
COPY LogMonitorConfig.json .
COPY LogMonitor.exe .

ENTRYPOINT ["C:\\LogMonitor\\LogMonitor.exe", "C:\\ServiceMonitor.exe", "w3svc"]
lucillex commented 8 months ago

Hi @israelvaldez, would you be able to send the memory dumps collected to WindowsContainerGitHubIssues@service.microsoft.com?

israelvaldez commented 8 months ago

Sent.

israelvaldez commented 8 months ago

Seems WindowsContainerGitHubIssues@service.microsoft.com is not setup correctly as I recieved a message from postmaster@microsoft.com saying that address isn't setup to recieve messages from my email address.

Your message to the Microsoft 365 group WindowsContainerGitHubIssues@service.microsoft.com couldn't be delivered.

The group WindowsContainerGitHubIssues isn't set up to receive messages from .

Office 365  WindowsContainerGitH. . . 
Sender      Action Required 

        Sender not allowed 
lucillex commented 8 months ago

Would you be able to try again? There was a policy preventing external emails but that's fixed now.

bobsira commented 7 months ago

Hello @israelvaldez, we are working on how to handle catastrophic failures in Log Monitor. We've not been able to repro the issue on our end. Could you please share your Docker file? Thanks!

israelvaldez commented 7 months ago

@bobsira I can share the content of the powershell scripts via email so to not share in here and try to keep the NDA my company has with Microsoft, can you please let me know an email where I can send the files?

Today we also got a similar error that ultimately caused that one of our pods to be killed after hours of the following messages in the logs, are these raised on this issue?

[2023-12-07T19:07:32.000Z][LOGMONITOR] ERROR: Failed to query next event. Error: 21.
[2023-12-07T19:07:32.000Z][LOGMONITOR] ERROR: Failed to start event log monitor. Error: 21
[2023-12-07T14:24:56.000Z][LOGMONITOR] ERROR: Error monitoring log directory. Invalid directory change notification data.
[2023-12-07T13:05:29.000Z][LOGMONITOR] ERROR: Error monitoring log directory. Invalid directory change notification data.
[2023-12-07T11:47:31.000Z][LOGMONITOR] ERROR: Error monitoring log directory. Invalid directory change notification data.
[2023-12-07T10:27:48.000Z][LOGMONITOR] ERROR: Error monitoring log directory. Invalid directory change notification data.
[2023-12-07T09:10:05.000Z][LOGMONITOR] ERROR: Error monitoring log directory. Invalid directory change notification data.
lucillex commented 7 months ago

@israelvaldez please send the files to WindowsContainerGitHubIssues@service.microsoft.com