poshbotio / PoshBot

Powershell-based bot framework
MIT License
538 stars 108 forks source link

PoshBot seems to be running a job that uses high CPU #203

Closed pauby closed 4 years ago

pauby commented 4 years ago

I'm not actually sure what's happening this one so I'm raising the issue to see if I can get more info on how to troubleshoot it and confirm it's related to PoshBot.

I recently deployed the commit from this issue. Ever since then, at approximately the same time (within a couple of hours) a PowerShell session get's spawned that consumes a high amount of CPU. Our monitoring system then is alerting us that the CPU is being pegged at 100%. When I log in I see the following:

image

Some things to note:

  1. The job is running under the same user as PoshBot (which runs under it's own username and only Poshbot 'stuff' runs under it);
  2. PoshBot runs in a PowerShell session and not as a service. The session does not have administrator privileges;
  3. There are no scheduled tasks set to run under the same account as PoshBot;
  4. This has only started happening since I deployed the commit from this issue;
  5. I'm using PoshBot with Slack exclusively;

Now this could just be any PowerShell session, however when I terminate that process I get this in PoshBot:

WARNING: {"DataTime":"2020-02-14
11:19:39Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receive job state is [Failed]. Attempting to reconnect...","Data":{}}

So it looks like a job running in PoshBot. However when I run get-schedule for PoshBot I get this:

image

The two jobs that run every day both run without problems. So it's not them. The 30 day one isn't due to run yet.

There is little in the PoshBot logs, however I do notice this at approximately the time the alerts come through about high CPU:

{"DataTime":"2020-02-13 18:18:53Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Received message","Data":"{\"type\": \"goodbye\", \"source\": \"gateway_server\"}"}
{"DataTime":"2020-02-13 18:18:53Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Message type is [goodbye]. Ignoring","Data":{}}

And finally, in the event logs of the server I get this at 18:18:15:

The WMI Performance Adapter service entered the running state.

and at 18:24:16:

The WMI Performance Adapter service entered the stopped state.

I only add the event logs as I've seen spike when WMI misbehaves but I'm not sure if this is also a red herring.

The logs are at 'DEBUG' level and there is not much in there. So I'm looking to find out if:

  1. The job that is running is for PoshBot (looks like it is);
  2. How to identify if this is an internal PoshBot job that's gone rogue or something else;
  3. Something else?!

The only step left to me was to stop PoshBot from running and to see if that session runs again.

(apologies I didn't complete the issue template as this was much more than that).

Your Environment

devblackops commented 4 years ago

This looks like the PowerShell job that PoshBot spawns to connect to Slack's RTM API over websockets. This job should always be running. PoshBot will start it when it connects to Slack and continually reads the output stream from it.

Looks like you're receiving this messages from Slack:

{"DataTime":"2020-02-13 18:18:53Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Received message","Data":"{\"type\": \"goodbye\", \"source\": \"gateway_server\"}"}

https://api.slack.com/events/goodbye

We're not handling that message type in the Slack backend so it sounds like Slack is closing the connection but we don't have logic to handle that correctly.

Since it isn't a known (to PoshBot) message type we're ignoring it. https://github.com/poshbotio/PoshBot/blob/master/PoshBot/Implementations/Slack/SlackBackend.ps1#L347

Instead, we should probably add some proactive reconnect logic.

Thanks for the detailed issue @pauby. This helps a lot!

pauby commented 4 years ago

Thanks @devblackops. I'm assuming there is no workaround for it just now? The only solution I've implemented is to turn off PoshBot and that is not sustainable 😄

devblackops commented 4 years ago

Correct. No solution right now but I'll see how easy it will be to properly handly the goodbye event and do a reconnect.

pauby commented 4 years ago

I really need get time to dig into the code and help out!

wozzo commented 4 years ago

We seem to have experienced this a couple of times over the weekend since the 0.12.0 release if that helps at all.

pauby commented 4 years ago

@devblackops The comment by @wozzo seems to indicate this is something that's been changed in 0.12.0.

tenwiseman commented 4 years ago

Hmmm, I'm now getting same message as well. (0.12.0, PS 5.1, Server 2012/R2 and 2016). Service restarts OK. I can supply more diags info if it helps (or should I create a new issue ticket?)

{"DataTime":"2020-03-09 19:25:30Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Received message","Data":"{\"type\": \"goodbye\", \"source\": \"gateway_server\"}"} {"DataTime":"2020-03-09 19:25:30Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Message type is [goodbye]. Ignoring" ,"Data":{}}

devblackops commented 4 years ago

@pauby @tenwiseman About how often is the goodbye event received?

donfouts commented 4 years ago

I think this is also the problem i am having: it started 3/4 and I was using poshbot 0.11.8 but it looked like we were getting rate limited by slack's api "too many requests", I upgraded poshbot to 0.12.0 - and now it looks like a bad token from slack (it might have been changed to troubleshoot last week, but didn't get updated for the next release. below are the logs (well part of it)

today I did notice the ramp up of resource consumption: and multiple spin-off powershell tasks, mentioned by @devblackops

{"DataTime":"2020-03-09 19:43:32Z","Class":"SlackConnection","Method":"RtmConnect","Severity":"Error","LogLevel":"Info","Message":"Error connecting to Slack Real Time API","Data":{"CommandName":null,"Message":"@{ok=False; error=invalid_auth}","TargetObject":{"ok":false,"error":"invalid_auth"},"Position":"At C:\\Program Files\\WindowsPowerShell\\Modules\\PoshBot\\0.12.0\\PoshBot.psm1:8745 char:17\r\n+                 throw $r\r\n+                 ~~~~~~~~","CategoryInfo":"OperationStopped: (@{ok=False; error=invalid_auth}:PSObject) [], RuntimeException","FullyQualifiedErrorId":"@{ok=False; error=invalid_auth}"}}
{"DataTime":"2020-03-09 19:43:32Z","Class":"SlackConnection","Method":"StartReceiveJob","Severity":"Normal","LogLevel":"Info","Message":"Started websocket receive job [387]","Data":{}}  {"DataTime":"2020-03-09 19:43:34Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receive job state is [Blocked]. Attempting to reconnect...","Data":{}}
{"DataTime":"2020-03-09 19:43:39Z","Class":"SlackConnection","Method":"RtmConnect","Severity":"Error","LogLevel":"Info","Message":"Error connecting to Slack Real Time API","Data":{"CommandName":null,"Message":"@{ok=False; error=invalid_auth}","TargetObject":{"ok":false,"error":"invalid_auth"},"Position":"At C:\\Program Files\\WindowsPowerShell\\Modules\\PoshBot\\0.12.0\\PoshBot.psm1:8745 char:17\r\n+                 throw $r\r\n+                 ~~~~~~~~","CategoryInfo":"OperationStopped: (@{ok=False; error=invalid_auth}:PSObject) [], RuntimeException","FullyQualifiedErrorId":"@{ok=False; error=invalid_auth}"}}
{"DataTime":"2020-03-09 19:43:39Z","Class":"SlackConnection","Method":"StartReceiveJob","Severity":"Normal","LogLevel":"Info","Message":"Started websocket receive job [389]","Data":{}}  {"DataTime":"2020-03-09 19:43:41Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receive job state is [Blocked]. Attempting to reconnect...","Data":{}}
{"DataTime":"2020-03-09 19:43:46Z","Class":"SlackConnection","Method":"RtmConnect","Severity":"Error","LogLevel":"Info","Message":"Error connecting to Slack Real Time API","Data":{"CommandName":null,"Message":"@{ok=False; error=invalid_auth}","TargetObject":{"ok":false,"error":"invalid_auth"},"Position":"At C:\\Program Files\\WindowsPowerShell\\Modules\\PoshBot\\0.12.0\\PoshBot.psm1:8745 char:17\r\n+                 throw $r\r\n+                 ~~~~~~~~","CategoryInfo":"OperationStopped: (@{ok=False; error=invalid_auth}:PSObject) [], RuntimeException","FullyQualifiedErrorId":"@{ok=False; error=invalid_auth}"}}
{"DataTime":"2020-03-09 19:43:46Z","Class":"SlackConnection","Method":"StartReceiveJob","Severity":"Normal","LogLevel":"Info","Message":"Started websocket receive job [391]","Data":{}}  {"DataTime":"2020-03-09 19:43:48Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receive job state is [Blocked]. Attempting to reconnect...","Data":{}}

my psversion table

Name Value


PSVersion 5.1.14393.3471 PSEdition Desktop PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...} BuildVersion 10.0.14393.3471 CLRVersion 4.0.30319.42000 WSManStackVersion 3.0 PSRemotingProtocolVersion 2.3 SerializationVersion 1.1.0.1

tenwiseman commented 4 years ago

Hi Brandon, I'd only just changed the logging level to 'debug' for the last run, which stayed functional for 8hours. Previously it stayed functional for about an hour, before the bot stopped responding to commands, clearing it's 'presence' beacon in the client (presumably before it says 'goodbye'), and the socket connection disappears from netstat. I haven't see the excessive CPU load issues others have mentioned though, but my current (test) instance is just one slack user and a couple of bots on different machines. Thanks :)

pauby commented 4 years ago

@devblackops Here is the log I have for a problem day:

{"DataTime":"2020-02-13 09:57:55Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Received message","Data":"{\"type\": \"hello\"}"}
{"DataTime":"2020-02-13 09:57:55Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Message type is [hello]. Ignoring","Data":{}}
{"DataTime":"2020-02-13 18:18:53Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Received message","Data":"{\"type\": \"goodbye\", \"source\": \"gateway_server\"}"}
{"DataTime":"2020-02-13 18:18:53Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Message type is [goodbye]. Ignoring","Data":{}}
{"DataTime":"2020-02-14 08:00:00Z","Class":"Scheduler","Method":"\u003cScriptBlock\u003e","Severity":"Normal","LogLevel":"Info","Message":"Timer reached on scheduled command [0728645bce3a4019bf4547815e5dcf22]","Data":{}}

The goodbye is received once and nothing then happens until I restart it the next day.

pauby commented 4 years ago

I switched PoshBot back on last night as I was going to try and troubleshoot the problem more. Like @donfouts I started getting 429 Too Many Requests from Slack. I see PoshBot successfully connecting to the API, then disconnecting, then connecting over and over again. My intention today was to reinstall PoshBot 0.12.0 just in case something has gone wrong. I also wanted to downgrade to 0.11.8 to see if that fixes it too. I haven't done either of those yet but will do shortly.

I don't want to pollute this issue with something unrelated so I mention it only as it has already been mentioned. Let me know if you want me to raise a separate issue for this. I also haven't posted logs for the same reason.

pauby commented 4 years ago

Just to update my above comment - I've now reinstalled 0.12.0 and I'm not getting the 429 response from Slack or the constant connecting and disconnecting that I experienced before. It could be that Slack had an issue?

devblackops commented 4 years ago

I can repro the CPU issue with this snippet that just connects to the websocket and outputs json to the output stream.

https://gist.github.com/devblackops/efef2a4e20e542d31a83b175a535ed17

Steps:

  1. Run script to create websocket connection
  2. Disconnect network adapter
  3. Observe pwsh.exe CPU usage
  4. Reconnect adapter
  5. Observe that websocket connection no longer works (and doesn't error out either).

This is not reproducing the goodbye event as that is sent by Slack, but this type of failure could happen with general network blips. Digging into it.

pauby commented 4 years ago

@devblackops I'm not sure this helps but I'll add it anyway.

I went back to 0.11.8 of the module and get the constant connect / disconnect I mentioned above. I know that 0.11.8 didn't have TLS 1.2 (I think?) turned on so I'm thinking maybe it's this that is causing the connect / disconnect? There is nothing in the logs that mentions it, in fact it says it connects to the Slack API:

{"DataTime":"2020-03-12 09:23:19Z","Class":"Bot","Method":"Start","Severity":"Normal","LogLevel":"Info","Message":"Beginning message processing loop","Data":{}}
{"DataTime":"2020-03-12 09:23:20Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Websocket state is [Closed].\nState                  : Closed\r\nCloseStatus            : \r\nCloseStatusDescription :","Data":{}}
{"DataTime":"2020-03-12 09:23:20Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receive job state is [Completed]. Attempting to reconnect...","Data":{}}
{"DataTime":"2020-03-12 09:23:25Z","Class":"SlackConnection","Method":"Connect","Severity":"Normal","LogLevel":"Debug","Message":"Connecting to Slack Real Time API","Data":{}}
{"DataTime":"2020-03-12 09:23:25Z","Class":"SlackConnection","Method":"RtmConnect","Severity":"Normal","LogLevel":"Info","Message":"Successfully authenticated to Slack Real Time API","Data":{}}
{"DataTime":"2020-03-12 09:23:25Z","Class":"SlackConnection","Method":"StartReceiveJob","Severity":"Normal","LogLevel":"Info","Message":"Started websocket receive job [7]","Data":{}}
{"DataTime":"2020-03-12 09:23:29Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Websocket state is [Closed].\nState                  : Closed\r\nCloseStatus            : \r\nCloseStatusDescription :","Data":{}}
{"DataTime":"2020-03-12 09:23:39Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receive job state is [Completed]. Attempting to reconnect...","Data":{}}
{"DataTime":"2020-03-12 09:23:44Z","Class":"SlackConnection","Method":"Connect","Severity":"Normal","LogLevel":"Debug","Message":"Connecting to Slack Real Time API","Data":{}}
{"DataTime":"2020-03-12 09:23:45Z","Class":"SlackConnection","Method":"RtmConnect","Severity":"Normal","LogLevel":"Info","Message":"Successfully authenticated to Slack Real Time API","Data":{}}
{"DataTime":"2020-03-12 09:23:46Z","Class":"SlackConnection","Method":"StartReceiveJob","Severity":"Normal","LogLevel":"Info","Message":"Started websocket receive job [9]","Data":{}}
{"DataTime":"2020-03-12 09:23:49Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Websocket state is [Closed].\nState                  : Closed\r\nCloseStatus            : \r\nCloseStatusDescription :","Data":{}}
{"DataTime":"2020-03-12 09:23:50Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receive job state is [Completed]. Attempting to reconnect...","Data":{}}

Hopefully this isn't a red herring and helps ... in some way 😄

Note that going back to 0.12.0 connects fine, we just have the high CPU issue.

devblackops commented 4 years ago

The goodbye event should be properly handled in https://github.com/poshbotio/PoshBot/commit/224b80d5d9cf222d5136139e951008659467f83b now. If encountered, it will close/reopen the Slack RTM connection.

If people can build from src, try that out, and let me know if it helps, I'd appreciate it!

pauby commented 4 years ago

@devblackops I have deployed this now. Will let you know if it ends up with high CPU tomorrow.

pauby commented 4 years ago

I've got an update but I'm not quite sure if it's good or bad.

On Saturday (4 April) the server that PoshBot runs on started reporting high CPU again. I picked this up today and it appears it's a PowerShell PoshBot job. I had to kill it in Task Manager to get the server back to being responsive again so i didn't any troubleshooting done. When I did kill it I got this message:

WARNING: {"DataTime":"2020-04-06
08:15:48Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receiv
e job state is [Failed]. Attempting to reconnect...","Data":{}}

So I know it's a PoshBot job.

There is absolutely nothing in the logs that I can see so I'm a little confused.

I'm happy to leave this as is and keep an eye on it again and this time get the time to troubleshoot it. It's just sods law this happened over a weekend.

pauby commented 4 years ago

I have to report this again today. The server reported high CPU at 12:07 (BST). I've looked at the server now and Task Manager is showing the job is this:

image

And the user tab is showing this:

image

(the user in Orange is the same user that is running PoshBot).

If I try to use PoshBot in Slack I get no response.

When I kill the process in Task Manager I get this in the PoshBot console.

image

Here is what appears in the logs:

{"DataTime":"2020-04-10 15:27:45Z","Class":"SlackConnection","Method":"ReadReceiveJob","Severity":"Warning","LogLevel":"Info","Message":"Receive job state is [Failed]. Attempting to reconnect...","Data":{}}
{"DataTime":"2020-04-10 15:27:56Z","Class":"SlackConnection","Method":"Connect","Severity":"Normal","LogLevel":"Debug","Message":"Connecting to Slack Real Time API","Data":{}}
{"DataTime":"2020-04-10 15:27:56Z","Class":"SlackConnection","Method":"RtmConnect","Severity":"Normal","LogLevel":"Info","Message":"Successfully authenticated to Slack Real Time API","Data":{}}
{"DataTime":"2020-04-10 15:27:56Z","Class":"SlackConnection","Method":"StartReceiveJob","Severity":"Normal","LogLevel":"Info","Message":"Started websocket receive job [41]","Data":{}}
{"DataTime":"2020-04-10 15:27:57Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Received message","Data":"{\"type\": \"hello\"}"}
{"DataTime":"2020-04-10 15:27:57Z","Class":"SlackBackend","Method":"ReceiveMessage","Severity":"Normal","LogLevel":"Debug","Message":"Message type is [hello]. Ignoring","Data":{}}

And process in Task Manager appears again:

image

If I try to use PoshBot now it works fine.

This is exactly the behaviour that was experienced before the fix for Slack being disconnected. The difference now appears to be that nothing is being logged to indicate the problem.

@devblackops Any thoughts?

devblackops commented 4 years ago

@pauby I'm seeing this in my environment as well now so it seems like new behavior Slack is rolling out where the 'goodbye' event is being sent every 24 hours or so. I've been working on changing the Slack backend to use a runspace pool to get better logging of what is going on with the connection, and making improvements to the reconnect logic.

The good news is I see this in my environment now (before I was just basing it on what others have said). Hopefully I can get this fixed soon!

DunlapOverbelt commented 4 years ago

@devblackops I've also been seeing this problem with PoshBot running as a service on PSVersion 5.1.14393.3471 on Windows 2016 server. I enabled debug mode for the logs and did see a 'goodbye' message in the logs the last time it happened.

I've enabled the fix in 224b80d and will let you know how that goes.

I was wondering if PoshBot could try a Ping and Pong message with Slack every few minutes and if a reply does not come back from slack in a certain amount of time, do a disconnect/reconnect. This might cover the cases where a 'goodbye' message is not seen in the logs.

devblackops commented 4 years ago

This should hopefully be fixed in this latest commit: https://github.com/poshbotio/PoshBot/commit/9016bc90c9d2f10eb94b24bd8269732a5684556f.

Can everyone pull latest, rebuild, and try it out?

DunlapOverbelt commented 4 years ago

@devblackops I will try to pull latest and rebuild based on commit: 9016bc9 and let you know how it goes.

My bot did fail after having applied the changes in commit: 224b80d and seeing a "goodbye" message in the logs. However, my Poshbot service and processes were still running.

I've attached a log file (obfuscated details) if that helps. It shows messages when receiving the "goodbye" message and then shows there were no messages until after I killed the slackbackend powershell process. It then shows the bot reconnecting and supporting a !status command before I manually stopped the poshbot service.

Hopefully I can get the latest build done today and get you some feed back in a day or two. poshbot-log-failed-goodbye.txt

TomWoodling commented 4 years ago

I'm also experiencing this problem (or a variant of it) - certainly i see my bot disconnect from Slack at the same time as the receive job hangs and spikes the CPU, then if I kill that powershell process it will reconnect. This was daily until 14/4, but since then it is happening roughly every 100 seconds. I've built the latest commit and tried that, but I see exactly the same issue. The logs stop too, and only restart with a reconnect, but I see no [goodbye] messages in there - and wouldn't expect to receive one every 100 seconds anyway. I wondered if I could also be hitting rate limits, but I get no errors to suggest that.

What I can say is this behaviour is consistent whether I'm using 0.12.0, 0.12.1 or even the script you made for the TLS issue at https://gist.github.com/devblackops/28afca76db2e1b5ed4fbd3eedf7c94cc If I hack a ping function into that script I do get the following error returned after 100 seconds:

Exception calling "GetResult" with "0" argument(s): "The remote party closed the WebSocket connection without completing the close handshake."
At line:13 char:17
+ ...             $websocket.SendAsync($bytes, [Net.WebSockets.WebSocketMes ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [], MethodInvocationException
    + FullyQualifiedErrorId : WebSocketException
devblackops commented 4 years ago

@TomWoodling This looks to be the same underlying issue but it's odd that you get disconnected every 100 seconds.

The error you see above is coming from the Send-Ping function that is running inside the PowerShell job. It will send a ping back using the websocket connection every 5 seconds. It looks like at ~100 seconds it fails because the websocket connection was closed by Slack (or something else between you and Slack). We need some better error handling to account for this.

Can you think of anything in your environment that would possibly be interfiewing with the connection?

TomWoodling commented 4 years ago

@devblackops I've been thinking about that and I might have the answer to the 100s part. My Powershell versions were 5.1.14393.3471 on server or 5.1.18362.628 on desktop - updated one to v7 and now it can hold the connection. 5 minutes and counting so far.

DunlapOverbelt commented 4 years ago

@devblackops I haven't built the new version yet, but I am now getting disconnected about as often as @TomWoodling reports. I was able to be connected for at least 8 hours before today. Nothing has changed in my config for at least two days when I implemented the fix to see goodbye messages.

DunlapOverbelt commented 4 years ago

@devblackops I've built against the latest commit and still have Poshbot failing after about 100 seconds. PSVersion 5.1.14393.3471 on Windows 2016 server. Nothing in the logs to indicate why. using resource monitor, I see the websocket connection close.

TomWoodling commented 4 years ago

@DunlapOverbelt I can confirm that running the bot in PSVersion 7.0.0 fixes that - as I've just been finding out it runs side-by-side with PS v5.1 so it's not a huge deal to install it. Then all you need to do is start the bot normally , but using pwsh.exe instead (or directly in the PS v7 shell). It also stops the 100sec disconnects for the 0.12.0 PoshBot release from PSGallery.

DunlapOverbelt commented 4 years ago

@TomWoodling yea.. I need to get this up and running ASAP.. so I'm going to go ahead and install powershell 7 on my server that runs poshbot. There are a few things that appear to break with powershell7 though if using modules I've already downloaded.

I'd still like to figure out why it just stopped working for Powershell 5.1.. it doesn't make sense that it worked for hours before and then just stopped working for more than a few seconds days later.

@devblackops If I'm going to run with powershell 7, do I need to rebuild from source using powershell 7 to build Poshbot?

devblackops commented 4 years ago

@DunlapOverbelt Nope. You don’t need to rebuild to run in pwsh 7 if you’ve already done that.

I’ll do some testing with PowerShell 5.1. 6 and then 7 have been my daily drivers for awhile.

DunlapOverbelt commented 4 years ago

@devblackops @TomWoodling I've got Poshbot Running under powershell 7, however.. two of my main plugins won't work now.. I use Posh-SSH and OperationsManager modules in my plugins. Since they no longer work, I'll either need to find replacements/workarounds or figure out why Poshbot stopped working for Powershell 5.1. I'm also working on ActiveDirectory module for a middleware hook, but that should still work. I guess I'll start a clean version of the bot on a test system/test slack to poke around and try to debug.

devblackops commented 4 years ago

OK, I see the same behavior in PowerShell 5.1. After exactly 100 seconds, the PowerShell process for the websocket job is a steady 6% CPU and has stopped working. Digging into it.

PSVersion 5.1.19041.1 PSEdition Desktop PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...} BuildVersion 10.0.19041.1 CLRVersion 4.0.30319.42000 WSManStackVersion 3.0 PSRemotingProtocolVersion 2.3 SerializationVersion 1.1.0.1

devblackops commented 4 years ago

@pauby @DunlapOverbelt @TomWoodling @tenwiseman @wozzo

Everyone, I'm hoping this is finally squashed in https://github.com/poshbotio/PoshBot/commit/468cfa9205f017a1d7c4168bf8a072fb4fb37a02. Please rebuild from source and post back here if the issues goes away.

This post pointed me in the right direction: https://stackoverflow.com/questions/40502921/net-websockets-forcibly-closed-despite-keep-alive-and-activity-on-the-connectio. It appears to be an issue in .NET Framework (not core).

TomWoodling commented 4 years ago

@devblackops wow - that is quite the obscure issue indeed, great find! The new commit is working well for me under PS 5.1 - 15 minutes and no drops so far on a test bot, so it's already a massive improvement. That being said, I'll stick to PS 7 for production from now on 😹

On that note, I've managed to log a few of the goodbye messages on a bot running the previous commit and each time it's handled the reconnect gracefully, no manual restarts in nearly 3 days of running now.

DunlapOverbelt commented 4 years ago

@devblackops I've built PoshBot from 468cfa9 and have confirmed that Poshbot stays connected for longer than 100 seconds on PSVersion 5.1.14393.3471 on Windows 2016 server. I'll leave it running and let you know if I see a disconnect over night.

At least I don't need to re-write my plugins that depend on PS 5.1 modules with the WindowsCompatibility module yet! And Dilbert works again!

DunlapOverbelt commented 4 years ago

@devblackops confirming, things look good with PoshBot from commit 468cfa9 build. Logs also show reconnected after a goodbye message. Good to go!

I did notice during the build that requirements.psd1 lists PSSlack 1.0.2 but the PoshBot.psd1 lists PSSlack 1.0.1 as required modules. not sure if that need updated before this commit roles to 0.12.1 version.

Thanks for figuring this out!

devblackops commented 4 years ago

Awesome news. Thanks for testing!

I'll fix the version for PSSlack.

pauby commented 4 years ago

@devblackops I've actually just seen this so I've deployed 468cfa9 now.

devblackops commented 4 years ago

Excellent @pauby. Please let me know if it resolves your issue.

devblackops commented 4 years ago

Closing this out as 0.13.0 has been released with the fixes.