poshbotio / PoshBot

Powershell-based bot framework
MIT License

Connectivity problems when running poshbot in aws ecs fargate task showing receive job state is blocked #217

Closed sheldonhull closed 4 years ago

sheldonhull commented 4 years ago

Current Behavior

I've worked on deploying poshbot in AWS Fargate to provide a solid serverless solution to running this reliably.

After some initial pain points, I've gotten poshbot up and working. However, the bot seems to be failing to receive messages while the websocket is open.

Any tips on figuring that out? I have full egress allowed but no ingress, as I was assuming this would be an outbound websocket connection.

The cloudwatch logs show the following pattern occurring in a loop.

cloudwatch logs

```
{
  "DataTime": "2020-07-02 19:01:55Z",
  "Class": "SlackConnection",
  "Method": "Connect",
  "Severity": "Normal",
  "LogLevel": "Debug",
  "Message": "Connecting to Slack Real Time API",
  "Data": null
}
{
  "DataTime": "2020-07-02 19:01:56Z",
  "Class": "SlackConnection",
  "Method": "StartReceiveJob",
  "Severity": "Normal",
  "LogLevel": "Info",
  "Message": "Started websocket receive job [7575]",
  "Data": null
}
{
  "DataTime": "2020-07-02 19:02:05Z",
  "Class": "SlackConnection",
  "Method": "ReadReceiveJob",
  "Severity": "Warning",
  "LogLevel": "Info",
  "Message": "Receive job state is [Blocked]. Attempting to reconnect...",
  "Data": null
}
{
  "DataTime": "2020-07-02 19:02:05Z",
  "Class": "SlackConnection",
  "Method": "Disconnect",
  "Severity": "Normal",
  "LogLevel": "Info",
  "Message": "Closing websocket",
  "Data": null
}
{
  "DataTime": "2020-07-02 19:02:05Z",
  "Class": "SlackConnection",
  "Method": "Disconnect",
  "Severity": "Normal",
  "LogLevel": "Info",
  "Message": "Stopping receive job [7575]",
  "Data": null
}
```
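For anyone triaging a similar loop, a quick way to see which receive job states dominate is to tally them from an export of the log records. This is only a rough sketch in Python: it assumes the records are concatenated JSON objects with the `Method` and `Message` fields shown in this thread, and the helper names (`iter_records`, `count_job_states`) are made up for illustration, not part of PoshBot.

```python
import json
import re
from collections import Counter

# Matches the state name in messages like "Receive job state is [Blocked]. ..."
STATE_RE = re.compile(r"Receive job state is \[(\w+)\]")


def iter_records(text):
    """Yield each JSON object from a string of concatenated log records."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it first.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        record, pos = decoder.raw_decode(text, pos)
        yield record


def count_job_states(text):
    """Count receive job states reported by ReadReceiveJob records."""
    counts = Counter()
    for record in iter_records(text):
        if record.get("Method") != "ReadReceiveJob":
            continue
        match = STATE_RE.search(record.get("Message") or "")
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample built from records quoted in this thread (fields trimmed for brevity).
sample = """
{ "Class": "SlackConnection", "Method": "StartReceiveJob",
  "Message": "Started websocket receive job [7575]", "Data": null }
{ "Class": "SlackConnection", "Method": "ReadReceiveJob",
  "Message": "Receive job state is [Blocked]. Attempting to reconnect...", "Data": null }
{ "Class": "SlackConnection", "Method": "ReadReceiveJob",
  "Message": "Receive job state is [Failed]. Attempting to reconnect...", "Data": null }
"""

states = count_job_states(sample)
print(dict(states))
```

A tally dominated by `Blocked` versus `Failed` can help separate the two symptoms described in this issue.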

Context

AWS ECS Fargate hosted container

Your Environment

I believe the initial connection is fully successful, as it seems to populate the list of users. However, something is causing problems with the websocket connection as shown above. This has cycled non-stop since launch.

devblackops commented 4 years ago

@sheldonhull Instead of trying to troubleshoot the websocket connection through PoshBot, can you try to repro using this gist? https://gist.github.com/devblackops/efef2a4e20e542d31a83b175a535ed17

That is a pretty minimal Slack websocket connection test. Just use your Slack token at the bottom. As you type in Slack, the script will output the JSON received. Can you test that on a VM in AWS to see if you get similar behavior to Fargate?

Off the top of my head, I can't think of what would be blocking the websocket connection.

sheldonhull commented 4 years ago

This is very useful. TLS failed at first. I ran the command to set it again, and finally got

@{ok=False; error=not_allowed_token_type}

I tried both the bot token and the OAuth token. I need to go look for a possibly missing permission. I thought it had worked before using the Slack app plus additional permissions. Let me check now.

error

```text
Connect-SlackRtm : Error connecting to Slack Real Time API
At line:1 char:14
+ ... loginData = Connect-SlackRtm -Token "redacted ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [Write-Error], WriteErrorException
    + FullyQualifiedErrorId : Microsoft.PowerShell.Commands.WriteErrorException,Connect-SlackRtm

@{ok=False; error=not_allowed_token_type}
At line:19 char:13
+ throw $r
+ ~~~~~~~~
    + CategoryInfo : OperationStopped: (@{ok=False; err...wed_token_type}:PSObject) [], RuntimeException
    + FullyQualifiedErrorId : @{ok=False; error=not_allowed_token_type}
```

I just reinstalled the app and still get the same result. The bot user has all permissions in the apps section that I believe it needs.

Interestingly enough I don't see any RTM in the bot token scopes right now.

sheldonhull commented 4 years ago

Solved.

The RTM permissions apparently are not available in the current application permissions, although I thought I had used this in the past. I removed the app install from my workspace, created a new one using the Slack URL per the directions, and that one worked.

I'm not clear on why this happened, as I thought I'd had the bot work in the past with a normal app, but for now this solves my issue. It would still be good to have clarity on why the "legacy" bot user approach works while the recommended Slack App approach fails.
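One way to check a token outside PoshBot is to call Slack's `rtm.connect` Web API method directly and look at the `ok` and `error` fields, which is where `not_allowed_token_type` comes from. As far as I know, Slack's documentation for `rtm.connect` says that newer apps with granular permissions cannot use RTM methods, which would be consistent with the legacy bot user working and the newer app failing. Below is a minimal sketch in Python using only the standard library; the function names and the `SLACK_TOKEN` environment variable are assumptions for illustration, not anything PoshBot defines.

```python
import json
import os
import urllib.request

# Slack Web API method that returns a websocket URL for RTM sessions.
RTM_CONNECT_URL = "https://slack.com/api/rtm.connect"


def call_rtm_connect(token, timeout=10):
    """Call rtm.connect with a bearer token and return the parsed JSON reply."""
    request = urllib.request.Request(
        RTM_CONNECT_URL,
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def describe_rtm_response(reply):
    """Turn an rtm.connect reply into a short human-readable diagnosis."""
    if reply.get("ok"):
        return "ok: token accepted for RTM"
    error = reply.get("error", "unknown_error")
    if error == "not_allowed_token_type":
        # Assumption: this usually means the token belongs to an app type
        # that is not permitted to use RTM methods.
        return "not_allowed_token_type: this token type cannot use RTM"
    return f"error: {error}"


if __name__ == "__main__":
    # SLACK_TOKEN is a hypothetical variable name; nothing runs without it.
    token = os.environ.get("SLACK_TOKEN")
    if token:
        print(describe_rtm_response(call_rtm_connect(token)))
```

Running this from the same network as the container also helps separate token problems from connectivity problems, since a rejected token still returns a JSON reply.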

sheldonhull commented 4 years ago

@devblackops I made sure TLS 1.2 is set at the top of the dockentrypoint.ps1 and also in the Dockerfile. I no longer get Blocked, but I am still getting the error message below. The instance is responding to my pings, and all setup steps seem correct.

{
  "DataTime": "2020-07-03 05:01:22Z",
  "Class": "SlackConnection",
  "Method": "ReadReceiveJob",
  "Severity": "Warning",
  "LogLevel": "Info",
  "Message": "Receive job state is [Failed]. Attempting to reconnect...",
  "Data": null
}

It's on a public subnet with all egress allowed. Any other ideas, now that I've got authentication working and the only remaining failure is with the websocket?

sheldonhull commented 4 years ago

Closing as I'm moving to Teams and am unable to spend more time on reproducing this right now.