Open grudzien opened 6 years ago
This sounds very similar to some issues I've experienced, although I haven't looked very far into them. I have SlackRubyBot running in AWS as well, and every so often the websocket will disconnect but the bot will stay running. So far I've just worked around it by restarting the bot, but I'd really like to get to the bottom of this. There was also one time when the websocket appeared to stay connected (according to Slack) but the bot wasn't responding to requests, and when I checked the Docker container it said it was using 100% CPU (although maybe this was a one-off).
Likely related, https://github.com/slack-ruby/slack-ruby-client/issues/208
I guess I should clarify my post. I stated the web socket is not reconnecting. What I meant was the bot is NOT disconnecting. I had thought it was a disconnect/reconnect issue, but that does not appear to be happening. It's just a linear memory leak. I am still going through #208 to see if there are similarities.
Edit: I have been tracking the source port number for the last day and a half and it hasn't changed.
Oh so you have a bot that's online just fine that's leaking memory? That's not good :) I would find a way to dump the difference and see what objects are leaking (could be something in your code too).
I think https://stackoverflow.com/questions/20385767/finding-the-cause-of-a-memory-leak-in-ruby has pretty good information overall. I would aggressively force garbage collection (GC.start in Ruby) somewhere in the code/library and start dumping what's allocated to see a pattern.
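As a rough illustration of the "force a GC, then dump what's allocated" suggestion, here is a minimal sketch that uses only Ruby core (`GC.start` and `ObjectSpace.count_objects`). The helper names `heap_snapshot` and `heap_growth` are made up for this example and are not part of slack-ruby-bot or celluloid. It only reports growth by internal object type, so it narrows the search rather than pinpointing the leaking code.

```ruby
# Hypothetical helpers for illustration; not part of any gem in this thread.

# Force a full GC, then return live object counts keyed by internal type
# (:T_STRING, :T_HASH, :T_ARRAY, ...).
def heap_snapshot
  GC.start
  ObjectSpace.count_objects
end

# Return the types whose counts grew between two snapshots, largest growth
# first. :TOTAL and :FREE describe heap slots, not live objects, so skip them.
def heap_growth(before, after)
  after.each_with_object({}) do |(type, count), diff|
    next if type == :TOTAL || type == :FREE
    delta = count - before.fetch(type, 0)
    diff[type] = delta if delta > 0
  end.sort_by { |_type, delta| -delta }.to_h
end
```

One possible use is to take a snapshot at startup, take another every few minutes from a background thread, and log `heap_growth(first, latest)`. A type that keeps climbing across many samples is a candidate for closer inspection with a full heap dump.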
I have searched through the issues of both slack-ruby-bot and celluloid for reports of a memory leak and I haven't seen anything. I initially discovered the problem running a Slack bot in AWS, where my bot would eventually leak enough memory to be OOM-killed by Ubuntu 16.04. We tried moving the bot from 256M to 512M to 1026M to 2048M, and no matter how much we gave it, the bot would eventually consume all the memory on the box. To simplify the issue I took the standard Ubuntu 16.04 image from AWS, patched it, installed Ruby and the proper gems, and ran the ping bot. In the last 24 hours it has gone from 54M of RAM to 102M of RAM. Here are the traits I have noticed:
I am trying to avoid radical troubleshooting like jemalloc and recompiling ruby with more debugging. If anyone has any suggestions or has experience with this I would appreciate the help. I am about one or two more days from ditching the project.
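For anyone wanting to reproduce the memory growth numbers without extra tooling, a small sketch like the one below can sample the process's resident set size. It assumes Linux, where `/proc/self/status` exposes a `VmRSS:` line in kB. The helper name `rss_kb` is hypothetical and not part of any gem listed here.

```ruby
# Hypothetical helper for illustration. Reads the resident set size of the
# current process in kilobytes from /proc (Linux only). Returns nil when the
# status file does not exist or has no VmRSS line.
def rss_kb(status_path = "/proc/self/status")
  line = File.foreach(status_path).find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i
rescue Errno::ENOENT
  nil
end
```

Logging `rss_kb` with a timestamp once a minute would give a growth curve that can be compared between Ruby versions or gem combinations.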
My current install (I have tried three different versions of Ruby):
OS: Ubuntu 16.04 4.4.0-1061-aws #70-Ubuntu
Ruby: ruby 2.3.1p112 (2016-04-26) [x86_64-linux-gnu]
Gems:
activesupport (5.2.0)
aws-eventstream (1.0.1)
aws-partitions (1.94.0)
aws-sdk-core (3.22.0)
aws-sdk-dynamodb (1.8.0)
aws-sigv4 (1.0.2)
bigdecimal (1.2.8)
binary_struct (2.1.0)
bundler (1.11.2)
celluloid (0.17.3)
celluloid-essentials (0.20.5)
celluloid-extras (0.20.5)
celluloid-fsm (0.20.5)
celluloid-io (0.17.3)
celluloid-pool (0.20.5)
celluloid-supervision (0.20.6)
concurrent-ruby (1.0.5)
contracts (0.16.0)
did_you_mean (1.0.0)
dry-configurable (0.7.0)
dry-container (0.6.0)
dry-core (0.4.7)
dry-equalizer (0.2.1)
dry-inflector (0.1.2)
dry-logic (0.4.2)
dry-types (0.13.2)
dry-validation (0.12.0)
faraday (0.15.2)
faraday_middleware (0.12.2)
gli (2.17.1)
hashie (3.5.7)
heapy (0.1.3)
hitimes (1.3.0)
httpclient (2.8.3)
i18n (1.0.1)
io-console (0.4.5)
jmespath (1.4.0)
json (2.1.0, 1.8.3)
minitest (5.11.3, 5.8.4)
molinillo (0.4.3)
multipart-post (2.0.0)
net-http-persistent (2.9.4)
net-telnet (0.1.1)
nio4r (2.3.1)
power_assert (0.2.7)
psych (2.0.17)
rake (10.5.0)
rdoc (4.2.1)
slack-ruby-bot (0.10.5)
slack-ruby-client (0.11.1)
sysrandom (1.0.5)
test-unit (3.1.7)
thor (0.20.0, 0.19.1)
thread_safe (0.3.6)
timers (4.1.2)
tss (0.5.0)
tzinfo (1.2.5)
websocket-driver (0.7.0)
websocket-extensions (0.1.3)
ztimer (0.6.0)