rikai / Showbot

🤖 An omnipresent multi-platform bot whose goal in life is to become Skynet 🤖

Track down unknown memory leak #61

Open rikai opened 7 years ago

rikai commented 7 years ago

There's currently a memory leak of unknown origin affecting the development branch that needs to be tracked down. This is a blocker for the next release.

We can use this issue to track progress on that front.

This should be a useful reference to help with tracking down the source.

ObserverHerb commented 7 years ago

How did you determine it is a memory leak? Does the program just crash after a period of time? Was it measured somehow? Just asking to give me an idea where to start looking.

rikai commented 7 years ago

Mostly by the system starting to have issues due to memory exhaustion, and by seeing that showbot_irc.rb was using boatloads of memory (until top/ps stopped working).

The issue builds over several days of real-world usage and then the system kills the process once it starts interfering with system processes.

It's difficult to measure since we don't know the trigger, but there's some useful info for debugging ruby memory leaks here. :+1:

I'm almost positive the leak was introduced in a change @cbojar made to one of the plugins during his most active period (after my huge splurge of updates and before @eriknelson joined on). The bot was completely stable until around that time, and plugins were (mostly) the only thing touched that might cause this sort of issue, so looking at those commits is probably the best place to start.

You may also want to poke @eriknelson about the status of #64. That'll let you run a fully silent version of JBot in the JB channel to debug it under real-world load, which should make the leak way more apparent.

We've also been considering making a 'bot abuse script' that hammers the bot in various ways to force these issues to happen. We haven't actually written any code to this end yet, though it might be a good thing to do anyway just for future testing.
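A rough idea of what that script could look like (a sketch only; the server, channel, and command list are placeholders, not anything that exists in the repo):

```ruby
#!/usr/bin/env ruby
# Rough sketch of a 'bot abuse script': connect to IRC over a raw socket and
# spam bot commands to make memory growth show up faster. The server,
# channel, and command list below are placeholders.
require 'socket'

HOST     = 'irc.example.net'                                  # placeholder
PORT     = 6667
CHANNEL  = '#showbot-test'                                    # placeholder
COMMANDS = ['!suggest spam title', '!help', '!links'].freeze  # guesses

sock = TCPSocket.new(HOST, PORT)
emit = ->(text) { sock.write("#{text}\r\n") }  # IRC lines end in CRLF

emit.call 'NICK abusebot'
emit.call 'USER abusebot 0 * :abuse script'

# Answer PINGs until the welcome numeric (001) says registration finished
while (line = sock.gets)
  emit.call "PONG #{$1}" if line =~ /^PING (.*)/
  break if line =~ / 001 /
end

emit.call "JOIN #{CHANNEL}"

# Hammer the channel; keep the rate under typical flood limits
loop do
  emit.call "PRIVMSG #{CHANNEL} :#{COMMANDS.sample}"
  sleep 0.5
end
```

Keeping the send rate below the network's flood limits matters, otherwise the server will disconnect the abuser before the bot feels any pressure.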

ghost commented 7 years ago

Things I've tried to find this leak.

htop

I had htop running during every test. I found that 8 threads were running all the time and 2 threads kept starting and stopping: one stays active for 40 seconds or so, the other only a few seconds. From time to time, though, there would be more than 10 threads.
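To line this up with Ruby's own view of its threads, something like the following could be dropped into showbot_irc.rb (an assumption about where it's easiest to hook in). The counts won't match htop exactly, since the VM has native threads of its own, but it shows when plugin threads pile up:

```ruby
# Log the thread count and each thread's status once a minute, for
# comparison against what htop reports.
Thread.new do
  loop do
    threads = Thread.list
    summary = threads.map { |t| "#{t.object_id}=#{t.status}" }.join(', ')
    puts "[#{Time.now}] #{threads.size} Ruby threads: #{summary}"
    sleep 60
  end
end
```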

/proc/#{PID}/status

I started doing these dumps during long runs with qBird running to put JBot under load. Only after these long runs spanning a show, sometimes 10-15 hours, did I see the leak. In one 5-hour period I saw a 60M+ increase in VmPeak.
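Reading VmPeak/VmRSS can also be automated from inside the process, so the long runs don't depend on manual dumps. A minimal sketch (the log path is made up):

```ruby
# Log VmPeak/VmRSS from /proc once an hour during a long run.
Thread.new do
  loop do
    status = File.read("/proc/#{Process.pid}/status")
    peak = status[/^VmPeak:\s*(.+)$/, 1]
    rss  = status[/^VmRSS:\s*(.+)$/, 1]
    File.open('/tmp/jbot_mem.log', 'a') do |log|
      log.puts "#{Time.now} VmPeak=#{peak} VmRSS=#{rss}"
    end
    sleep 3600
  end
end
```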

rbtrace

At this point I tried everything to get this gem working, to no avail. I think it's because it needs to be run in the main thread to work, so require 'rbtrace' might have to go somewhere in Cinch?
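If it is a load-order problem, the require may only need to sit at the very top of showbot_irc.rb, before Cinch spins up its threads. A sketch of that placement:

```ruby
# showbot_irc.rb (sketch): load rbtrace before anything else so it hooks in
# from the main thread at startup.
require 'rbtrace'

# ... the rest of showbot_irc.rb / Cinch setup stays unchanged ...
```

From a second terminal one would then attach to the running PID with the rbtrace CLI, e.g. something like `rbtrace -p <pid> -e 'GC.stat'` (the exact flags are from memory, so treat them as an assumption).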

stackperf

rikai found this gem and it works fine, but it didn't really help much as it only shows minimal GC output.

ObjectSpace

Then I found ObjectSpace, which has some new features in Ruby 2.1+ for doing full GC dumps. These dumps are quite large (35-45M for JBot) and they need to be parsed, so I wrote qRubyDump to do that.

I added a new function to DevTools:

```ruby
require 'objspace' # ObjectSpace.dump_all lives in the objspace stdlib

def export_dump_all
  m_time = Time.now
  m_hour = m_time.hour
  m_min = m_time.min
  m_sec = m_time.sec

  # Force a full GC first so the dump only contains live objects
  GC.start
  file = File.open("/home/lee/tmp/DevTools/dump_all_#{m_hour}:#{m_min}:#{m_sec}.json", 'w')
  ObjectSpace.dump_all(output: file)
  file.close
end
```

Then I set a timer to fire the method off every hour, roughly like the sketch below.
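A sketch of that timer using Cinch's plugin timer DSL (assuming export_dump_all lives in a DevTools plugin class; the names are guesses based on the comment above):

```ruby
require 'cinch'

class DevTools
  include Cinch::Plugin

  # Fire export_dump_all once an hour while the bot runs
  timer 3600, method: :export_dump_all

  def export_dump_all
    # ... the dump code shown above ...
  end
end
```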

qRubyDump

qRubyDump: a very simple tool to gather many Ruby object dumps and process them for leaks.
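The processing step boils down to diffing two dumps and seeing which object types (or allocation sites, when tracing is enabled) grew between them. This is not qRubyDump's actual code, just a sketch of the idea:

```ruby
#!/usr/bin/env ruby
# Usage: ruby dump_diff.rb earlier_dump.json later_dump.json
# Each line of an ObjectSpace.dump_all file is a JSON object with at least a
# "type" key; "file"/"line" are only present when allocation tracing is on.
require 'json'

def counts(path)
  tally = Hash.new(0)
  File.foreach(path) do |line|
    begin
      obj = JSON.parse(line)
    rescue JSON::ParserError
      next
    end
    key = obj['file'] ? "#{obj['type']} #{obj['file']}:#{obj['line']}" : obj['type']
    tally[key] += 1
  end
  tally
end

before = counts(ARGV[0])
after  = counts(ARGV[1])

# Report the 20 keys that grew the most between the two dumps
growth = after.map { |key, n| [key, n - before.fetch(key, 0)] }
growth.sort_by { |_, delta| -delta }.first(20).each do |key, delta|
  puts format('%+7d  %s', delta, key)
end
```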

Conclusion

After all this I still don't know where the leak is, but I did find that a HASH object appears to be growing over time, and we know its size. Where that growth comes from, I don't know.
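One way to narrow that down would be to turn allocation tracing on before the hourly dumps, so every object in dump_all carries the file and line it was allocated from. A sketch of the extra setup:

```ruby
require 'objspace'

# With tracing enabled, subsequent ObjectSpace.dump_all output includes
# "file"/"line" for each traced object, which should point at the call site
# creating the growing HASH. There is some runtime overhead, so this is only
# for debugging runs.
ObjectSpace.trace_object_allocations_start
```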