znre commented 2 weeks ago

Summary

I upgraded to the latest versions of Metasploit (6.4.2 and 6.4.3), and for the past few days, running exploit modules (or almost any type of module) fills up the RAM of my Kali Linux VM until msfconsole crashes with the error:

zsh: killed     msfconsole -q

The simplest example I can recreate this with is TryHackMe's room called Blue.

I can confirm that this doesn't happen with an older version of Metasploit (6.3.43), with the same Kali Linux VM config.

kriptn commented 1 week ago

I have also been facing this issue. The targeted OS was Ubuntu 22.04, and sometimes the module works fine in the VM. Most of the time the exploit just gets stuck, the fans ramp up, and I eventually get the zsh killed message.

Is anyone able to get around this issue?

Z6543 commented 1 week ago

I had the same issue in two different environments. Besides the memory leak, CPU is also affected.

Affected environments:
Ubuntu 22.04.4 x64 and Metasploit v6.4.4
Kali ARM (latest) with Metasploit v6.4.2

Also strange, but running msfconsole without sudo helped me, there was no memory leak in that case.

egilas commented 1 week ago

Have the same problem here. Both with and without sudo. Did a msfdb reinit, no luck. Ruby process eating all CPU and mem available. Version: 6.4.4+20240417170723~1rapid7-1 from apt. Picture below. Shortly after, got killed: zsh: killed msfconsole

adfoster-r7 commented 1 week ago

I've been unable to replicate; Are folk running with or without a database? Exact replication steps would be appreciated, or if anyone's able to track down the commit that has caused the issue that would also be helpful

znre commented 1 week ago

> I've been unable to replicate; Are folk running with or without a database? Exact replication steps would be appreciated, or if anyone's able to track down the commit that has caused the issue that would also be helpful

It's the same regardless of whether or not I'm running it with a database.

As for replication steps, here's what I did:

Tried going through this TryHackMe room (an OVA can be downloaded here as well)

Metasploit steps:

$ msfconsole -q
msf6 > use exploit/windows/smb/ms17_010_eternalblue
[*] No payload configured, defaulting to windows/x64/meterpreter/reverse_tcp
msf6 exploit(windows/smb/ms17_010_eternalblue) > set RHOSTS <IP of Blue Machine>
RHOSTS => <IP of Blue Machine>
msf6 exploit(windows/smb/ms17_010_eternalblue) > exploit

It's weird: I can't seem to replicate this issue when I use a fresh Kali VM and fully update it.

But I can easily replicate it with my slightly modified Kali VM. Not sure if it's my tmux config, xfce4-terminal, docker, or something else. Here are my modifications to my Kali VM, in case anyone has the time to check.

I'll try testing more and see if I can find out the root cause.

Ashifcoder commented 1 week ago

Having this issue recently after upgrading to version 6.4.3 or 6.4.2. Previous version (6.4.0) worked fine.

Conditions when this issue arises:

Mainly while interacting with session commands like: sessions -i, bg, fg
The issue persists with or without sudo.
It takes around 2-3 min to get killed (the virtual machine totally freezes, with full utilization of CPU and RAM)
Metasploit tested version: 6.4.3-dev

Testing: TryHackMe Room Blue

abezemskij commented 1 week ago

@ctf-box:/tmp$ msfconsole -v
Framework Version: 6.4.2-dev

I have been noticing this issue for the past week. The problem seems to relate to interacting with sessions, channels and background activities. I was unable to find a way to replicate it reliably, but I noticed:

Sudo/Non-Sudo doesn't have an impact
DB/Non-DB, I haven't noticed any difference
~~It seems to only happen with VPN connections. I was unable to replicate the issue with local machines.~~
It happens when I have multiple dead/closed sessions (~6)
It happens when I have 4-5 active sessions with two channels
~/.msf4 content has an impact on this
debug command doesn't state any issues

I had a theory that this is somehow related to closed/dead sessions not being sanitized properly. With each meterpreter session, you get a thread, and if you have multiple sessions + multiple closed/dead sessions, the threads stack up. They are cached somewhere in ~/.msf4 because msfconsole loads a good chunk of memory on startup.

My current workaround is to clear the contents of ~/.msf4 with rm -rf $HOME/.msf4/* (make a backup first!) before startup, then monitor memory. Once it reaches critical levels (depending on your RAM), close msfconsole, clear $HOME/.msf4, launch msfconsole and repeat the process.
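
Since this workaround deletes everything under ~/.msf4, a backup step is worth scripting. The following is a minimal sketch, not part of Metasploit: the function name `backup_dir` and the destination directory are hypothetical, and it simply archives a directory with tar before anything is removed.

```shell
#!/usr/bin/env bash
# Archive a directory into a timestamped tarball and print the archive path.
# backup_dir is a hypothetical helper name, not a Metasploit command.
# Usage: backup_dir SOURCE_DIR DEST_DIR
backup_dir() {
  local src="$1" dest="$2" archive
  [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
  mkdir -p "$dest" || return 1
  archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C switches to the parent directory so the archive holds relative paths.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}
```

For example, `backup_dir "$HOME/.msf4" "$HOME/msf4-backups"` would write an archive into the (assumed) msf4-backups directory and print its path, after which the directory can be cleared.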

Edit: Use the workaround here: https://github.com/rapid7/metasploit-framework/issues/19098#issuecomment-2066929575

znre commented 1 week ago

> It seems to only happen with VPN connections. I was unable to replicate the issue with local machines.

I was able to replicate the issue with a local machine.

Z6543 commented 1 week ago

I can confirm that 6.4.0 works fine, 6.4.1 has the issue. For testing, msfconsole startup time is a simple measure, I don't need to test with any sessions.

As you can see, in my env, running msfconsole as user takes 13 sec, running with sudo or sudo -i takes 120 - 150 sec.

echo exit > res.txt

time msfconsole -r res.txt
…
msfconsole -r res.txt  13.39s user 0.92s system 96% cpu 14.884 total

sudo time msfconsole -r res.txt
…
150.46user 8.30system 2:42.62elapsed 97%CPU (0avgtext+0avgdata 4945592maxresident)k
992344inputs+64960outputs (23991major+1751341minor)pagefaults 0swaps

sudo -i time msfconsole -r res.txt
…
msfconsole -r res.txt  121.62s user 7.36s system 97% cpu 2:11.69 total

abezemskij commented 1 week ago

> > It seems to only happen with VPN connections. I was unable to replicate the issue with local machines.
>
> I was able to replicate the issue with a local machine.

Tried it. I agree.

To reproduce, have a Windows 10 VM that executes msfvenom windows/x64/meterpreter/reverse_tcp payloads. In msfconsole (running as a normal user), repeat: create listener, execute payload, exit session.

After closing msfconsole and launching it back:

After clearing ~/.msf4

Z6543 commented 1 week ago

Tested with v6.4.5-dev-211de574aa, and it works without issues ...

ugyen-chophel commented 1 week ago

> My current workaround is to rm -rf ~/.msf4/* (Make Backup!) content before startup and monitor memory. Once it reaches critical levels (depending on your RAM), close msfconsole, clear ~/.msf4, launch msfconsole and repeat the process.

My local machine is Ubuntu 22.04 and the metasploit-framework version is 6.4.4+20240417170723~1rapid7-1. As @abezemskij mentioned, this works a few times and then it starts to drain both CPU and RAM.

abezemskij commented 1 week ago

> Tested with v6.4.5-dev-211de574aa, and it works without issues ...

Tried with v6.4.5-dev, that's my case.

I also noticed that the exit command sometimes produces two different responses (maybe a potential hint). It also seems that at some point it starts allocating a huge amount of memory on the exit command.

meterpreter > exit
[*] Shutting down session: 10
[*] 192.168.206.155 - Meterpreter session 10 closed. Reason: Died
[*] 192.168.206.155 - Meterpreter session 10 closed. Reason: User exit
msf6 exploit(multi/handler) > run

donran commented 1 week ago

I seem to have encountered this issue on 6.4.3-dev, shipped as latest on Kali. The mentioned idea about VPN playing a part might be true, as it is also the case for me. I do use the database as well.

Something I noticed today is that my history file is 18MB, and this install is basically from yesterday. It contains 1.2M lines that repeat the same x lines. Could this be related? Today my console started and hung until it had allocated 4.2GB of memory.

Removed that file and it started again with only 300 MB. After this I connected to a machine through SSH. Upgraded the session. And once I interacted with it via sessions 2, the memory spiked to 2.4GB. Ran exit in the meterpreter, allocated another 400MB.
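
To check whether a history file shows the kind of repetition described here, one option is to compare its total and distinct line counts. This is only a sketch: the function name `history_stats` is hypothetical, and the default path `$HOME/.msf4/history` is an assumption based on the file names mentioned elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Report size, total lines, and distinct lines for a history file.
# A large gap between "lines" and "unique" suggests heavy repetition.
# The default path is an assumption; pass another file as the first argument.
history_stats() {
  local file="${1:-$HOME/.msf4/history}"
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  local bytes total unique
  bytes=$(wc -c < "$file" | tr -d ' ')
  total=$(wc -l < "$file" | tr -d ' ')
  unique=$(sort -u "$file" | wc -l | tr -d ' ')
  echo "bytes=$bytes lines=$total unique=$unique"
}
```

Calling `history_stats` with no argument inspects the default path, while `history_stats ~/.msf4/meterpreter_history` checks the Meterpreter history instead.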

adfoster-r7 commented 1 week ago

@donran That's an interesting lead! We did recently make changes to the history management logic here: https://github.com/rapid7/metasploit-framework/pull/18933

Does applying this revert patch mitigate the issue for you? https://github.com/rapid7/metasploit-framework/pull/19113

abezemskij commented 1 week ago

I've collected some data on memory usage using top. After clearing ~/.msf4, I was repeating the following: create listener, execute meterpreter staged payload on Win10, once connected, close session using exit. It seems it starts acting weird after session 8, the 9th session is still fine (but loads longer), afterwards the memory allocation starts escalating.

The process seemed to be allocating 0-48 virtual memory for sessions 1 to 9. The 10th session allocated 300MB, 11th - 600, 12th - 1670, 13th - 3957.

I looked at VIRT/RES memory usage in different states: Before Creating Listener (BL), Listener Created (LC), Victim VM Connected (C'ed), and After Exiting meterpreter (AE). I also recorded whether meterpreter reported the session as Died, User Exit, or both at the same time. If anyone is interested, here's a CSV.

msfconsole_mem_usage.csv
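
For anyone who wants to collect similar numbers without reading top by hand, a small sampler can log resident memory over time. This is a sketch under assumptions: `rss_kb` and `watch_rss` are hypothetical helper names, it records only resident memory (not VIRT), and it is not how the CSV above was produced.

```shell
#!/usr/bin/env bash
# Print the resident set size (RSS) of a process in kB.
# Reads /proc on Linux and falls back to ps elsewhere.
rss_kb() {
  local pid="$1"
  if [ -r "/proc/$pid/status" ]; then
    awk '/^VmRSS:/ {print $2}' "/proc/$pid/status"
  else
    ps -o rss= -p "$pid" | tr -d ' '
  fi
}

# Print "<unix timestamp> <rss in kB>" every INTERVAL seconds until the
# process exits. kill -0 only checks that the process can be signalled,
# so run this as the same user that owns the process.
# Usage: watch_rss PID [INTERVAL_SECONDS]
watch_rss() {
  local pid="$1" interval="${2:-5}"
  while kill -0 "$pid" 2>/dev/null; do
    echo "$(date +%s) $(rss_kb "$pid")"
    sleep "$interval"
  done
}
```

Running `watch_rss <PID> 2 > rss.log` with the PID of the msfconsole Ruby process gives one sample every two seconds, which can then be lined up against session numbers.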

donran commented 1 week ago

> @donran That's an interesting lead! We did recently make changes to the history management logic here: #18933
>
> Does applying this revert patch mitigate the issue for you? #19113

@adfoster-r7 I cloned the repo and checked out that branch. But I seem to be receiving the same problem. Once I connect to the upgraded session memory instantly spikes to ~2.4GB.

Does not happen with a clean database, so could be that my current database is a little "corrupt"?

adfoster-r7 commented 1 week ago

@donran Does 6.4.0 work for you, and 6.4.4 breaks for you? If so, you could run a git bisect to find out what's gone wrong 🕵️

It's a tool for running through a series of commits to identify when an issue was first introduced.

Start the git bisect:

git bisect start

Check out the version that's working for you, and mark it as 'good':

git checkout 6.4.0
# verify it works
git bisect good

Then check out the latest code and mark it as bad:

git checkout 6.4.4
# Verify it's bad
git bisect bad

The bisect will then put you somewhere in the middle between those two releases, and you can incrementally work out which commit introduced the problem via a divide-and-conquer approach, i.e.

# Start
➜  metasploit-framework git:(master) ✗ git bisect start
status: waiting for both good and bad commits

# Specify the first known working commit, i.e. that might be 6.4.0:
➜  metasploit-framework git:(master) ✗ git checkout 6.4.0 
➜  metasploit-framework git:(6.4.0) git bisect good
status: waiting for bad commit, 1 good commit known

# Specify the earliest known broken code, probably 6.4.4
➜  metasploit-framework git:(6.4.0) git checkout 6.4.4
Previous HEAD position was b461f08ba3 Land #18980, improves basic shell help command
HEAD is now at 607fb09391 automatic module_metadata_base.json update
➜  metasploit-framework git:(6.4.4) git bisect bad

# Git itself will check out different commits, and after each commit - verify if the problem still persists or not - and then after verifying, run either `git bisect bad` or `git bisect good` for each step
Bisecting: 100 revisions left to test after this (roughly 7 steps)
[876398da315bc51f213bdaf815362dabc50c4c8d] automatic module_metadata_base.json update

After the 7 or so steps of saying git bisect good or git bisect bad, it'll give you a final commit that introduced the issue:

➜  metasploit-framework git:(53efed1606) git bisect good
76145c30913a6950d8e3acb35259df02dd94d42a is the first bad commit
commit 76145c30913a6950d8e3acb35259df02dd94d42a
Merge: 53efed1606 71538a871f
Author: example <example@example.com>
Date:   Wed Apr 10 07:38:35 2024 -0400

    Land etc etc

If something goes wrong, you can run git bisect reset to stop the bisecting, and then do another git bisect start.

Hopefully that sheds some light on the problem
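
The good/bad decision at each step can also be automated with git bisect run, which treats exit status 0 as good, 125 as skip, and other statuses from 1 to 127 as bad. The sketch below is an assumption-laden illustration rather than a tested recipe for this bug: `peak_is_ok` and `bisect_step` are hypothetical names, it relies on GNU time at /usr/bin/time, and the resource script must end with exit (like the res.txt example earlier in this thread) so that msfconsole terminates.

```shell
#!/usr/bin/env bash
# Succeed (exit status 0) when the observed peak RSS is below the limit.
# Both values are in kB.
peak_is_ok() {
  local peak_kb="$1" limit_kb="$2"
  [ "$peak_kb" -lt "$limit_kb" ]
}

# One bisect step: run msfconsole from the checkout with a resource script,
# record peak RSS via GNU time, and map the result to a git bisect status:
#   0 -> good, 1 -> bad, 125 -> skip (commit could not be measured)
# Usage: bisect_step RESOURCE_SCRIPT LIMIT_KB
bisect_step() {
  local rc_file="$1" limit_kb="$2" out peak_kb status=0
  out=$(mktemp)
  # GNU time: -f '%M' prints the maximum resident set size in kB, -o writes it to a file.
  /usr/bin/time -f '%M' -o "$out" ./msfconsole -q -r "$rc_file" >/dev/null 2>&1 || status=$?
  peak_kb=$(tail -n 1 "$out" 2>/dev/null)
  rm -f "$out"
  case "$peak_kb" in
    ''|*[!0-9]*) return 125 ;;  # no usable measurement
  esac
  # High memory counts as bad even if msfconsole was killed.
  peak_is_ok "$peak_kb" "$limit_kb" || return 1
  # Low memory but a failed run: the commit was not really tested.
  [ "$status" -eq 0 ] || return 125
  return 0
}
```

To use it, save the functions in a file with a final line such as `bisect_step res.txt 1000000`, mark the good and bad commits as above, and pass the file to git bisect run (for example `git bisect run bash ./bisect_step.sh`). The file names and the 1000000 kB limit are placeholders. A real script would probably also need to restore ~/.msf4 and the database between steps, and may need to run msfconsole twice per step, as noted later in this thread.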

Z6543 commented 1 week ago

And now I cannot reproduce the issue anymore in my env since I used the v6.4.5-dev-211de574aa version. Maybe it is related to gem dependencies?

donran commented 1 week ago

@adfoster-r7 Here's what bisect told me might be the problematic commit. Looks somewhat plausible as well considering there was some weirdness with my history file as well.

Something I did note when doing this is that I had to run msfconsole twice. (I used a resource script to automate ssh -> upgrade -> interact) First time always allocated 2GB+ memory. Even on 6.4.0. But on commits after the one in the picture it happened on all runs. I did a msfdb stop, copied my old ~/.msf4, and msfdb init between each bisect step.

egilas commented 1 week ago

I've used my kali instance for ~1 year perhaps, and I've not used metasploit that much:

adfoster-r7 commented 1 week ago

@donran Thanks! That points back to the history manager PR that I mentioned might be the culprit here, but this revert PR should have worked for you https://github.com/rapid7/metasploit-framework/pull/19113 as it undoes that commit 🤔

donran commented 1 week ago

@adfoster-r7 I think I might've been tricked due to the thing I mentioned about having to run msfconsole twice because it always allocated 2.4GB on the first run. I noticed this while doing the bisects.

Tried it again now on that commit, and as expected, the second time it does not allocate that much memory.

abezemskij commented 1 week ago

I have done some additional experiments, and I think it is actually the problem with history manager. What I've done:

Pointed the history file to /dev/null (ln -s /dev/null history): the memory grows more slowly but still increases. I was able to get through 15 iterations of creating/connecting/exiting a session.
When I used debug -c, it showed that I had run more than 344056 commands:
...
344048 run
344049 exit
344050 sessions -i 1
344051 use multi/handler
344052 set payload windows/x64/meterpreter/reverse_tcp
344053 set lhost 192.168.206.137
344054 set lport
344055 run
344056 set lport 443
344057 run
...

~~It seems like it is copying a whole set of previous commands and appending a new one, where it stored somewhere in the memory.~~ ~~But it seems ln -s /dev/null history is much better workaround than my previous one rm -rf ~/.msf4~~

I was wrong about the commands being stored in memory. It was the meterpreter_history file. So it seems the meterpreter_history and history files have a direct impact on the issue.

~~A potential workaround for those having this issue is the following: rm -rf $HOME/.msf4/* && ln -s /dev/null $HOME/.msf4/meterpreter_history && ln -s /dev/null $HOME/.msf4/history~~

Latest workaround: https://github.com/rapid7/metasploit-framework/issues/19098#issuecomment-2066929575

adfoster-r7 commented 1 week ago

This will be fixed in Metasploit version 6.4.5 (PR)

The workaround for now until you can update to Metasploit 6.4.5 is to:

Temporarily disable the history mechanism by modifying your local history_manager.rb file, if you're on Kali you can find it here: /usr/share/metasploit-framework/lib/rex/ui/text/shell/history_manager.rb

diff --git a/lib/rex/ui/text/shell/history_manager.rb b/lib/rex/ui/text/shell/history_manager.rb
index 3d53a8c3c0..ec23aa4ac1 100644
--- a/lib/rex/ui/text/shell/history_manager.rb
+++ b/lib/rex/ui/text/shell/history_manager.rb
@@ -27,6 +27,7 @@ class HistoryManager
   # @param [Proc] block
   # @return [nil]
   def with_context(history_file: nil, name: nil, &block)
+    block.call; return nil
     push_context(history_file: history_file, name: name)

     begin

adfoster-r7 commented 1 week ago

@donran Does that workaround work for yourself?

ugyen-chophel commented 1 week ago

> This will be fixed in Metasploit version 6.4.5
>
> The workaround for now until you can update to Metasploit 6.4.5 is to:
>
> Temporarily disable the history mechanism by modifying your local history_manager.rb file, if you're on Kali you can find it here: /usr/share/metasploit-framework/lib/rex/ui/text/shell/history_manager.rb
>
> diff --git a/lib/rex/ui/text/shell/history_manager.rb b/lib/rex/ui/text/shell/history_manager.rb
> index 3d53a8c3c0..ec23aa4ac1 100644
> --- a/lib/rex/ui/text/shell/history_manager.rb
> +++ b/lib/rex/ui/text/shell/history_manager.rb
> @@ -27,6 +27,7 @@ class HistoryManager
>    # @param [Proc] block
>    # @return [nil]
>    def with_context(history_file: nil, name: nil, &block)
> +    block.call; return nil
>      push_context(history_file: history_file, name: name)
>
>      begin

@adfoster-r7 It works for me. Tested on version 6.4.3-dev

egilas commented 1 week ago

@adfoster-r7 - this fix worked for me as well. Running deb from metasploit.com repo:

6.4.4+20240417170723~1rapid7-1 amd64

Thank you so much!

Ashifcoder commented 1 week ago

The fix worked! @adfoster-r7

Tested with

Metasploit 6.4.3-dev in Kali Linux 2024.1 latest update ✨
44 meterpreter shells! 😅

znre commented 1 week ago

I can also confirm that the workaround works. Thank you very much @adfoster-r7 😄

adfoster-r7 commented 1 week ago

Metasploit-framework 6.4.5 has been released to Kali, as well as the official installers, so updating to the latest version should work as a solution. Will close this off :+1:
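
To confirm that an installed version is at or past the fixed release, the version string can be compared against 6.4.5. This is a sketch with a hypothetical helper name, `version_at_least`, and it relies on the -V (version sort) option of GNU sort.

```shell
#!/usr/bin/env bash
# Succeed when VERSION is at least MINIMUM (default 6.4.5, the release
# named in this thread as containing the fix).
# Suffixes such as "-dev" or "+2024..." are stripped before comparing.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
  local version="$1" minimum="${2:-6.4.5}" lowest
  version="${version%%[-+]*}"
  lowest=$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n 1)
  [ "$lowest" = "$minimum" ]
}
```

Going by the `msfconsole -v` output quoted earlier in this thread ("Framework Version: 6.4.2-dev"), something like `version_at_least "$(msfconsole -v | awk '{print $NF}')"` should work, though the output format is taken from that one example. Because the suffix is stripped, a 6.4.5-dev build passes the check even if it predates the fix, so treat a pass on a dev build with caution.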

rapid7 / metasploit-framework

The latest Metasploit version seems to have some sort of memory leak? #19098

Workaround available here: https://github.com/rapid7/metasploit-framework/issues/19098#issuecomment-2066929575
